Simple partitioning of live data with Postgres

10.2020 | Category: How To | Tags: administration, partitioning

10.2020

Category: How To

written by László Forró (LinkedIn) and André Plöger (LinkedIn & Xing).

Table of Contents

Partitioning data is a well known way to optimize big tables into smaller ones using a given attribute. And while creating a partitioned table structure itself is not complicated, the migration of a table on a live system can be tricky. Our solution offers a simple way to solve this problem.

The usual way can be summarized the following:

create a table and necessary triggers for the future partitioned data
create an insert trigger on the existing live table that on insert/update/delete performs the same operation on the partitioned table, to maintain the data integrity
copy the data from the live table to the partitioned table - likely in batches and in a transaction
switch in the application code to the partitioned table or rename the tables
delete the original live table

Disadvantage of this method

The disadvantage of the method above is that it is relatively complex; the triggers that secure data integrity must be tested. It also creates data redundancy, because during migration the data must be duplicated. It also requires a migration plan, and a change in the application when switching from the old table to the new one - and that may or may not be an issue.

During data migration the particular difficulty is: how to be certain that if a given batch of data is being migrated and entries of the same batch of data is updated or deleted, the correct values are going to be present in the new, partitioned table. It may mean some freeze of insert/updated during migration (see answer 13) or some logic that re-scans the data in the batch after migration for any changes and gives priority to the original table. At think project! we needed both to be sure that we have no data loss - that may not be an issue in some scenarios -, and that the downtime/service time is minimal, or the better, non existent.

Another solution

Below, we would like to present a simple and robust solution that does not require duplication of the data, nor downtime or any freeze during migration and overall it is a minimum intrusion to the system. The solution is using the original table as the mother table for the partitioning. The data is moved from the mother table using a BEFORE UPDATE trigger, updating an extra column of each row by some migration process. This update value is added only for this purpose. In case a row is still in the mother table when queried - Postgres will deliver it from the mother table before consulting a partition.

Below we demonstrate how to set up a basic partitioning, and after that our proposal for the migration process is discussed.

Preparation for partitioning

Let's take the source table that acts as our live table. The part_value will be the base of our partitioning.

CREATE TABLE livedata (
  id SERIAL PRIMARY KEY,
  some_data TEXT,
  part_value INTEGER NOT NULL
);

INSERT INTO livedata(part_value,some_data) 
VALUES (1,'this'), (1,'this_is'),(2,'that');

CREATE TABLE livedata (

id SERIAL PRIMARY KEY,

some_data TEXT,

part_value INTEGER NOT NULL

);

INSERT INTO livedata(part_value,some_data)

VALUES (1,'this'), (1,'this_is'),(2,'that');

Let's create the child tables for partitioning:

CREATE TABLE livedata_part_1 (LIKE livedata);
ALTER TABLE livedata_part_1 INHERIT livedata;
ALTER TABLE livedata_part_1 ADD CONSTRAINT part_value 
    CHECK ( part_value = 1 );
CREATE TABLE livedata_part_2 (LIKE livedata);
ALTER TABLE livedata_part_2 INHERIT livedata;
ALTER TABLE livedata_part_2 ADD CONSTRAINT part_value 
    CHECK ( part_value = 2 );

CREATE TABLE livedata_part_1 (LIKE livedata);

ALTER TABLE livedata_part_1 INHERIT livedata;

ALTER TABLE livedata_part_1 ADD CONSTRAINT part_value

CHECK ( part_value = 1 );

CREATE TABLE livedata_part_2 (LIKE livedata);

ALTER TABLE livedata_part_2 INHERIT livedata;

ALTER TABLE livedata_part_2 ADD CONSTRAINT part_value

CHECK ( part_value = 2 );

Now, add the insert trigger to the livedata table to forward the new inserts to the proper child tables:

CREATE OR REPLACE FUNCTION livedata_insert_function() 
RETURNS TRIGGER AS $
  BEGIN
    IF NEW.part_value BETWEEN 1 AND 2 THEN
      EXECUTE 'INSERT INTO livedata_part_'||NEW.part_value
             ||' VALUES ($1.*)' USING NEW;
    ELSE
      RAISE EXCEPTION 'part_value is out of range';
    END IF;
    RETURN NULL;
  END;
$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION livedata_insert_function()

RETURNS TRIGGER AS $

BEGIN

IF NEW.part_value BETWEEN 1 AND 2 THEN

EXECUTE 'INSERT INTO livedata_part_'||NEW.part_value

||' VALUES ($1.*)' USING NEW;

ELSE

RAISE EXCEPTION 'part_value is out of range';

END IF;

RETURN NULL;

END;

$ LANGUAGE plpgsql;

Bind the function to a trigger:

CREATE TRIGGER tr_insert_livedata
  BEFORE INSERT ON livedata
  FOR EACH ROW EXECUTE PROCEDURE livedata_insert_function();

CREATE TRIGGER tr_insert_livedata

BEFORE INSERT ON livedata

FOR EACH ROW EXECUTE PROCEDURE livedata_insert_function();

Up to this point we have prepared the partitions and the insert trigger, following the standard way. As soon as the function is bound to a trigger, the mother table inserts will land in the child tables, and will be delivered from there. The existing data will be delivered from the mother table.

INSERT INTO livedata(some_data, part_value) VALUES ('new_one',1);
SELECT * FROM livedata_part_1;

1 2	INSERT INTO livedata(some_data, part_value) VALUES ('new_one',1); SELECT * FROM livedata_part_1;

Should show the new entry in the partition table, as expected.

Migration through UPDATE

Now, let's migrate the data from the mother table. Our logic uses an UPDATE trigger. For the controlled migration we use a value to update and for this we choose to add an extra BOOL column without default values. This column can also be handy for monitoring the migration process. The application/client can be unaware of it's existence. Without default values and indices this modification runs in 10-100ms even if the table is huge.

ALTER TABLE livedata ADD COLUMN in_partition BOOL;

1	ALTER TABLE livedata ADD COLUMN in_partition BOOL;

BEFORE UPDATE Trigger

Now let's add a BEFORE UPDATE trigger, so that it copies the values from the mother table to the partition and then performs an update on the copied value, setting the in_partition flag on the updated row. This will be our migration function. The migration will simply be an update on the rows on the mother table setting the in_partition flag to true.

 CREATE OR REPLACE FUNCTION migrate_data() RETURNS trigger AS $
  DECLARE
        partition_table TEXT;
  BEGIN -- the trigger table name is the suffix for the partition table
        partition_table := TG_TABLE_NAME || '_part_' || NEW.part_value;

  EXECUTE format('DELETE FROM %s WHERE id = %s',TG_TABLE_NAME,NEW.id);
  EXECUTE format('INSERT INTO %s (id, some_data, part_value, in_partition)
                    VALUES (%s,%L,%s,TRUE)'
                   ,partition_table, NEW.id,NEW.some_data,NEW.part_value);
 RETURN NULL;
 END;
 $ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS tr_migrate_data_on_update ON livedata;
CREATE TRIGGER tr_migrate_data_on_update BEFORE UPDATE ON livedata
  FOR EACH ROW EXECUTE PROCEDURE migrate_data();

CREATE OR REPLACE FUNCTION migrate_data() RETURNS trigger AS $

DECLARE

partition_table TEXT;

BEGIN -- the trigger table name is the suffix for the partition table

partition_table := TG_TABLE_NAME || '_part_' || NEW.part_value;

EXECUTE format('DELETE FROM %s WHERE id = %s',TG_TABLE_NAME,NEW.id);

EXECUTE format('INSERT INTO %s (id, some_data, part_value, in_partition)

VALUES (%s,%L,%s,TRUE)'

,partition_table, NEW.id,NEW.some_data,NEW.part_value);

RETURN NULL;

END;

$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS tr_migrate_data_on_update ON livedata;

CREATE TRIGGER tr_migrate_data_on_update BEFORE UPDATE ON livedata

FOR EACH ROW EXECUTE PROCEDURE migrate_data();

The migration happens when there is any update on a not yet migrated row.

UPDATE livedata SET some_data = 'new_value' WHERE id = 1;

1	UPDATE livedata SET some_data = 'new_value' WHERE id = 1;

To migrate the values through some migration process, we use the in_partition BOOL flag. Eg. doing a batch migration by part_value simply execute the following:

UPDATE livedata SET in_partition = true WHERE part_value = 1;

1	UPDATE livedata SET in_partition = true WHERE part_value = 1;

The race conditions are going to be handled by the database itself. The application code requires no change, all the migration happens inside the database. The data is not duplicated during the process and the integrity is maintained by the database, and requires very little extra code. To check if the migration is finished, either the size of the source table should be checked using the PostgreSQL internals, or adding the following to the insert_trigger() directly after the IF ... BETWEEN statement we can set the in_partition attribute to true on each insert.

NEW.in_partition := true;

1	NEW.in_partition := true;

In this case there is no need to set this attribute to true in the migrate_data(). Using this logic if the following query returns no rows the source table is fully migrated:

SELECT id FROM livedata WHERE in_partition IS NULL;

1	SELECT id FROM livedata WHERE in_partition IS NULL;

3 responses to “Simple partitioning of live data with Postgres”

Cat Cat says:

June 3, 2023 at 3:54 pm

From the small amount I have understood and tried so far, I believe you can't have livedata be a non-partitioned table originally and then have it act as a partitioned table for the purpose of the migration and final use. When itis created it either needs to be a partitioned or non-partitioned table.. you can't ALTER TABLE PARTITION BY TYPE (COLUMN) to an existing table...
Am I missing something? Am I wrong? Can you point me to how to use your livedata table as a non-partitioned table and then have the child tables act as partitions for it?

Reply
- laurenz says:
  
  June 5, 2023 at 6:40 am
  
  I would say that the article is talking about inheritance partitioning, which is a possible, but outdated solution these days.
  
  Reply
Cat Cat says:

June 5, 2023 at 5:08 pm

I see. Thanks for the clarification.

Reply

Simple partitioning of live data with Postgres

The usual way can be summarized the following:

Disadvantage of this method

Another solution

Preparation for partitioning

Now, add the insert trigger to the livedata table to forward the new inserts to the proper child tables:

Migration through UPDATE

BEFORE UPDATE Trigger

3 responses to “Simple partitioning of live data with Postgres”

Leave a Reply Cancel reply

CYBERTEC Guest

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts