We will share a simple approach to migrate existing tables to partitioned versions of them with Postgresql.
The partitions will be created on demand with just a few lines of codes. It is useful during off-line migration, and can be used in production with applications that don’t do so much INSERT.
Let’s say you want to partition a table based on time period.
In Postgres, we handle partitioning with inheritance. You will have a master table, used to regroup the different partitions, and one table per partition. I let you read the excellent Postgres documentation about this: http://www.postgresql.org/docs/current/static/ddl-partitioning.html
As you can see, there is a lot of tedious work to create partitions, tables, indexes… Here we provide you a non optimal solution in terms of insert performance, but definitely more practical in terms of risks and time to settle than the method presented in the manual.
I will show you how to conduct it in a simple case and simplify a bit the manual operation you need to do. From this, you’ll be able to build your own solution.
We got a large fact table called tablename. There is a column on this table called period which can have values like 2011_01 which means January of 2011.
We will create one partition per period. So we will create the tables:
The p at the end helps us detecting that this object is a partition. I admit there is a large space to improve it.
Create the master table
-- Create the master table tablename_with_partition: -- You should use the same exact schema as the one from the non partitioned table. CREATE TABLE tablename_with_partition ( id integer NOT NULL DEFAULT nextval('tablename'::regclass), "name" character varying(150) NOT NULL, description text NOT NULL, period character varying(10) NOT NULL CONSTRAINT tablename_with_partition_pkey PRIMARY KEY (id) ) WITH ( OIDS=FALSE);
Auto partition creation during insert
-- Attach a magic function to the insert of this table: CREATE OR REPLACE FUNCTION create_partition_and_insert() RETURNS trigger AS $BODY$ DECLARE partition VARCHAR(25); BEGIN partition := TG_RELNAME || '_' || NEW.period || ‘p’; IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname=partition) THEN RAISE NOTICE 'A partition has been created %',partition; EXECUTE 'CREATE TABLE ' || partition || ' (check (period = ''' || NEW.period || ''')) INHERITS (' || TG_RELNAME || ');'; END IF; EXECUTE 'INSERT INTO ' || partition || ' SELECT(' || TG_RELNAME || ' ' || quote_literal(NEW) || ').*;'; RETURN NULL; END; $BODY$ LANGUAGE plpgsql VOLATILE COST 100;
This function will try to insert in the correct partition and creates it if it doesn’t exist. This approach is not the most efficient, but if you got less than 20M records, it works definitely fine.
Here we attach the function defined earlier to the insert operation on tablename_with_partitions:
CREATE TRIGGER tablename_insert_trigger BEFORE INSERT ON tablename_wtih_partition FOR EACH ROW EXECUTE PROCEDURE create_partition_and_insert();
Notice that I didn’t named the trigger with the name “tablename_with_partitions_insert_trigger”. It is because I planned to substitute the partitionned master table and the old non partionned version later on.
Migrate the data
-- Copy data from the non partitioned to the partitioned version: INSERT INTO tablename_with_partition SELECT * tablename;
It can take a while. But it is a nice approach to do this way.
Testing everything is fine
Let’s check everything is ok:
– “SELECT count(*) FROM tablename_2010_1;”: you got proper data in one partition.
– “SELECT count(*) FROM tablename_with_partition;” returns the sum of all the elements in the partition
– “SELECT count(*) FROM ONLY tablename_with_partition;” returns 0. It is normal we didn’t insert one line of data in the master table.
What’s next ?
If everything is fine, we are almost done. We’ll need to build the indexes and substitute tablename by tablename_with_partition.
We’ll see this in the next article by creating a smart function that can execute on every partition an arbitrary SQL command.