Table partitioning in PostgreSQL is a great way to speed up queries that require particular portions of very large data sets. The standard example is business intelligence reporting. Lots of data goes into the database, but generally your queries only work with one month's data. As the data set grows, you want to avoid the indefinite lengthening of your query times for reporting. By using table partitioning and PostgreSQL's constraint_exclusion, you can be sure that your queries are only scanning the data that is appropriate.
The concept for table partitions is to make one parent table, and a number (probably ever-growing) of child tables that actually hold your data. There are plenty of tutorials out there for creating these tables, indexes, functions, and triggers, so I will not go into great detail on how/why partitioning works, but I will offer a quick synopsis.
Child tables are created with CHECK constraints that ensure only particular data is stored in each Child table. In the reporting example, the Parent and Child tables would have a CREATE statements as below.
After creating the tables, create a function that returns a trigger. The function simply redirects the values to the appropriate child table (usually, you can do more if necessary). Finally, create the trigger to use the function whenever a row is inserted in the parent table. This looks something like:
This is not terribly difficult. Now, when an insert occurs on parent, it will automatically get passed through to the appropriate child table. If it is outside of the scope of the existing child table(s), an exception is raised in the previous sample. Note that if you are using PostgreSQL older than 8.4, you will need to manually turn on "constraint_exclusion" in your postgresql.conf file. Otherwise, all of your efforts to optimize will be in vain. 8.4 and newer changes "constraint_exclusion" from a bool to an enum, and defaults it to "partition", which is sufficient for optimizing queries against parent/child partitioned table schemes.
The automation problem is that putting this together over and over again is a giant pain. Not to mention the fact that when I set out to find a script/function/procedure that could accomplish it for me, I did not uncover much. So, when I put this together, I decided to make it as generic as possible, so that I could reuse it for all of my table partitioning needs. As I was dusting it off recently to put together a test database with massive partitioning, I decided to clean it up and post it here.
update_partitions(begin_time, schema_name, primary_table_name, table_owner, date_column)
Thanks to Javier for adding support for multiple intervals, and not just monthly partitions:
Notes:
- The function is created in the public schema and is owned by user postgres.
- The function takes params:
- begin_time - time of your earliest month's data. This allows for backfilling and for reducing trigger function overhead by avoiding legacy date logic.
- schema_name - name of the schema that contains the parent table. Child tables are created here.
- primary_table_name - name of the parent table. This is used to generate monthly tables ([primary_table_name]_YYYYMM) and an unknown table ([primary_table_name]_unknowns). It is also used in the trigger and trigger function names.
- table_owner - name of PostgreSQL Role to be assigned as owner of the child tables.
- date_column - name of the timestamp/date column that is used for check constraints and insert trigger function.
- The insert trigger function is recreated everytime you run.
- If tables already exist, the function simply updates the trigger function and moves to the next table in the series.
- This function does not raise exceptions when errant data is encountered. Any data that does not have a matching child table is stored in the unknowns table.
- The function returns the number of tables that it created.
Sample Usage:
Backfill to start a group of child tables for optimizing an existing data set. Creates monthly tables that inherit the 'partition_test' table and are named 'partition_test_200908', 'partition_test_200909', ... , 'partition_test_201101' (given that today's date is 2010-12-29). Check constraints are generated for the child tables, a trigger insert function is created, and a trigger is applied to the parent table.
SELECT public.update_partitions('2009-08-01'::timestamp,'public','partition_test','postgres','date_column');
If you have a large number of tables (dating back to 200908, for example), but you only insert current data, you may want to backfill and then immediately run an update_partitions using a more current begin_date. This will reduce execution time for your trigger insert function. It will not delete older tables, and since no older data is being inserted, you need not worry about inconsistency between your trigger insert function and the various child table's check constraints. This will update your trigger function to check for dates in 201011, 201012, and 201101; instead of checking for the entire span of months as the previous example would.
SELECT public.update_partitions('2010-11-01'::timestamp,'public','partition_test','postgres','date_column');
Add a crontab entry that calls this function once a month, using either the origin date (if your usage requires the ability to enter legacy data at any time) or using this month/last month as a begin_time (if legacy data is never entered into these tables).
Some Simple Hacks:
- This could pretty easily be modified to use non-date constraints.
- I intentionally avoided any table dropping efforts, but dropping tables with constraints before the begin_time might be useful for your data.
- I prefer to update periodically, but if you have a discrete interval, an end_time could replace the currentMonth+1 value.
- Adding primary keys. I almost added a default 'id' primary key addition, but since this isn't universal, I left the primary keys out. You will probably want to add a primary key to your child tables.
- and...?
I'm sure there are some more elegant ways to accomplish some of these actions. Nonetheless, this works for me, and I hope it saves you some time. If anyone has suggestions for updates or alternatives, don't hesitate to comment.
Hi,
Your function looks very well. Unfortunatelly my database is designed so some tables and fields have capital letters. Therefore they must be written as "CaPiTaLnAmE" no CaPiTaLnAmE. For example my timestamp field is "orderTime".
I tried to modify your function but with no luck. Actually I don't know plsql. Could you rewrite the function to support capital letters in schema, table and field names?
Thanks a lot!
@Jack - Thanks for pointing this out, I completely ignored this possibility when testing the script.
I will try to get an updated version together that addresses this soon.
Yes it looks great. I've been looking for something like this. So first I would like to thank you for your effort.
Any way, the blog presentation is corrupted at HTML line 163, just in the middle of the "pre" tag that contains the plsql script (just after the line startTime:=intervalTime;) so it is quite difficult to understand from this point... and no way for copy
Would you please fix this presentation bug so we all could easyly use your script?
Thanks in advance.
Moved it github, and the scripts there are working for me. Sorry for the confusion, and thanks for posting your concerns!
Thanks Nicholas, it's great. I've got it working on my systemevents of rsyslog. It took me half an hour... and because I know litte of pgsql and wanted to check it twice..
Thanks again.
@Javier - That's great, and thanks for the update to handle additional intervals, it's updated in the gist, and I've noted above.
Thanks a lot!!!, really appreciate your effort and the fine documentation. Worked perfectly right out of the box.
I've added also indexes auto creation
just replace the date_column index creation with the fallowing code:
-- create indexes
FOR index_record IN select indexname, indexdef from pg_indexes where tablename = primary_table_name LOOP
EXECUTE replace(
replace(index_record.indexdef,
'INDEX ' || primary_table_name,
'INDEX ' || fullTablename
),
'ON ' || primary_table_name,
'ON ' || fullTablename
) from pg_indexes where indexname = index_record.indexname
;
END LOOP;
@Shahar - I think we need to be a little careful with that, as replacing the current index creation would put the index replacement/build within the loop. You only get there if your table doesn't already exist (which might be intended), I think...
Also, since you're pulling index_record from pg_indexes, don't you still need to create the index the first time around (meaning we can't replace the date_column index creation, but rather append)?
I haven't looked at this in several months, and I'm a little tied up at the moment, apologies if I'm misunderstanding your update. Also, note that my comments system mangles code with angle brackets, can you reference a gist or use email? This domain at gmail will get you to me, or you can use github.
Thanks for the update, once I get my head around it and run some tests, I'll post credit and modify the code here!
tnx Nicholas for the quick response,
obviously I've wrote this for me needs: I have an existing table that I want to start to partition. so indexes already exists.... I'm trying to explore now how to auto regenerate the foreign constrains from the primary table to the partitions...
Nicholas,
here is the gist:
https://gist.github.com/3070823
NOTICE I've also add constrains generation.
SEE the two remarks:
AUTO GENERATE CONSTRAINTS
AUTO GENERATE INDEXES
Hi Nicholas,
I've also add constraints generation. here is the gist:
https://gist.github.com/3070823
look at comments:
AUTO GENERATE CONSTRAINTS
AUTO GENERATE INDEXES
another update:
1. same gist: https://gist.github.com/3070823
2. parameter was added: is_first_contains_prior BOOLEAN - do we want the first table to contain also rows from prior dates.
3. generating indexes and constrains also to the 'unknown' table - other wise we get table scan on that table.
3. create index for date column if doesn't exists in original table
@Shahar - Thanks for the updates, at a glance, I like the latest versions. I want to run a couple tests to be sure they comply with my instructions above, and I'll either post as is, or make some minor changes either to the scripts or my text to be sure it's easy to walk through usage.
Thanks again!
This is great. Small issue within Shahar version. If for some reason you have the same table within 2 diff db, yull hit both while looking for constraint an indexes..
That said, what to do with legacy data already in production. I have a table with 14 millions rows covering maintly the last 3 month.
In order the distribute the data in each weekly partition, do I have an alternative than :
1. rename de main table
2. duplicate it's structure
3. update_partition call
4. insert all row from backup into new version?
Last time I implemented this, this is what I had to do. can I avoid this?
Thanx
Adding a schemaname='something' to the pg_indexes check should resolve the issue with similarly named tables.
At a high level, because we're relying on triggers, re-inserting the data is a good way to achieve the data move (without the insert, no trigger). You could also perform SELECT INTO with timestamp constraints, but you're just duplicating the effort of the triggers at that point.
I haven't tried it, but another option would be to use Slony or Bucardo to set up a slave table somewhere and let it run the sync as a background process. I don't see any substantial benefit to this technique, other than not having to deal with large dump files. It is also a little bit of a pain to replicate to a table in the same database.