Managing PostgreSQL partitions with the pg_partman extension
PostgreSQL table partitioning provides a framework for high-performance handling of data input and reporting. Use partitioning for databases that require very fast input of large amounts of data. Partitioning also provides for faster queries of large tables. Partitioning helps maintain data without impacting the database instance because it requires less I/O resources.
Partitioning allows you to split data into custom-sized chunks for processing. For example, you can choose to partition time-series data for ranges such as hourly, daily, weekly, monthly, quarterly, yearly, custom, or any combination of these. For a time-series data example, if you partitioned the table by hour, each partition will contain one hour of data. If you partitioned the time-series table by day, the partitions will hold one day’s worth of data, and so on. The size of a partition is controlled by the partition key.
When you use an INSERT
or UPDATE
SQL command on a partitioned
table, the database engine routes the data to the appropriate partition. PostgreSQL
table
partitions that store the data are child tables of the main table.
During database query reads, the PostgreSQL optimizer examines the WHERE
clause of the query and if possible, directs the database scan to only the relevant
partitions.
Starting with version 10, PostgreSQL uses declarative partitioning to implement table partitioning This is also known as native PostgreSQL partitioning. Prior to PostreSQL version 10, users had to use triggers to implement partitions.
PostgreSQL table partitioning provides the following features:
-
Creation of new partitions at any time.
-
Variable partition ranges.
-
Detachable and re-attachable partitions using DDL.
For example, detachable partitions are useful for removing historical data from the main partition but keeping historical data for analysis.
-
New partitions inherit the parent database table properties including:
-
indexes
-
primary-keys, which must include the partition key column
-
foreign-keys
-
check-constraints
-
references
-
-
Creating indexes for the full table or each specific partition.
You can't alter the schema for an individual partition. However, you can alter the parent table (such as adding a new column), which propagates to partitions.
Topics
Overview of the PostgreSQL pg_partman extension
You can use the PostgreSQL pg_partman
extension to automate the creation
and maintenance of table partitions. See also PG Partition Managerpg_partman
documentation.
The pg_partman
extension is supported on Amazon RDS PostgreSQL engine
versions 12.5 and higher.
Instead of having to manually create each partition, you configure
pg_partman
with the following settings:
-
Table to be partitioned
-
Partition type
-
Partition key
-
Partition granularity
-
Partition pre-creation and management options
After you create a PostgreSQL partitioned table, you register it with
pg_partman
by calling the create_parent()
function, which
creates the necessary partitions based on the parameters you pass to the
function.
The pg_partman
extension also provides the
run_maintenance_proc()
function, which you can call on a scheduled
basis to automatically manage partitions. You can schedule this function to run
periodically (such as hourly) to ensure that the proper partitions are created as
needed. Partitions can be also be automatically dropped.
Enabling the pg_partman extension
If you have multiple databases inside the same DB instance for which you want to manage
partitions, you must enable the pg_partman
extension separately for each
database. To enable the pg_partman
extension for a specific database, create
the partition maintenance schema and then create the pg_partman
extension as
follows:
CREATE SCHEMA partman; CREATE EXTENSION pg_partman WITH SCHEMA partman;
You require the rds_superuser
privileges to create the
pg_partman
extension.
If you receive an error such as the following, grant the rds_superuser
privileges to the account or use your superuser account.
ERROR: permission denied to create extension "pg_partman" HINT: Must be superuser to create this extension.
To grant rds_superuser
privileges, connect with your superuser account and
run the following command:
GRANT rds_superuser TO
user-or-role
;
For the examples that show using the pg_partman
extension, we use the
following sample database table and partition. This database uses a partitioned table
based on a timestamp. A schema data_mart
contains a table named
events
with a column named created_at
. The following
settings are included in the events
table:
-
Primary keys
event_id
andcreated_at
, which must have the column used to guide the partition. -
A check constraint
ck_valid_operation
to enforce values for anoperation
table column. -
Two foreign keys, where one (
fk_orga_membership)
points to the external tableorganization
and the other (fk_parent_event_id
) is a self-referenced foreign key. -
Two indexes, where one (
idx_org_id
) is for the foreign key and the other (idx_event_type
) is for the event type.
The follow DDL statements create these objects, which will be automatically included on each partition:
CREATE SCHEMA data_mart; CREATE TABLE data_mart.organization ( org_id BIGSERIAL, org_name TEXT, CONSTRAINT pk_organization PRIMARY KEY (org_id) ); CREATE TABLE data_mart.events( event_id BIGSERIAL, operation CHAR(1), value FLOAT(24), parent_event_id BIGINT, event_type VARCHAR(25), org_id BIGSERIAL, created_at timestamp, CONSTRAINT pk_data_mart_event PRIMARY KEY (event_id, created_at), CONSTRAINT ck_valid_operation CHECK (operation = 'C' OR operation = 'D'), CONSTRAINT fk_orga_membership FOREIGN KEY(org_id) REFERENCES data_mart.organization (org_id), CONSTRAINT fk_parent_event_id FOREIGN KEY(parent_event_id, created_at) REFERENCES data_mart.events (event_id,created_at) ) PARTITION BY RANGE (created_at); CREATE INDEX idx_org_id ON data_mart.events(org_id); CREATE INDEX idx_event_type ON data_mart.events(event_type);
Configuring partitions using the create_parent() function
After you enable the pg_partman
extension, you use the
create_parent()
function to configure partitions inside the
partition maintenance schema. This example uses the events
table
example created in Enabling the pg_partman extension Configuring partition maintenance
using the run_maintenance_proc() function. Call the
create_parent()
function as follows:
SELECT partman.create_parent( p_parent_table => 'data_mart.events', p_control => 'created_at', p_type => 'native', p_interval=> 'daily', p_premake => 30);
The parameters are as follows:
-
p_parent_table
– The parent partitioned table. This table must already exist and be fully qualified including the schema. -
p_control
– The column on which the partitioning is to be based. The data type must be integer or time-based. -
p_type
– The type is either native or partman. You should typically use the native type for its performance improvements and flexibility. The partman type relies on inheritance. -
p_interval
– The time interval or integer range for each partition. Example values include daily, hourly, and so on. -
p_premake
– The number of partitions to create in advance to support new inserts.
For a complete description of the create_parent()
function, see
Creation Functionspg_partman
documentation.
Configuring partition maintenance using the run_maintenance_proc() function
You can run partition maintenance operations to automatically create new
partitions, detach partitions, or remove old partitions. Partition maintenance
relies on the run_maintenance_proc()
function of
pg_partman
and the pg_cron
extension, which initiates
an internal scheduler. The pg_cron
scheduler automatically executes SQL
statements, functions, and procedures defined in your databases.
The following example uses the events
table example created in Enabling the pg_partman extension Configuring partition maintenance
using the run_maintenance_proc() function to set partition maintenance
operations to run automatically.
-- Prerequisite: add
pg_cron
to theshared_preload_libraries
parameter in the DB instance's parameter group. CREATE EXTENSION pg_cron; UPDATE partman.part_config SET infinite_time_partitions = true, retention = '3 months', retention_keep_table=true WHERE parent_table = 'data_mart.events'; SELECT cron.schedule('@hourly', $$CALL partman.run_maintenance_proc()$$);
The example is described as follows:
-
Modify the parameter group associated with your DB instance and add
pg_cron
to theshared_preload_libraries
parameter value. This change requires a DB instance restart for it to take effect. For more information, see Modifying parameters in a DB parameter group. -
CREATE EXTENSION pg_cron;
– Run this command using an account that has therds_superuser
permissions. This enables thepg_cron
extension. For more information, see Scheduling maintenance with the PostgreSQL pg_cron extension. -
UPDATE partman.part_config
– Adjusts thepg_partman
settings for thedata_mart.events
table. -
SET
. . . – Configures thedata_mart.events
table.-
infinite_time_partitions = true,
– Configures the table to be able to automatically create new partitions without any limit. -
retention = '3 months',
– Configures the table to have a maximum retention of three months. -
retention_keep_table=true
– When the retention period is due, the table is not deleted automatically. Instead, partitions that are older than the retention period are only detached from the parent table.
-
-
SELECT cron.schedule
. . . – Apg_cron
function call that defines how often the scheduler will run thepg_partman
maintenance procedure,partman.run_maintenance_proc()
. For this example the procedure will run every hour.
For a complete description of the run_maintenance_proc()
function,
see Maintenance Functionspg_partman
documentation.