Working with PostgreSQL autovacuum on Amazon RDS
We strongly recommend that you use the autovacuum feature for PostgreSQL databases to maintain the health of your PostgreSQL DB instance. Autovacuum automates the start of the VACUUM and the ANALYZE commands. Autovacuum checks for tables that have had a large number of inserted, updated, or deleted tuples. Autovacuum then reclaims storage by removing obsolete data or tuples from the PostgreSQL database.
Autovacuum is enabled by default for all new Amazon RDS PostgreSQL DB instances, and the related autovacuum configuration parameters are appropriately set by default. Because our defaults are somewhat generic, you can benefit from tuning parameters to your specific workload. The following section can help you perform the needed autovacuum tuning.
Topics
- Allocating memory for autovacuum
- Reducing the likelihood of transaction ID wraparound
- Determining if the tables in your database need vacuuming
- Determining which tables are currently eligible for autovacuum
- Determining if autovacuum is currently running and for how long
- Performing a manual vacuum freeze
- Reindexing a table when autovacuum is running
- Other parameters that affect autovacuum
- Setting table-level autovacuum parameters
- Autovacuum logging
Allocating memory for autovacuum
One of the most important parameters influencing autovacuum performance is the maintenance_work_mem
maintenance_work_mem
When doing calculations to determine the maintenance_work_mem
-
The default unit is kilobytes (KB) for this parameter.
-
The
maintenance_work_mem
parameter works in conjunction with the autovacuum_max_workers
parameter. If you have many small tables, allocate more autovacuum_max_workers
and less maintenance_work_mem
. If you have large tables (say, larger than 100 GB), allocate more memory and fewer worker processes. You need to have enough memory allocated to succeed on your biggest table. Each autovacuum_max_workers
can use the memory you allocate. Thus, you should make sure the combination of worker processes and memory equal the total memory that you want to allocate.
In general terms, for large hosts set the maintenance_work_mem
GREATEST({DBInstanceClassMemory/63963136*1024},65536)
.
Reducing the likelihood of transaction ID wraparound
In some cases, parameter group settings related to autovacuum might not be aggressive
enough to prevent transaction ID wraparound. To address this, Amazon RDS for PostgreSQL
provides
a mechanism that adapts the autovacuum parameter values automatically. Adaptive
autovacuum parameter tuning is a feature for RDS for PostgreSQL. A detailed
explanation of TransactionID wraparound
Adaptive autovacuum parameter tuning is enabled by default for RDS PostgreSQL instances
with the dynamic parameter rds.adaptive_autovacuum
set to ON. We strongly
recommend that you keep this enabled. However, to turn off adaptive autovacuum parameter
tuning, set the rds.adaptive_autovacuum
parameter to 0 or OFF.
Transaction ID wraparound is still possible even when RDS tunes the autovacuum
parameters. We encourage you to implement an Amazon CloudWatch alarm for transaction
ID wraparound.
For more information, see the blog post Implement an early warning system for transaction ID wraparound in Amazon RDS for
PostgreSQL
With adaptive autovacuum parameter tuning enabled, RDS will begin adjusting autovacuum
parameters when the CloudWatch metric MaximumUsedTransactionIDs
reaches the
value of the autovacuum_freeze_max_age
parameter or 500,000,000, whichever is
greater.
RDS continues to adjust parameters for autovacuum if a table continues to trend toward transaction ID wraparound. Each of these adjustments dedicates more resources to autovacuum to avoid wraparound. RDS updates the following autovacuum-related parameters:
RDS modifies these parameters only if the new value makes autovacuum more aggressive.
The parameters are modified in memory on the DB instance. The values in the parameter
group
aren't changed. To view the current in-memory settings, use the PostgreSQL SHOW
Whenever RDS modifies any of these autovacuum parameters, it generates an event for
the
affected DB instance that is visible on the AWS Management Console (https://console.aws.amazon.com/rds/MaximumUsedTransactionIDs
CloudWatch metric returns below
the threshold, RDS resets the autovacuum related parameters in memory back to the
values
specified in the parameter group and generates another event corresponding to this
change.
Determining if the tables in your database need vacuuming
You can use the following query to show the number of unvacuumed transactions in a
database. The datfrozenxid
column of a database's pg_database
row
is a lower bound on the normal transaction IDs appearing in that database. This column
is
the minimum of the per-table relfrozenxid
values within the database.
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) desc limit 20;
For example, the results of running the preceding query might be the following.
datname | age mydb | 1771757888 template0 | 1721757888 template1 | 1721757888 rdsadmin | 1694008527 postgres | 1693881061 (5 rows)
When the age of a database reaches 2 billion transaction IDs, transaction ID (XID)
wraparound occurs and the database becomes read-only. This query can be used to produce
a
metric and run a few times a day. By default, autovacuum is set to keep the age of
transactions to no more than 200,000,000 (autovacuum_freeze_max_age
A sample monitoring strategy might look like this:
-
Set the
autovacuum_freeze_max_age
value to 200 million transactions. -
If a table reaches 500 million unvacuumed transactions, that triggers a low-severity alarm. This isn't an unreasonable value, but it can indicate that autovacuum isn't keeping up.
-
If a table ages to 1 billion, this should be treated as an alarm to take action on. In general, you want to keep ages closer to
autovacuum_freeze_max_age
for performance reasons. We recommend you investigate using the recommendations that follow. -
If a table reaches 1.5 billion unvacuumed transactions, that triggers a high-severity alarm. Depending on how quickly your database uses transaction IDs, this alarm can indicate that the system is running out of time to run autovacuum. In this case, we recommend you resolve this immediately.
If a table is constantly breaching these thresholds, you need to modify your autovacuum parameters further. By default, using VACUUM manually (which has cost-based delays disabled) is more aggressive than using the default autovacuum, but it is also more intrusive to the system as a whole.
We recommend the following:
-
Be aware and enable a monitoring mechanism so that you are aware of the age of your oldest transactions.
For information on creating a process that warns you about transaction ID wraparound, see the AWS Database Blog post Implement an early warning system for transaction ID wraparound in Amazon RDS for PostgreSQL
. -
For busier tables, perform a manual vacuum freeze regularly during a maintenance window, in addition to relying on autovacuum. For information on performing a manual vacuum freeze, see Performing a manual vacuum freeze.
Determining which tables are currently eligible for autovacuum
Often, it is one or two tables in need of vacuuming. Tables whose
relfrozenxid
value is greater than the number of transactions in
autovacuum_freeze_max_age
are always targeted by autovacuum. Otherwise, if
the number of tuples made obsolete since the last VACUUM exceeds the "vacuum threshold",
the
table is vacuumed.
The autovacuum threshold
Vacuum-threshold = vacuum-base-threshold + vacuum-scale-factor * number-of-tuples
While you are connected to your database, run the following query to see a list of tables that autovacuum sees as eligible for vacuuming:
WITH vbt AS (SELECT setting AS autovacuum_vacuum_threshold FROM pg_settings WHERE name = 'autovacuum_vacuum_threshold') , vsf AS (SELECT setting AS autovacuum_vacuum_scale_factor FROM pg_settings WHERE name = 'autovacuum_vacuum_scale_factor') , fma AS (SELECT setting AS autovacuum_freeze_max_age FROM pg_settings WHERE name = 'autovacuum_freeze_max_age') , sto AS (select opt_oid, split_part(setting, '=', 1) as param, split_part(setting, '=', 2) as value from (select oid opt_oid, unnest(reloptions) setting from pg_class) opt) SELECT '"'||ns.nspname||'"."'||c.relname||'"' as relation , pg_size_pretty(pg_table_size(c.oid)) as table_size , age(relfrozenxid) as xid_age , coalesce(cfma.value::float, autovacuum_freeze_max_age::float) autovacuum_freeze_max_age , (coalesce(cvbt.value::float, autovacuum_vacuum_threshold::float) + coalesce(cvsf.value::float,autovacuum_vacuum_scale_factor::float) * c.reltuples) as autovacuum_vacuum_tuples , n_dead_tup as dead_tuples FROM pg_class c join pg_namespace ns on ns.oid = c.relnamespace join pg_stat_all_tables stat on stat.relid = c.oid join vbt on (1=1) join vsf on (1=1) join fma on (1=1) left join sto cvbt on cvbt.param = 'autovacuum_vacuum_threshold' and c.oid = cvbt.opt_oid left join sto cvsf on cvsf.param = 'autovacuum_vacuum_scale_factor' and c.oid = cvsf.opt_oid left join sto cfma on cfma.param = 'autovacuum_freeze_max_age' and c.oid = cfma.opt_oid WHERE c.relkind = 'r' and nspname <> 'pg_catalog' and ( age(relfrozenxid) >= coalesce(cfma.value::float, autovacuum_freeze_max_age::float) or coalesce(cvbt.value::float, autovacuum_vacuum_threshold::float) + coalesce(cvsf.value::float,autovacuum_vacuum_scale_factor::float) * c.reltuples <= n_dead_tup -- or 1 = 1 ) ORDER BY age(relfrozenxid) DESC LIMIT 50;
Determining if autovacuum is currently running and for how long
If you need to manually vacuum a table, you need to determine if autovacuum is currently running. If it is, you might need to adjust parameters to make it run more efficiently, or terminate autovacuum so you can manually run VACUUM.
Use the following query to determine if autovacuum is running, how long it has been running, and if it is waiting on another session.
If you are using Amazon RDS PostgreSQL 9.6+ or higher, use this query:
SELECT datname, usename, pid, state, wait_event, current_timestamp - xact_start AS xact_runtime, query FROM pg_stat_activity WHERE upper(query) LIKE '%VACUUM%' ORDER BY xact_start;
After running the query, you should see output similar to the following.
datname | usename | pid | state | wait_event | xact_runtime | query --------+----------+-------+--------+------------+-------------------------+-------------------------------------------------------------------------------------------------------- mydb | rdsadmin | 16473 | active | | 33 days 16:32:11.600656 | autovacuum: VACUUM ANALYZE public.mytable1 (to prevent wraparound) mydb | rdsadmin | 22553 | active | | 14 days 09:15:34.073141 | autovacuum: VACUUM ANALYZE public.mytable2 (to prevent wraparound) mydb | rdsadmin | 41909 | active | | 3 days 02:43:54.203349 | autovacuum: VACUUM ANALYZE public.mytable3 mydb | rdsadmin | 618 | active | | 00:00:00 | SELECT datname, usename, pid, state, wait_event, current_timestamp - xact_start AS xact_runtime, query+ | | | | | | FROM pg_stat_activity + | | | | | | WHERE query like '%VACUUM%' + | | | | | | ORDER BY xact_start; +
If you are using an Amazon RDS for PostgreSQL version less than 9.6, use the following query.
SELECT datname, usename, pid, waiting, current_timestamp - xact_start AS xact_runtime, query FROM pg_stat_activity WHERE upper(query) LIKE '%VACUUM%' ORDER BY xact_start;
After running the query, you should see output similar to the following.
datname | usename | pid | waiting | xact_runtime | query --------+----------+-------+---------+-------------------------+---------------------------------------------------------------------------------------------- mydb | rdsadmin | 16473 | f | 33 days 16:32:11.600656 | autovacuum: VACUUM ANALYZE public.mytable1 (to prevent wraparound) mydb | rdsadmin | 22553 | f | 14 days 09:15:34.073141 | autovacuum: VACUUM ANALYZE public.mytable2 (to prevent wraparound) mydb | rdsadmin | 41909 | f | 3 days 02:43:54.203349 | autovacuum: VACUUM ANALYZE public.mytable3 mydb | rdsadmin | 618 | f | 00:00:00 | SELECT datname, usename, pid, waiting, current_timestamp - xact_start AS xact_runtime, query+ | | | | | FROM pg_stat_activity + | | | | | WHERE query like '%VACUUM%' + | | | | | ORDER BY xact_start; +
Several issues can cause a long-running autovacuum session (that is, multiple days
long).
The most common issue is that your maintenance_work_mem
We recommend that you use the following formula to set the maintenance_work_mem
GREATEST({DBInstanceClassMemory/63963136*1024},65536)
Short running autovacuum sessions can also indicate problems:
-
It can indicate that there aren't enough
autovacuum_max_workers
for your workload. In this case, you need to indicate the number of workers. -
It can indicate that there is an index corruption (autovacuum crashes and restart on the same relation but make no progress). In this case, run a manual vacuum freeze verbose ___table___ to see the exact cause.
Performing a manual vacuum freeze
You might want to perform a manual vacuum on a table that has a vacuum process already running. This is useful if you have identified a table with an age approaching 2 billion transactions (or above any threshold you are monitoring).
The following steps are a guideline, and there are several variations to the process.
For example, during testing, suppose that you find that the maintenance_work_mem
maintenance_work_mem
To manually perform a vacuum freeze
-
Open two sessions to the database containing the table you want to vacuum. For the second session, use "screen" or another utility that maintains the session if your connection is dropped.
-
In session one, get the PID of the autovacuum session running on the table.
Run the following query to get the PID of the autovacuum session.
SELECT datname, usename, pid, current_timestamp - xact_start AS xact_runtime, query FROM pg_stat_activity WHERE upper(query) LIKE '%VACUUM%' ORDER BY xact_start;
-
In session two, calculate the amount of memory you need for this operation. In this example, we determine that we can afford to use up to 2 GB of memory for this operation, so we set
maintenance_work_mem
for the current session to 2 GB. set maintenance_work_mem='2 GB'; SET
-
In session two, issue a
vacuum freeze verbose
command for the table. The verbose setting is useful because, although there is no progress report for this in PostgreSQL currently, you can see activity.\timing on Timing is on. vacuum freeze verbose pgbench_branches;
INFO: vacuuming "public.pgbench_branches" INFO: index "pgbench_branches_pkey" now contains 50 row versions in 2 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "pgbench_branches_test_index" now contains 50 row versions in 2 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pgbench_branches": found 0 removable, 50 nonremovable row versions in 43 out of 43 pages DETAIL: 0 dead row versions cannot be removed yet. There were 9347 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM Time: 2.765 ms
-
In session one, if autovacuum was blocking, you see in
pg_stat_activity
that waiting is "T" for your vacuum session. In this case, you need to terminate the autovacuum process as follows.SELECT pg_terminate_backend('the_pid');
-
At this point, your session begins. It's important to note that autovacuum restarts immediately because this table is probably the highest on its list of work. Initiate your
vacuum freeze verbose
command in session 2 and then terminate the autovacuum process in session 1.
Reindexing a table when autovacuum is running
If an index has become corrupt, autovacuum continues to process the table and fails. If you attempt a manual vacuum in this situation, you will receive an error message similar to the following:
mydb=# vacuum freeze pgbench_branches; ERROR: index "pgbench_branches_test_index" contains unexpected zero page at block 30521 HINT: Please REINDEX it.
When the index is corrupted and autovacuum is attempting to run against the table,
you
contend with an already running autovacuum session. When you issue a "REINDEX
To reindex a table when autovacuum is running on the table
-
Open two sessions to the database containing the table you want to vacuum. For the second session, use "screen" or another utility that maintains the session if your connection is dropped.
-
In session one, get the PID of the autovacuum session running on the table.
Run the following query to get the PID of the autovacuum session.
SELECT datname, usename, pid, current_timestamp - xact_start AS xact_runtime, query FROM pg_stat_activity WHERE upper(query) like '%VACUUM%' ORDER BY xact_start;
-
In session two, issue the reindex command.
\timing on Timing is on. reindex index pgbench_branches_test_index; REINDEX Time: 9.966 ms
-
In session one, if autovacuum was blocking, you see in
pg_stat_activity
that waiting is "T" for your vacuum session. In this case, you will need to terminate the autovacuum process.select pg_terminate_backend('the_pid');
-
At this point, your session begins. It's important to note that autovacuum restarts immediately because this table is probably the highest on its list of work. Initiate your command in session 2 and then terminate the autovacuum process in session 1.
Other parameters that affect autovacuum
The following query shows the values of some of the parameters that directly affect
autovacuum and its behavior. The autovacuum parameters
SELECT name, setting, unit, short_desc FROM pg_settings WHERE name IN ( 'autovacuum_max_workers', 'autovacuum_analyze_scale_factor', 'autovacuum_naptime', 'autovacuum_analyze_threshold', 'autovacuum_analyze_scale_factor', 'autovacuum_vacuum_threshold', 'autovacuum_vacuum_scale_factor', 'autovacuum_vacuum_threshold', 'autovacuum_vacuum_cost_delay', 'autovacuum_vacuum_cost_limit', 'vacuum_cost_limit', 'autovacuum_freeze_max_age', 'maintenance_work_mem', 'vacuum_freeze_min_age');
While these all affect autovacuum, some of the most important ones are:
Setting table-level autovacuum parameters
Autovacuum-related storage parameters
The following query shows which tables currently have table-level options in place.
SELECT relname, reloptions FROM pg_class WHERE reloptions IS NOT null;
An example where this might be useful is on tables that are much larger than the rest of your tables. Suppose that you have one 300-GB table and 30 other tables less than 1 GB. In this case, you might set some specific parameters for your large table so you don't alter the behavior of your entire system.
ALTER TABLE mytable set (autovacuum_vacuum_cost_delay=0);
Doing this disables the cost-based autovacuum delay for this table at the expense
of
more resource usage on your system. Normally, autovacuum pauses for
autovacuum_vacuum_cost_delay each time autovacuum_cost_limit is reached. You can find
more
details in the PostgreSQL documentation about cost-based vacuuming
Autovacuum logging
By default, the postgresql.log doesn't contain information about
the autovacuum process. You can see output in the PostgreSQL error log from the autovacuum
worker operations by setting the rds.force_autovacuum_logging_level
parameter.
Allowed values are disabled, debug5, debug4, debug3, debug2, debug1, info, notice,
warning, error, log, fatal,
and panic
. The default value is
disabled
because the other allowable values can add significant amount of
information to your logs.
We recommend that you set the value of the
rds.force_autovacuum_logging_level
parameter to warning
and that
you set the log_autovacuum_min_duration
parameter to a value from 1,000 to
5,000 milliseconds. If you set this value to 5,000, Amazon RDS writes any activity
to the log
that takes more than five seconds. It also shows "vacuum skipped" messages when application
locking is causing autovacuum to intentionally skip tables. If you are troubleshooting
a
problem and need more detail, you can use a different logging level value, such as
debug1
or debug3
. Use these debug parameters for a short period
of time because these settings produce extremely verbose content written to the error
log
file. For more information about these debug settings, see the PostgreSQL documentation
PostgreSQL allows the rds_superuser
account to view autovacuum sessions
in pg_stat_activity
. For example, you can identify and end an
autovacuum session that is blocking a command from running, or running slower than
a
manually issued vacuum command.