In many PostgreSQL databases, you never have to think or worry about tuning autovacuum. It runs automatically in the background and cleans up without getting in your way.
But sometimes the default configuration is not good enough, and you have to tune autovacuum to make it work properly. This article presents some typical problem scenarios and describes what to do in these cases.
There are many autovacuum configuration parameters, which makes tuning complicated. The main reason is that autovacuum has many different tasks. In a way, autovacuum has to fix all the problems arising from PostgreSQL's Multiversioning Concurrency Control (MVCC) implementation:
- it removes dead tuples left behind by UPDATE or DELETE operations
- it updates the visibility map, which is needed for index-only scans
- it freezes old rows so that the transaction ID counter can safely wrap around
- it runs ANALYZE to keep the table statistics updated
Depending on which of these functionalities causes a problem, you need a different approach to tuning autovacuum.
The best-known autovacuum task is cleaning up dead tuples from UPDATE or DELETE operations. If autovacuum cannot keep up with cleaning up dead tuples, you should follow these three tuning steps:
Check the known reasons that keep VACUUM from removing dead tuples. Most often, the culprits are long-running transactions. Unless you can remove these obstacles, tuning autovacuum will be useless.
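One way to spot such obstacles is to look for sessions with old open transactions in pg_stat_activity (a quick check; the one-hour threshold is just an example):

SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < current_timestamp - INTERVAL '1 hour'
ORDER BY xact_start;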
If you cannot fight the problem at its root, you can use the configuration parameter idle_in_transaction_session_timeout to have PostgreSQL terminate sessions that stay “idle in transaction” for too long. That causes errors on the client side, but may be justified if you have no other way to keep your database operational. Similarly, to fight long-running queries, you can use statement_timeout.
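For example, a minimal sketch (the role name app_user and the timeout values are made up; limiting the settings to an application role is gentler than setting them globally):

ALTER ROLE app_user SET idle_in_transaction_session_timeout = '5min';
ALTER ROLE app_user SET statement_timeout = '30s';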
If autovacuum cannot keep up with cleaning up dead tuples, the solution is to make it work faster. This may seem obvious, but many people fall into the trap of thinking that making autovacuum start earlier or run more often will solve the problem.
VACUUM is a resource-intensive operation, so autovacuum by default operates deliberately slowly. The goal is to have it work in the background without getting in the way of normal database operation. But if your workload creates lots of dead tuples, you will have to make it more aggressive:
- increase autovacuum_vacuum_cost_limit from its default value of 200 (this is the gentle method)
- reduce autovacuum_vacuum_cost_delay from its default value of 2 (in older versions: 20!) milliseconds (this is the effective method)
Setting autovacuum_vacuum_cost_delay to 0 will make autovacuum as fast as a manual VACUUM – that is, as fast as possible.
Since not all tables grow dead tuples at the same pace, it is usually best not to change the global setting in postgresql.conf, but to change the setting individually for busy tables:
ALTER TABLE busy_table SET (autovacuum_vacuum_cost_delay = 1);
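If you prefer the gentle method, the cost limit can be raised per table in the same way (the value of 2000 is only an example):

ALTER TABLE busy_table SET (autovacuum_vacuum_cost_limit = 2000);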
Partitioning a table can also help with getting the job done faster; see below for more.
If nothing else works, you have to see to it that fewer dead tuples are generated. Perhaps several UPDATEs of a single row could be combined into a single UPDATE?
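As a trivial illustration (the table and column names are made up):

-- two UPDATEs of the same row create two dead tuples
UPDATE pageview SET hits = hits + 1 WHERE id = 42;
UPDATE pageview SET last_seen = current_timestamp WHERE id = 42;
-- one combined UPDATE creates only one dead tuple
UPDATE pageview
SET hits = hits + 1,
    last_seen = current_timestamp
WHERE id = 42;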
Often you can significantly reduce the number of dead tuples by using “HOT updates”:
- set the fillfactor for the table to a value less than 100, so that INSERTs leave some free space in each block (see the example below)
- make sure that no column modified by the UPDATE is indexed
Then any SELECT or DML statement can clean up dead tuples, and there is less need for VACUUM.
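A minimal sketch for the fillfactor part (the table name and the value of 70 are only examples; the new setting affects newly written blocks):

ALTER TABLE busy_table SET (fillfactor = 70);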
The expensive part of an index scan is looking up the actual table rows. If all columns you want are in the index, it should not be necessary to visit the table at all. But in PostgreSQL you also have to check if a tuple is visible or not, and that information is only stored in the table.
To work around that, PostgreSQL has a “visibility map” for each table. If a table block is marked as “all visible” in the visibility map, you don't have to visit the table for the visibility information.
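To get an idea how much of a table is currently marked “all visible”, you can compare relallvisible with relpages in pg_class (mytable is a placeholder name):

SELECT relpages, relallvisible FROM pg_class WHERE relname = 'mytable';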
So to get true index-only scans, autovacuum has to process the table and update the visibility map frequently. How you configure autovacuum for that depends on the kind of data modifications the table receives:
UPDATEs or DELETEs
For that, you reduce autovacuum_vacuum_scale_factor for the table, for example
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.01);
It may be a good idea to also speed up autovacuum as described above.
INSERTs
This is simple from v13 on: tune autovacuum_vacuum_insert_scale_factor as shown above for autovacuum_vacuum_scale_factor.
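For example (the table name and the value are only illustrative):

ALTER TABLE insert_only SET (autovacuum_vacuum_insert_scale_factor = 0.01);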
For older PostgreSQL versions, the best you can do is to significantly lower autovacuum_freeze_max_age. The best value depends on the rate at which you consume transaction IDs. If you consume 100000 transaction IDs per day, and you want the table to be autovacuumed daily, you can set
ALTER TABLE insert_only SET (autovacuum_freeze_max_age = 100000);
To measure the rate of transaction ID consumption, use the function txid_current() (or pg_current_xact_id() from v13 on) twice with a longer time interval in between and take the difference.
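A simple way to do that (run the query once, note the result, run it again after 24 hours and take the difference):

SELECT pg_current_xact_id();   -- from v13 on
SELECT txid_current();         -- in older versions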
Normally, autovacuum takes care of that and starts a special “anti-wraparound” autovacuum worker whenever the oldest transaction ID in a table is older than autovacuum_freeze_max_age transactions or the oldest multixact is older than autovacuum_multixact_freeze_max_age multixacts.
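To see how close each database is to that limit, you can check the age of datfrozenxid:

SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;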
Again, you have to make sure that there is nothing that blocks autovacuum from freezing old tuples and advancing pg_database.datfrozenxid and pg_database.datminmxid. Such blockers can be the same obstacles that keep VACUUM from removing dead tuples (see above), as well as data corruption that makes VACUUM fail.
To prevent data corruption, use good hardware and always run the latest PostgreSQL minor release.
UPDATEs or DELETEs for anti-wraparound vacuum
On tables that receive UPDATEs or DELETEs, all you have to do is make sure that autovacuum runs fast enough to get done in time (see above).
INSERTs for anti-wraparound vacuum
From PostgreSQL v13 on, there are no special considerations in this case, because such tables get regular autovacuum runs as well.
Before that, insert-only tables were problematic: since there are no dead tuples, normal autovacuum runs are never triggered. Then, as soon as autovacuum_freeze_max_age or autovacuum_multixact_freeze_max_age is exceeded, you may suddenly get a massive autovacuum run that freezes a whole large table, takes a long time and causes massive I/O.
To avoid that, reduce autovacuum_freeze_max_age for such a table:
ALTER TABLE insert_only SET (autovacuum_freeze_max_age = 10000000);
With very big tables, it can be advisable to use partitioning. The advantage is that several autovacuum workers can process different partitions in parallel, so the partitioned table as a whole gets done faster than a single autovacuum worker could manage.
If you have many partitions, you should increase autovacuum_max_workers, the maximum number of autovacuum workers.
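For example (the value of 10 is only illustrative; in most PostgreSQL versions this parameter only takes effect after a server restart):

ALTER SYSTEM SET autovacuum_max_workers = 10;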
Partitioning can also help with vacuuming tables that receive lots of updates, as long as the updates affect all partitions.
Updating table statistics is a “side job” of autovacuum.
You know that automatic statistics collection does not happen often enough if your query plans get better after a manual ANALYZE of the table.
In that case, you can lower autovacuum_analyze_scale_factor so that autoanalyze processes the table more often:
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.02);
An alternative is not to use the scale factor, but to set autovacuum_analyze_threshold, so that the table statistics are calculated whenever a fixed number of rows changes. For example, to configure a table to be analyzed whenever more than a million rows change:
ALTER TABLE mytable SET (
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold = 1000000
);
Depending on your specific problem and your PostgreSQL version, there are different tuning knobs to make autovacuum do its job correctly. The many tasks of autovacuum and the many configuration parameters don't make that any easier.
If the tips in this article are not enough for you, consider getting professional consulting.
Hi Laurenz. Quick question: Does an insert that gets rolled back result in a dead tuple?
Yes, certainly.