© Laurenz Albe 2024
Table of Contents
During a training session on tuning autovacuum, a participant asked me,
“Why is autovacuum_vacuum_scale_factor
set to 0.2?”
Rather than replying with a guess or a wisecrack (“Because that's the way somebody implemented it!” or “If you don't like it, submit a patch!”), I decided to use the opportunity to demonstrate the true power of open source. That was such a nice experience that I decided to share it with a wider audience.
Everybody who has children has learned to dread “why” questions. Very often, they are hard or impossible to answer, and after a while you learn that seeing an adult struggle for words is one of the reasons for children to ask them. Over the years, I have come to the conclusion that when children ask “why”, they usually mean something else; namely:
As a mathematician and computer engineer, I tend to take questions too literally. Knowing the above interpretation has helped me come up with more appropriate answers. Adults are not that much different from children. So when people start asking “why” instead of “how”, I try to figure out what they really mean. In the case I related in the introduction, the “why” was rather of the interested kind, which is why I decided to pursue the question.
I hadn't spent a lot of thought on the power of open source before I heard Simon Phipps speak on the European PGDay 2010. In this section, he is the giant on whose shoulders I am standing!
Many people think that the main advantage of open source software is that you don't have to pay a license fee. This can be a substantial advantage. However, the total cost of ownership is not zero when you run open source software:
The bigger benefits of open source software are not that you get it for free. Here are a few of the more salient advantages:
I decided to demonstrate the last point when I set out to investigate the motivation behind the default value of autovacuum_vacuum_scale_factor
with my pupils.
Since I work with the PostgreSQL source code a lot, I already had it on my machine. But it is not hard to get the source code. Appendix I of the PostgreSQL documentation points you to the PostgreSQL Wiki. There we learn how to clone the PostgreSQL source code repository. All you need is to install Git. Then you can run
1 |
git clone https://git.postgresql.org/git/postgresql.git |
This creates a directory postgresql
that contains the complete history of the PostgreSQL source code since 1996. Change into the directory, and you are ready to go.
Once we are in the directory that contains the PostgreSQL source repository, we can look at the commit messages. That is often enough to get an answer!
1 |
git log |
Since this is a very long list, Git invokes the pager. If you have a good pager, you can search the commit messages, so I searched for “autovacuum_vacuum_scale_factor”. I only found a single commit from 2020 that fixed something in the documentation. So this seems to be a dead end.
However, a commit message often is all you need to answer your question. Naturally, the usefulness of this approach depends on the quality of the commit messages. Nowadays, PostgreSQL's commit messages are usually excellent: they describe the problem solved and the approach taken, and they point you to the relevant discussion (more about that later). If you go further back in time, you may find less useful commit messages like “Row count patch from Bruce”. But those “fast and loose” days are long gone. Anyway, perhaps you may want to stop and think the next time you are about to write a commit message like “Fix a couple of bugs”. After all, a few years down the line, you may be grateful for a useful description of the mess you hacked up back then.
Since the commit log didn't turn up anything useful, we have to dig deeper. The first thing we need to do is to find the place in the source code that defines the default value. If you are not familiar with the PostgreSQL source, that can be complicated. But looking for the name of a parameter is not so hard:
1 |
git grep autovacuum_vacuum_scale_factor |
This gives us a fairly short list:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
doc/src/sgml/config.sgml: doc/src/sgml/config.sgml: autovacuum_vacuum_scale_factor (floating point) doc/src/sgml/config.sgml: autovacuum_vacuum_scale_factor doc/src/sgml/ref/create_table.sgml: doc/src/sgml/ref/create_table.sgml: autovacuum_vacuum_scale_factor, toast.autovacuum_vacuum_scale_factor doc/src/sgml/ref/create_table.sgml: autovacuum_vacuum_scale_factor src/backend/access/common/reloptions.c: "autovacuum_vacuum_scale_factor", src/backend/access/common/reloptions.c: {"autovacuum_vacuum_scale_factor", RELOPT_TYPE_REAL, src/backend/postmaster/autovacuum.c: * autovacuum_vacuum_scale_factor GUC variable. Ditto for analyze. src/backend/utils/misc/guc_tables.c: {"autovacuum_vacuum_scale_factor", PGC_SIGHUP, AUTOVACUUM, src/backend/utils/misc/postgresql.conf.sample:#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum src/bin/psql/tab-complete.c: "autovacuum_vacuum_scale_factor", src/bin/psql/tab-complete.c: "toast.autovacuum_vacuum_scale_factor", |
The “sgml” files are the documentation, which is part of the source code in PostgreSQL. It won't take long to look at the remaining occurrences. If you know what “GUC” stands for, you need not search. If not, you can look up that piece of PostgreSQL jargon in the acronym list in appendix L of the PostgreSQL documentation.
The relevant piece of code in guc_tables.c
is the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
struct config_real ConfigureNamesReal[] = { [...] { {"autovacuum_vacuum_scale_factor", PGC_SIGHUP, AUTOVACUUM, gettext_noop("Number of tuple updates or deletes prior to vacuum as a fraction of reltuples."), NULL }, &autovacuum_vac_scale, 0.2, 0.0, 100.0, NULL, NULL, NULL }, [...] |
Now we know where in the source code we can find the default value of 0.2 for autovacuum_vacuum_scale_factor. But how did it get there?
We can use “git blame
” to see where individual source lines originate:
1 |
git blame src/backend/utils/misc/guc_tables.c |
That shows us:
1 2 3 4 5 6 7 |
0a20ff54f5e6 (Tom Lane 2022-09-13 11:05:07 -0400 3938) {"autovacuum_vacuum_scale_factor", PGC_SIGHUP, AUTOVACUUM, 0a20ff54f5e6 (Tom Lane 2022-09-13 11:05:07 -0400 3939) gettext_noop("Number of tuple updates or deletes prior to vacuum as a fraction of reltuples."), 0a20ff54f5e6 (Tom Lane 2022-09-13 11:05:07 -0400 3940) NULL 0a20ff54f5e6 (Tom Lane 2022-09-13 11:05:07 -0400 3941) }, 0a20ff54f5e6 (Tom Lane 2022-09-13 11:05:07 -0400 3942) &autovacuum_vac_scale, 0a20ff54f5e6 (Tom Lane 2022-09-13 11:05:07 -0400 3943) 0.2, 0.0, 100.0, 0a20ff54f5e6 (Tom Lane 2022-09-13 11:05:07 -0400 3944) NULL, NULL, NULL |
What was commit 0a20ff54f5e6? We could look it up on the Git web interface, but on the command line, we simply use
1 |
git show 0a20ff54f5e6 |
We see the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
commit 0a20ff54f5e66158930d5328f89f087d4e9ab400 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Tue Sep 13 11:05:07 2022 -0400 Split up guc.c for better build speed and ease of maintenance. guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. [...] |
So that was just a commit that refactored the code; prior to that commit, the code was in guc.c
. To proceed, we travel back in time to the commit before that:
1 |
git checkout 0a20ff54f5e6~1 |
Let's use git blame
again:
1 |
git blame src/backend/utils/misc/guc.c |
This time we see
1 2 3 4 5 6 7 |
29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 3885) {"autovacuum_vacuum_scale_factor", PGC_SIGHUP, AUTOVACUUM, 29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 3886) gettext_noop("Number of tuple updates or deletes prior to vacuum as a fraction of reltuples."), 29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 3887) NULL 29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 3888) }, 29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 3889) &autovacuum_vac_scale, 2594cf0e8c04 (Tom Lane 2011-04-07 00:11:01 -0400 3890) 0.2, 0.0, 100.0, 2594cf0e8c04 (Tom Lane 2011-04-07 00:11:01 -0400 3891) NULL, NULL, NULL |
Rinse and repeat: let's look at commit 2594cf0e8c04
1 |
git show 2594cf0e8c04 |
This is another commit that changed the API, not the values:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
commit 2594cf0e8c04406ffff19b1651c5a406d376657c Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Thu Apr 7 00:11:01 2011 -0400 Revise the API for GUC variable assign hooks. [...] diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 2151fde3618..5e4904aeb7f 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c [...] @@ -2297,7 +2479,8 @@ static struct config_real ConfigureNamesReal[] = NULL }, &autovacuum_vac_scale, - 0.2, 0.0, 100.0, NULL, NULL + 0.2, 0.0, 100.0, + NULL, NULL, NULL }, { {"autovacuum_analyze_scale_factor", PGC_SIGHUP, AUTOVACUUM, |
Again, we travel back in time to the commit prior to that:
1 |
git checkout 2594cf0e8c04~1 |
Using git blame src/backend/utils/misc/guc.c
again, we wee
1 2 3 4 5 6 |
29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 2295) {"autovacuum_vacuum_scale_factor", PGC_SIGHUP, AUTOVACUUM, 29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 2296) gettext_noop("Number of tuple updates or deletes prior to vacuum as a fraction of reltuples."), 29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 2297) NULL 29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 2298) }, 29094193f526 (Tom Lane 2005-07-14 05:13:45 +0000 2299) &autovacuum_vac_scale, e0938c3f5b03 (Bruce Momjian 2006-09-02 23:12:16 +0000 2300) 0.2, 0.0, 100.0, NULL, NULL |
This time, we have hit gold; commit e0938c3f5b03 is the one we were looking for:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
commit e0938c3f5b03e48bca32fe903f057e5777d43df8 Author: Bruce Momjian <bruce@momjian.us> Date: Sat Sep 2 23:12:16 2006 +0000 Make autovacuum behavior more agressive, per discussion on hackers list --- was part of autovacuum default 'on' patch that was reverted, but we want this part. Peter Eisentraut [...] diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index fb12bd7adfc..f985c9da423 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c [...] @@ -1738,7 +1738,7 @@ static struct config_real ConfigureNamesReal[] = NULL }, &autovacuum_vac_scale, - 0.4, 0.0, 100.0, NULL, NULL + 0.2, 0.0, 100.0, NULL, NULL }, { {"autovacuum_analyze_scale_factor", PGC_SIGHUP, AUTOVACUUM, |
So in 2006, autovacuum_vacuum_scale_factor
got reduced from 0.4 to 0.2.
In order to understand the motivation behind the change, it is a good idea to read the mailing list discussion. The PostgreSQL project archives the mailing lists, so we should be able to find the discussion. Note that finding the discussion has become easier today, because it has become a habit to reference the relevant mailing list discussion in the commit message.
I used the advanced archive search for the best results. I narrowed the search down to the “pgsql-hackers” mailing list, searched for “autovacuum aggressive”, set “Date: anytime” and “Sort by: Reverse date”. This produced plenty of results, but on the second page I already found this thread from August 2006, which must be the correct one.
To sum up the arguments: Rod Taylor argued that the scale factor should become more aggressive and suggested a value of 0.1. Matthew T. O'Connor and Jim C. Nasby supported that, and Jim suggested a value of 0.2. ITAGAKI Takahiro suggested a value below 0.1 to reduce index page splits. Peter Eisentraut proposed a value of 0.08, which seemed too big a jump to Matthew T. O'Connor. Peter suggested 0.2 as a compromise. Josh Berkus considered that too aggressive and wanted to stay on 0.4. Finally Bruce Momjian settled the dispute by simply committing Peter's proposal.
To give you some historical context: Before version 8.1 (released in 2005), autovacuum was a “contrib module” (there were no extensions back then). Version 8.3 from 2008 enabled autovacuum by default. So these were the times when autovacuum was still somewhat experimental, and many people still used cron
jobs to schedule regular VACUUM
runs. These were the bad old days indeed.
In PostgreSQL, both the source code and the discussions that lead the developers to a certain decision are there for everyone to see. I demonstrated this by uncovering the reasoning behind the current default value for autovacuum_vacuum_scale_factor
. With open source software like PostgreSQL, you are not reduced to being a user. If you feel that 18 years of experience and the changing requirements are enough to suggest a change, join the hackers mailing list and make your point. Welcome to the community!
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply