CYBERTEC PostgreSQL Logo

The power of open source in PostgreSQL

01.2025 / Category: / Tags:

Superhero Slonik - the power of open source in PostgreSQL
© Laurenz Albe 2024

During a training session on tuning autovacuum, a participant asked me,
“Why is autovacuum_vacuum_scale_factor set to 0.2?”
Rather than replying with a guess or a wisecrack (“Because that's the way somebody implemented it!” or “If you don't like it, submit a patch!”), I decided to use the opportunity to demonstrate the true power of open source. That was such a nice experience that I decided to share it with a wider audience.

The dreadful “why” question

Everybody who has children has learned to dread “why” questions. Very often, they are hard or impossible to answer, and after a while you learn that seeing an adult struggle for words is one of the reasons for children to ask them. Over the years, I have come to the conclusion that when children ask “why”, they usually mean something else; namely:

  • Tell me more about this! It is interesting, and I don't want to go to bed yet!
  • This is really annoying, and it should be different!

As a mathematician and computer engineer, I tend to take questions too literally. Knowing the above interpretation has helped me come up with more appropriate answers. Adults are not that much different from children. So when people start asking “why” instead of “how”, I try to figure out what they really mean. In the case I related in the introduction, the “why” was rather of the interested kind, which is why I decided to pursue the question.

The true power of open source

I hadn't spent a lot of thought on the power of open source before I heard Simon Phipps speak on the European PGDay 2010. In this section, he is the giant on whose shoulders I am standing!

Is cheapness the main reason for using open source software?

Many people think that the main advantage of open source software is that you don't have to pay a license fee. This can be a substantial advantage. However, the total cost of ownership is not zero when you run open source software:

  • you need to integrate the software into your landscape; at least set up monitoring and backup
  • you have to train people or buy external knowledge to configure, operate and maintain the software
  • you have to get support in case you run into problems you cannot solve yourself

The true benefits of open source software

The bigger benefits of open source software are not that you get it for free. Here are a few of the more salient advantages:

  • You don't have to open a service request to figure out what an error message means. You can search for it in the source code, and even if you are not a programmer, you'll usually understand the cause of the error better.
  • You can have as many test and development installations as you want without having to worry about compliance with the license you bought. What good is a Kubernetes cluster that allows you to create a database instance at the push of a button, if you need to perform a license compliance check first?
  • For support, you are not at the mercy of a single vendor. Others can have access to the source code and provide third-level support to you. This healthy competition is good for the customer.
  • If the vendor abandons the software, you are not left high and dry (think of Oracle Forms). If the software is sufficiently popular, someone else will probably pick it up and maintain it. At the very least, you can pay developers to keep the software alive.
  • The software quality is often better. If more eyes are on the software, they are more likely to find bugs and security vulnerabilities. This also leads to raised awareness on the side of the developers: I'm much more likely to build a quick and dirty hack into closed source software. But I'd think twice before submitting such a patch to PostgreSQL, where it would be driven out of town and stoned in public.
  • For knowledge about the internal workings of the software, you are not at the mercy of consultants who talked to someone at a conference who happens to know somebody who has inside information. Closed source software is often surrounded by a surprising amount of myths and rumors, and not all of them are true.

I decided to demonstrate the last point when I set out to investigate the motivation behind the default value of autovacuum_vacuum_scale_factor with my pupils.

Getting the source code

Since I work with the PostgreSQL source code a lot, I already had it on my machine. But it is not hard to get the source code. Appendix I of the PostgreSQL documentation points you to the PostgreSQL Wiki. There we learn how to clone the PostgreSQL source code repository. All you need is to install Git. Then you can run

This creates a directory postgresql that contains the complete history of the PostgreSQL source code since 1996. Change into the directory, and you are ready to go.

Searching the commit messages

Once we are in the directory that contains the PostgreSQL source repository, we can look at the commit messages. That is often enough to get an answer!

Since this is a very long list, Git invokes the pager. If you have a good pager, you can search the commit messages, so I searched for “autovacuum_vacuum_scale_factor”. I only found a single commit from 2020 that fixed something in the documentation. So this seems to be a dead end.

However, a commit message often is all you need to answer your question. Naturally, the usefulness of this approach depends on the quality of the commit messages. Nowadays, PostgreSQL's commit messages are usually excellent: they describe the problem solved and the approach taken, and they point you to the relevant discussion (more about that later). If you go further back in time, you may find less useful commit messages like “Row count patch from Bruce”. But those “fast and loose” days are long gone. Anyway, perhaps you may want to stop and think the next time you are about to write a commit message like “Fix a couple of bugs”. After all, a few years down the line, you may be grateful for a useful description of the mess you hacked up back then.

Spotting the right place in the open source code

Since the commit log didn't turn up anything useful, we have to dig deeper. The first thing we need to do is to find the place in the source code that defines the default value. If you are not familiar with the PostgreSQL source, that can be complicated. But looking for the name of a parameter is not so hard:

This gives us a fairly short list:

The “sgml” files are the documentation, which is part of the source code in PostgreSQL. It won't take long to look at the remaining occurrences. If you know what “GUC” stands for, you need not search. If not, you can look up that piece of PostgreSQL jargon in the acronym list in appendix L of the PostgreSQL documentation.

The relevant piece of code in guc_tables.c is the following:

Now we know where in the source code we can find the default value of 0.2 for autovacuum_vacuum_scale_factor. But how did it get there?

Traveling back in time in the open source repository

Back to 2022

We can use “git blame” to see where individual source lines originate:

That shows us:

What was commit 0a20ff54f5e6? We could look it up on the Git web interface, but on the command line, we simply use

We see the following:

So that was just a commit that refactored the code; prior to that commit, the code was in guc.c. To proceed, we travel back in time to the commit before that:

Back to 2011

Let's use git blame again:

This time we see

Rinse and repeat: let's look at commit 2594cf0e8c04

This is another commit that changed the API, not the values:

Again, we travel back in time to the commit prior to that:

Finding the open source code modification in 2006

Using git blame src/backend/utils/misc/guc.c again, we wee

This time, we have hit gold; commit e0938c3f5b03 is the one we were looking for:

So in 2006, autovacuum_vacuum_scale_factor got reduced from 0.4 to 0.2.

Finding the mailing list discussion

In order to understand the motivation behind the change, it is a good idea to read the mailing list discussion. The PostgreSQL project archives the mailing lists, so we should be able to find the discussion. Note that finding the discussion has become easier today, because it has become a habit to reference the relevant mailing list discussion in the commit message.

I used the advanced archive search for the best results. I narrowed the search down to the “pgsql-hackers” mailing list, searched for “autovacuum aggressive”, set “Date: anytime” and “Sort by: Reverse date”. This produced plenty of results, but on the second page I already found this thread from August 2006, which must be the correct one.

To sum up the arguments: Rod Taylor argued that the scale factor should become more aggressive and suggested a value of 0.1. Matthew T. O'Connor and Jim C. Nasby supported that, and Jim suggested a value of 0.2. ITAGAKI Takahiro suggested a value below 0.1 to reduce index page splits. Peter Eisentraut proposed a value of 0.08, which seemed too big a jump to Matthew T. O'Connor. Peter suggested 0.2 as a compromise. Josh Berkus considered that too aggressive and wanted to stay on 0.4. Finally Bruce Momjian settled the dispute by simply committing Peter's proposal.

To give you some historical context: Before version 8.1 (released in 2005), autovacuum was a “contrib module” (there were no extensions back then). Version 8.3 from 2008 enabled autovacuum by default. So these were the times when autovacuum was still somewhat experimental, and many people still used cron jobs to schedule regular VACUUM runs. These were the bad old days indeed.

Conclusion

In PostgreSQL, both the source code and the discussions that lead the developers to a certain decision are there for everyone to see. I demonstrated this by uncovering the reasoning behind the current default value for autovacuum_vacuum_scale_factor. With open source software like PostgreSQL, you are not reduced to being a user. If you feel that 18 years of experience and the changing requirements are enough to suggest a change, join the hackers mailing list and make your point. Welcome to the community!

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2025
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram