CYBERTEC PostgreSQL Logo

Making the PostgreSQL visibility map visible

06.2024 / Category: / Tags: |
visibility_map
© Laurenz Albe 2024

Enough has been written about the connection between VACUUM, the visibility map and index-only scans in PostgreSQL. But recently I demonstrated index-only scans to a class I was teaching and made some observations that surprised me at first. I could eventually explain what I saw, but I may have given the class more to chew on than I intended. I myself found the experience enlightening, and if you would like some insight into the implementation details and performance features of PostgreSQL, welcome to the show.

The visibility map

Every PostgreSQL table has a visibility map that stores two bits for each of the table’s 8kB pages: the all-frozen flag, which allows anti-wraparound autovacuum to skip some pages for better performance, and the all-visible flag, which will be our focus in this article. If the all-visible bit for a page is set, PostgreSQL knows that all tuples in that page are visible to all transactions. It can then skip fetching the table row if the sole purpose is to check it for visibility, which leads to index-only scans that actually deserve that name. Since VACUUM removes dead tuples, which renders table pages all-visible, it also has the job of maintaining the visibility map. Consequently, you need to make sure that a table is VACUUMed often enough if you want efficient index-only scans.

An example to show the visibility map at play

All we need is a bigger table with an index (in this case, the primary key index). I use an unlogged table to avoid writing WAL, which won’t affect our observations.

We VACUUM the table to set the hint bits and build the visibility map. We then ANALYZE it to get good optimizer statistics.

Using EXPLAIN (ANALYZE) to check the quality of the index scan

Among many other valuable dataEXPLAIN (ANALYZE) shows us how many table tuples had to be fetched to check them for visibility, because the page’s all-visible flag wasn’t set:

We see that PostgreSQL didn’t have to fetch a single table row, since all pages were all-visible.

Data modifications reset the all-visible flag

If we update a single row in the table, PostgreSQL has to reset the all-visible flag in the visibility map for two table pages:

  • the page that contains the original row version, because that tuple becomes invisible
  • the page that contains the updated row version, because concurrent transactions cannot see it before the transaction commits

Here is a demonstration:

The row with id = 7 was within the first page of the table, and that page must contain 185 table rows (one heap fetch is for the updated row version, which is on a different page). Since the page is no longer all-visible, PostgreSQL has to fetch all tuples in it and test if they are visible to the transaction.

Which row is in which table page?

It is good to have an explanation of what happened, but it is better to verify that our line of thinking is correct. We can verify our deductions by looking at the current tuple-ID of the table rows:

The first number of the ctid is the page number. So we indeed have 185 tuples in the first page (page number 0): 184 visible ones and an invisible (updated) one. Page 5405 is also no longer all-visible, so scanning the index for the row with id = 7 causes another heap fetch, and we come up with a total of 186 heap fetches.

Visualizing the visibility map

So far, we have only inferred that the UPDATE we executed above reset the all-visible flag on two pages, but we can also prove that the deduction was correct. Using the standard extension pg_visibility, we can examine the visibility map directly:

This confirms that all but the first and the last page of the table are all-visible.

The mysterious vanishing tuple

So far, everything was as I expected. My first surprise came when I repeated EXPLAIN (ANALYZE):

Suddenly, there is one heap fetch less. What happened? Did autovacuum run on the table? That cannot be, because autovacuum doesn’t start running before dead tuples make up 20% of the table. We only updated a single row!

Fortunately I remembered something I blogged about a while ago: killed index tuples. If an index scan detects that a table row is no longer visible to any transaction, it marks the index entry LP_DEAD so that the next index scan knows it can ignore the entry. Our first EXPLAIN (ANALYZE) must have killed the index entry pointing to the updated row version, so the second one could avoid one heap fetch.

VACUUM doesn’t update the visibility map as it should

Next, I wanted to demonstrate that VACUUM removes dead tuples, which makes pages all-visible, and updates the visibility map:

Huh? Why do we still have 184 heap fetches, only one less than before?

The explanation for VACUUM‘s failure to set all pages all-visible

I had to think a while before I remembered an optimization that went into PostgreSQL v14 with commit 5100010ee4:

Since we updated only a single row, only one of the 5406 pages of the table contain a dead row, which is clearly less than 2%. So VACUUM skipped the expensive step of cleaning up the index. If VACUUM does not clean up the index and does not remove the index entry that points to the dead row version, it cannot remove the dead row version itself, and the first table page cannot become all-visible.

However, when VACUUM scans the last table page that contains the updated row, it notices that all entries in that page are visible to everybody and sets the all-visible flage for that page. As a consequence, the table row with id = 7 is now on an all-visible page, and PostgreSQL can skip the heap fetch for that one row.

Forcing VACUUM to clean up the index

If we desparately need to get the heap fetches down to zero again, we can use an option of VACUUM that was added in commit 3499df0dee:

Using INDEX_CLEANUP ON, we can make VACUUM clean up the index, no matter what:

It is also possible to configure the table so that autovacuum always cleans up the index:

Conclusion

We had a thorough look at the PostgreSQL visibility map and how VACUUM keeps it up to date. An experiment showed that from v14 on, a performance optimization in PostgreSQL sometimes keeps VACUUM from making some pages all-visible. We also got to see the pg_visibility extension, which allows us to examine the visibility map directly.

If you’d like to read more about how the visibility map shows up in EXPLAIN (ANALYZE), check out Ants’ article about performance downsides of UUIDs.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram