
Making the PostgreSQL visibility map visible

06.2024 / Tags: visibility_map
© Laurenz Albe 2024

Enough has been written about the connection between VACUUM, the visibility map and index-only scans in PostgreSQL. But recently I demonstrated index-only scans to a class I was teaching and made some observations that surprised me at first. I could eventually explain what I saw, but I may have given the class more to chew on than I intended. I myself found the experience enlightening, and if you would like some insight into the implementation details and performance features of PostgreSQL, welcome to the show.

The visibility map

Every PostgreSQL table has a visibility map that stores two bits for each of the table’s 8kB pages: the all-frozen flag, which allows anti-wraparound autovacuum to skip some pages for better performance, and the all-visible flag, which will be our focus in this article. If the all-visible bit for a page is set, PostgreSQL knows that all tuples in that page are visible to all transactions. It can then skip fetching the table row if the sole purpose is to check it for visibility, which leads to index-only scans that actually deserve that name. Since VACUUM removes dead tuples, which renders table pages all-visible, it also has the job of maintaining the visibility map. Consequently, you need to make sure that a table is VACUUMed often enough if you want efficient index-only scans.

An example to show the visibility map at play

All we need is a reasonably big table with an index (in this case, the primary key index). I use an unlogged table to avoid writing WAL; that choice doesn't affect our observations.
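
The original table definition isn't reproduced here, so here is a minimal sketch along those lines; the table name tab, the val column and the row count of one million are assumptions (the exact page and row counts quoted further down belong to the original table):

CREATE UNLOGGED TABLE tab (
   id  bigint PRIMARY KEY,
   val text NOT NULL
);

INSERT INTO tab
SELECT i, 'value ' || i
FROM generate_series(1, 1000000) AS i;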

We VACUUM the table to set the hint bits and build the visibility map. We then ANALYZE it to get good optimizer statistics.
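
With the assumed table name from above, that is simply:

VACUUM tab;
ANALYZE tab;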

Using EXPLAIN (ANALYZE) to check the quality of the index scan

Among many other valuable data, EXPLAIN (ANALYZE) shows us how many table tuples had to be fetched to check them for visibility, because the page’s all-visible flag wasn’t set:
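
The exact query isn't shown here; any query that can be answered from the primary key index alone will do, for example (the range in the WHERE clause is an assumption):

EXPLAIN (ANALYZE)
SELECT id FROM tab
WHERE id BETWEEN 1 AND 50000;

The line to watch in the output is "Heap Fetches".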

We see that PostgreSQL didn’t have to fetch a single table row, since all pages were all-visible.

Data modifications reset the all-visible flag

If we update a single row in the table, PostgreSQL has to reset the all-visible flag in the visibility map for two table pages:

  • the page that contains the original row version, because that tuple becomes invisible
  • the page that contains the updated row version, because concurrent transactions cannot see it before the transaction commits

Here is a demonstration:
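
Under the same assumptions as before, the demonstration is just an UPDATE followed by a repeat of the index-only scan query:

UPDATE tab SET val = 'new value' WHERE id = 7;

EXPLAIN (ANALYZE)
SELECT id FROM tab
WHERE id BETWEEN 1 AND 50000;

This time, the plan reports 186 heap fetches.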

The row with id = 7 was in the first page of the table, and that page must contain 185 table rows (the one remaining heap fetch, out of the 186 reported, is for the updated row version, which is on a different page). Since the page is no longer all-visible, PostgreSQL has to fetch all tuples in it and test whether they are visible to the transaction.

Which row is in which table page?

It is good to have an explanation of what happened, but it is better to verify that our line of thinking is correct. We can verify our deductions by looking at the current tuple-ID of the table rows:
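
A query like this shows the physical location of the row versions (again using the assumed table name):

SELECT ctid, id
FROM tab
WHERE id < 10;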

The first number of the ctid is the page number. So we indeed have 185 tuples in the first page (page number 0): 184 visible ones and an invisible (updated) one. Page 5405 is also no longer all-visible, so scanning the index for the row with id = 7 causes another heap fetch, and we come up with a total of 186 heap fetches.

Visualizing the visibility map

So far, we have only inferred that the UPDATE we executed above reset the all-visible flag on two pages, but we can also prove that the deduction was correct. Using the standard extension pg_visibility, we can examine the visibility map directly:
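
pg_visibility ships with PostgreSQL; a query like the following lists every page whose all-visible bit is not set (tab is still the assumed table name):

CREATE EXTENSION IF NOT EXISTS pg_visibility;

SELECT blkno, all_visible, all_frozen
FROM pg_visibility_map('tab')
WHERE NOT all_visible;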

This confirms that all but the first and the last page of the table are all-visible.

The mysterious vanishing tuple

So far, everything was as I expected. My first surprise came when I repeated EXPLAIN (ANALYZE):
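
That is, running the very same query as above once more:

EXPLAIN (ANALYZE)
SELECT id FROM tab
WHERE id BETWEEN 1 AND 50000;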

Suddenly, there is one heap fetch fewer (185 instead of 186). What happened? Did autovacuum run on the table? That cannot be, because autovacuum doesn’t start running before dead tuples make up 20% of the table. We only updated a single row!

Fortunately I remembered something I blogged about a while ago: killed index tuples. If an index scan detects that a table row is no longer visible to any transaction, it marks the index entry LP_DEAD so that the next index scan knows it can ignore the entry. Our first EXPLAIN (ANALYZE) must have killed the index entry pointing to the updated row version, so the second one could avoid one heap fetch.

VACUUM doesn’t update the visibility map as it should

Next, I wanted to demonstrate that VACUUM removes dead tuples, which makes pages all-visible, and updates the visibility map:
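
So I ran a plain VACUUM and repeated the query (same assumptions as before):

VACUUM tab;

EXPLAIN (ANALYZE)
SELECT id FROM tab
WHERE id BETWEEN 1 AND 50000;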

Huh? Why do we still have 184 heap fetches, only one less than before?

The explanation for VACUUM‘s failure to set all pages all-visible

I had to think a while before I remembered an optimization that went into PostgreSQL v14 with commit 5100010ee4: VACUUM skips the index cleanup phase when it considers the number of dead tuples negligible.

Since we updated only a single row, only one of the 5406 pages of the table contains a dead row, which is clearly less than 2%. So VACUUM skipped the expensive step of cleaning up the index. If VACUUM does not clean up the index and does not remove the index entry that points to the dead row version, it cannot remove the dead row version itself, and the first table page cannot become all-visible.

However, when VACUUM scans the last table page, which contains the updated row, it notices that all entries in that page are visible to everybody and sets the all-visible flag for that page. As a consequence, the table row with id = 7 is now on an all-visible page, and PostgreSQL can skip the heap fetch for that one row.

Forcing VACUUM to clean up the index

If we desperately need to get the heap fetches down to zero again, we can use an option of VACUUM that was added in commit 3499df0dee.

Using INDEX_CLEANUP ON, we can make VACUUM clean up the index, no matter what:
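
With the assumed table name:

VACUUM (INDEX_CLEANUP ON) tab;

EXPLAIN (ANALYZE)
SELECT id FROM tab
WHERE id BETWEEN 1 AND 50000;

After that, the heap fetches are back down to zero.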

It is also possible to configure the table so that autovacuum always cleans up the index:
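
For that, there is the vacuum_index_cleanup storage parameter (again using the assumed table name):

ALTER TABLE tab SET (vacuum_index_cleanup = on);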

Conclusion

We had a thorough look at the PostgreSQL visibility map and how VACUUM keeps it up to date. An experiment showed that from v14 on, a performance optimization in PostgreSQL sometimes keeps VACUUM from making some pages all-visible. We also got to see the pg_visibility extension, which allows us to examine the visibility map directly.

If you’d like to read more about how the visibility map shows up in EXPLAIN (ANALYZE), check out Ants’ article about performance downsides of UUIDs.

6 responses to “Making the PostgreSQL visibility map visible”

  1. Hello,

    Very interesting article, I am wondering if it is related to my problem. I have a very complex query, and the critical part of the execution plan can vary greatly between runs, as you can see here.

    ->  Parallel Index Only Scan using zie5tikondat on tikondat  (cost=0.70..1229.69 rows=13080 width=27) (actual time=0.110..6.950 rows=9972 loops=3)

    ->  Index Only Scan using zie5tikondat on tikondat  (cost=0.70..140.39 rows=2673 width=27) (actual time=0.053..16.115 rows=29915 loops=3)

    ->  Index Only Scan using zie5tikondat on tikondat  (cost=0.70..252.71 rows=2254 width=27) (actual time=0.049..83422.021 rows=261612559 loops=3)

    The table is very large, and changes are being made throughout the day. But I can't get my head around the last scan, where it returned 261612559 rows although only 29915 actually match the condition. This scan also had a large number of heap fetches.

    I am thinking of using the INDEX_CLEANUP ON option. What do you think?

    Regards
    Richard Hennessy

    • From that little bit of information, it is hard to say what is going on. There are several possibilities:

      - The table statistics are outdated. In that case, ANALYZE tikondat should improve the estimate.

      - The table statistics are not detailed enough. In that case, ALTER TABLE tikondat ALTER the_column SET STATISTICS 500 followed by ANALYZE tikondat might fix the problem.

      Note that in the last example, there were really 261612559 rows returned from the index scan, not 29915. Perhaps other filters later in the execution plan got rid of many of them.

      I recommend using EXPLAIN (ANALYZE, BUFFERS) so that you get the relevant buffer statistics as well. Also, enabling track_io_timing can provide valuable information in the EXPLAIN output.

  2. Thank you for your reply.

    ANALYZE is run regularly, both manually and automatically. I did try setting statistics to 1000, though to be honest that was earlier on in the analysis, and I cannot remember if that large index scan was present then. I will try it again.

    The table has around 282000000 rows; I have no idea how the figure 261612559 came about, let alone why. The rows will have been filtered out later through various joins.

    I have been using EXPLAIN (ANALYZE, BUFFERS) all the way through here, but I'll also try track_io_timing to see if it sheds more light on this issue.

    Thanks again.

  3. I have a few doubts:

    1. Why is the heap fetch count 186 just after updating the row? What I am unable to understand is that on page 0, as you mentioned, it had to scan 185 tuples, and then on page 5405 it had to scan one additional tuple. But why only 1 tuple on page 5405? Why not all of them, since the all-visible flag for that page is also off?

    2. Does the index also contain multiple entries for the same row if we update the row?

    3. How does the query processor know which heap pages to scan?

    4. Let's assume we found an index entry that says id=7; why does it need to check the visibility map? And how does it check exactly (what happens when it sees an all-visible page)?

    5. Let's assume a few more rows were updated along with id=7; when the query processor is looking for heap pages, does it have to go through all the pages that are not all-visible, even if the tuple we are interested in is not on those pages?

    • 1. PostgreSQL only has to read a single tuple from page 5405: the updated version of the row. Scanning any other index entry that points to that page would also cause a heap fetch, but our query doesn't have to read any of them.

      2. Yes.

      3. Because the page number is part of the physical address of the tuple (the ctid), which is the pointer to the table row stored with the index leaf entry.

      4. Because the table row that belongs to this index entry might not be visible in the query's snapshot. But if the page containing the tuple is all-visible in the visibility map we are done, because that means that all tuples in that page are visible to all transactions.

      5. No, because (as explained above) the physical address of the tuple, and hence the page number, is stored in the index entry.

  4. Thanks for answering my questions. However, I have a few more questions.

    1. If the page number is stored in the index entry, then the index entry for the (now dead) original row version of id=7 should say page=0. Now, if we know page=0 and it is not all-visible, do we have to scan all the tuples inside that page?
    2. Assuming it has to scan all tuples on page 0, it figures out that the required tuple is not present on that page, and then it finds a new index entry that says page=5405. That page also doesn't have the all-visible flag set, so how does it find the required tuple in just one index scan instead of going through all tuples?
    3. I read that LP_DEAD is not propagated to replica instances, which means they would end up going through all the heap fetches on each query. What is the recommendation for reducing heap fetches on replica instances? Is there any other way apart from aggressive vacuum / analyze? Also, does table partitioning help with this situation at all?
