Enough has been written about the connection between VACUUM
, the visibility map and index-only scans in PostgreSQL. But recently I demonstrated index-only scans to a class I was teaching and made some observations that surprised me at first. I could eventually explain what I saw, but I may have given the class more to chew on than I intended. I myself found the experience enlightening, and if you would like some insight into the implementation details and performance features of PostgreSQL, welcome to the show.
Table of Contents
Every PostgreSQL table has a visibility map that stores two bits for each of the table’s 8kB pages: the all-frozen flag, which allows anti-wraparound autovacuum to skip some pages for better performance, and the all-visible flag, which will be our focus in this article. If the all-visible bit for a page is set, PostgreSQL knows that all tuples in that page are visible to all transactions. It can then skip fetching the table row if the sole purpose is to check it for visibility, which leads to index-only scans that actually deserve that name. Since VACUUM
removes dead tuples, which renders table pages all-visible, it also has the job of maintaining the visibility map. Consequently, you need to make sure that a table is VACUUM
ed often enough if you want efficient index-only scans.
All we need is a bigger table with an index (in this case, the primary key index). I use an unlogged table to avoid writing WAL, which won’t affect our observations.
1 2 3 4 5 6 7 |
CREATE UNLOGGED TABLE vis ( id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, val double precision NOT NULL ); INSERT INTO vis (val) SELECT random() FROM generate_series(1, 1000000); |
We VACUUM
the table to set the hint bits and build the visibility map. We then ANALYZE
it to get good optimizer statistics.
1 |
VACUUM (ANALYZE) vis; |
EXPLAIN (ANALYZE)
to check the quality of the index scanAmong many other valuable data, EXPLAIN (ANALYZE)
shows us how many table tuples had to be fetched to check them for visibility, because the page’s all-visible flag wasn’t set:
1 2 3 4 5 6 7 8 9 10 |
EXPLAIN (ANALYZE, COSTS OFF) SELECT id FROM vis WHERE id < 1000; QUERY PLAN ═══════════════════════════════════════════════════════════════════════════════════ Index Only Scan using vis_pkey on vis (actual time=0.083..0.522 rows=999 loops=1) Index Cond: (id < 1000) Heap Fetches: 0 Planning Time: 0.283 ms Execution Time: 0.713 ms |
We see that PostgreSQL didn’t have to fetch a single table row, since all pages were all-visible.
If we update a single row in the table, PostgreSQL has to reset the all-visible flag in the visibility map for two table pages:
Here is a demonstration:
1 2 3 4 5 6 7 8 9 10 11 12 |
UPDATE vis SET val = 0.0 WHERE id = 7; EXPLAIN (ANALYZE, COSTS OFF) SELECT id FROM vis WHERE id < 1000; QUERY PLAN ═══════════════════════════════════════════════════════════════════════════════════ Index Only Scan using vis_pkey on vis (actual time=0.117..0.507 rows=999 loops=1) Index Cond: (id < 1000) Heap Fetches: 186 Planning Time: 0.255 ms Execution Time: 0.660 ms |
The row with id = 7
was within the first page of the table, and that page must contain 185 table rows (one heap fetch is for the updated row version, which is on a different page). Since the page is no longer all-visible, PostgreSQL has to fetch all tuples in it and test if they are visible to the transaction.
It is good to have an explanation of what happened, but it is better to verify that our line of thinking is correct. We can verify our deductions by looking at the current tuple-ID of the table rows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
SELECT ctid, * FROM vis; ctid │ id │ val ════════════╪═════════╪════════════════════════ (0,1) │ 1 │ 0.9428522231691654 (0,2) │ 2 │ 0.8280668280240011 (0,3) │ 3 │ 0.7906647703465222 (0,4) │ 4 │ 0.19579683924366442 (0,5) │ 5 │ 0.5798129594075341 (0,6) │ 6 │ 0.5612578971780957 (0,8) │ 8 │ 0.29021282947733007 (0,9) │ 9 │ 0.2606336534585936 (0,10) │ 10 │ 0.7574579143204805 ... (0,184) │ 184 │ 0.38348399704071734 (0,185) │ 185 │ 0.08413532836604021 (1,1) │ 186 │ 0.9946325078631779 (1,2) │ 187 │ 0.694070825069186 ... (5405,74) │ 999999 │ 0.8119554439812944 (5405,75) │ 1000000 │ 0.3859512822842295 (5405,76) │ 7 │ 0 (1000000 rows) |
The first number of the ctid
is the page number. So we indeed have 185 tuples in the first page (page number 0): 184 visible ones and an invisible (updated) one. Page 5405 is also no longer all-visible, so scanning the index for the row with id = 7
causes another heap fetch, and we come up with a total of 186 heap fetches.
So far, we have only inferred that the UPDATE
we executed above reset the all-visible flag on two pages, but we can also prove that the deduction was correct. Using the standard extension pg_visibility, we can examine the visibility map directly:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
CREATE EXTENSION pg_visibility; SELECT * FROM pg_visibility_map('vis'); blkno │ all_visible │ all_frozen ═══════╪═════════════╪════════════ 0 │ f │ f 1 │ t │ f 2 │ t │ f 3 │ t │ f ... 5401 │ t │ f 5402 │ t │ f 5403 │ t │ f 5404 │ t │ f 5405 │ f │ f (5406 rows) |
This confirms that all but the first and the last page of the table are all-visible.
So far, everything was as I expected. My first surprise came when I repeated EXPLAIN (ANALYZE)
:
1 2 3 4 5 6 7 8 9 10 |
EXPLAIN (ANALYZE, COSTS OFF) SELECT id FROM vis WHERE id < 1000; QUERY PLAN ═══════════════════════════════════════════════════════════════════════════════════ Index Only Scan using vis_pkey on vis (actual time=0.085..0.742 rows=999 loops=1) Index Cond: (id < 1000) Heap Fetches: 185 Planning Time: 0.293 ms Execution Time: 0.966 ms |
Suddenly, there is one heap fetch less. What happened? Did autovacuum run on the table? That cannot be, because autovacuum doesn’t start running before dead tuples make up 20% of the table. We only updated a single row!
Fortunately I remembered something I blogged about a while ago: killed index tuples. If an index scan detects that a table row is no longer visible to any transaction, it marks the index entry LP_DEAD
so that the next index scan knows it can ignore the entry. Our first EXPLAIN (ANALYZE)
must have killed the index entry pointing to the updated row version, so the second one could avoid one heap fetch.
VACUUM
doesn’t update the visibility map as it shouldNext, I wanted to demonstrate that VACUUM
removes dead tuples, which makes pages all-visible, and updates the visibility map:
1 2 3 4 5 6 7 8 9 10 11 12 |
VACUUM vis; EXPLAIN (ANALYZE, COSTS OFF) SELECT id FROM vis WHERE id < 1000; QUERY PLAN ═══════════════════════════════════════════════════════════════════════════════════ Index Only Scan using vis_pkey on vis (actual time=0.125..1.097 rows=999 loops=1) Index Cond: (id < 1000) Heap Fetches: 184 Planning Time: 1.931 ms Execution Time: 1.535 ms |
Huh? Why do we still have 184 heap fetches, only one less than before?
VACUUM
‘s failure to set all pages all-visibleI had to think a while before I remembered an optimization that went into PostgreSQL v14 with commit 5100010ee4:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
Author: Peter Geoghegan <pg@bowt.ie> Date: Wed Apr 7 16:14:54 2021 -0700 Teach VACUUM to bypass unnecessary index vacuuming. [...] VACUUM has generally managed to avoid index scans when there were clearly no index tuples to delete from indexes. But cases with _close to_ no index tuples to delete were another matter -- a round of ambulkdelete() calls took place (one per index), each of which performed a full index scan. VACUUM now behaves just as if there were zero index tuples to delete in cases where there are in fact "virtually zero" such tuples. That is, it can now bypass index vacuuming and heap vacuuming as an optimization (though not index cleanup). Whether or not VACUUM bypasses indexes is determined dynamically, based on the just-observed number of heap pages in the table that have one or more LP_DEAD items (LP_DEAD items in heap pages have a 1:1 correspondence with index tuples that still need to be deleted from each index in the worst case). We only skip index vacuuming when 2% or less of the table's pages have one or more LP_DEAD items -- bypassing index vacuuming as an optimization must not noticeably impede setting bits in the visibility map. [...] |
Since we updated only a single row, only one of the 5406 pages of the table contain a dead row, which is clearly less than 2%. So VACUUM
skipped the expensive step of cleaning up the index. If VACUUM
does not clean up the index and does not remove the index entry that points to the dead row version, it cannot remove the dead row version itself, and the first table page cannot become all-visible.
However, when VACUUM
scans the last table page that contains the updated row, it notices that all entries in that page are visible to everybody and sets the all-visible flage for that page. As a consequence, the table row with id = 7
is now on an all-visible page, and PostgreSQL can skip the heap fetch for that one row.
VACUUM
to clean up the indexIf we desparately need to get the heap fetches down to zero again, we can use an option of VACUUM
that was added in commit 3499df0dee:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
Author: Peter Geoghegan <pg@bowt.ie> Date: Fri Jun 18 20:04:07 2021 -0700 Support disabling index bypassing by VACUUM. Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding reloption): make it into a ternary style boolean parameter. It now exposes a third option, "auto". The "auto" option (which is now the default) enables the "bypass index vacuuming" optimization added by commit 1e55e7d1. "VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM simply do any required index vacuuming, regardless of how few dead tuples are encountered during the first scan of the target heap relation (unless there are exactly zero). This gives users a way of opting out of the "bypass index vacuuming" optimization, if for whatever reason that proves necessary. [...] |
Using INDEX_CLEANUP ON, we can make VACUUM clean up the index, no matter what:
1 2 3 4 5 6 7 8 9 10 11 12 |
VACUUM (INDEX_CLEANUP ON) vis; EXPLAIN (ANALYZE, COSTS OFF) SELECT id FROM vis WHERE id < 1000; QUERY PLAN ═══════════════════════════════════════════════════════════════════════════════════ Index Only Scan using vis_pkey on vis (actual time=0.084..0.482 rows=999 loops=1) Index Cond: (id < 1000) Heap Fetches: 0 Planning Time: 0.585 ms Execution Time: 0.678 ms |
It is also possible to configure the table so that autovacuum always cleans up the index:
1 |
ALTER TABLE vis SET (vacuum_index_cleanup = on); |
We had a thorough look at the PostgreSQL visibility map and how VACUUM
keeps it up to date. An experiment showed that from v14 on, a performance optimization in PostgreSQL sometimes keeps VACUUM
from making some pages all-visible. We also got to see the pg_visibility extension, which allows us to examine the visibility map directly.
If you’d like to read more about how the visibility map shows up in EXPLAIN (ANALYZE)
, check out Ants’ article about performance downsides of UUIDs.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply