Table of Contents
PostgreSQL v12 brought more efficient storage for indexes, and v13 improved that even more by adding deduplication of index entries. But Peter Geoghegan wasn't finished! PostgreSQL v14 added “bottom-up” index entry deletion, which targets reducing unnecessary page splits, index bloat and fragmentation of heavily updated indexes.
In a B-tree index, there is an index entry for every row version (“tuple”) in the table that is not dead (invisible to everybody). When VACUUM
removes dead tuples, it also has to delete the corresponding index entries. Just like with tables, that creates empty space in an index page. Such space can be reused, but if no new entries are added to the page, the space remains empty.
This “bloat” is unavoidable and normal to some extent, but if it gets to be too much, the index will become less efficient:
This is particularly likely to happen if you update the same row frequently. Until VACUUM
can clean up old tuples, the table and the index will contain many versions of the same row. This is particularly unpleasant if an index page fills up: then PostgreSQL will “split” the index page in two. This is an expensive operation, and after VACUUM
is done cleaning up, we end up with two bloated pages instead of a single one.
The creation of HOT tuples is perhaps the strongest weapon PostgreSQL has to combat unnecessary churn in the index. With this feature, an UPDATE
creates tuples that are not referenced from an index, but only from the previous version of the table row. That way, there is no need to write a new index entry at all, which is good for performance and completely avoids index bloat.
Read more in my article about HOT updates.
When an index scan encounters an entry that points to a dead tuple in the table, it will mark the index entry as “killed”. Subsequent index scans will skip such entries even before VACUUM
can remove them. Moreover, PostgreSQL can delete such entries when the index page is full, to avoid a page split.
See my article on killed index tuples for details.
“Bottom-up index tuple deletion” goes farther than the previous approaches: it deletes index entries that point to dead tuples right before an index page split is about to occur. This can reduce the number of index entries and avoid the expensive page split, together with the bloat that will occur later, when VACUUM
cleans up.
In a way, this performs part of the work of VACUUM
earlier, at a point where it is useful to avoid index bloat.
To demonstrate the effects of the new feature, I performed a custom pgbench
run on PostgreSQL v13 and v14.
This is the table for the test:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE TABLE testtab ( id bigint CONSTRAINT testtab_pkey PRIMARY KEY, unchanged integer, changed integer ); INSERT INTO testtab SELECT i, i, 0 FROM generate_series(1, 10000) AS i; CREATE INDEX testtab_unchanged_idx ON testtab (unchanged); CREATE INDEX testtab_changed_idx ON testtab (changed); |
This is the pgbench
script called “bench.sql
”:
1 2 3 4 5 6 7 8 9 10 11 |
set id random_gaussian(1, 10000, 10) UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; UPDATE testtab SET changed = changed + 1 WHERE id = :id; |
I chose a normal distribution, because in real life there are usually some (recent?) table rows that receive more updates than others. The row is updated ten times, in order to make it more likely that the affected page will have to be split.
I run the script 60000 times (10000 iterations by 6 clients) as follows:
1 |
pgbench -n -c 6 -f bench.sql -t 10000 test |
We use the pgstattuple
extension to get index statistics with psql:
1 2 3 4 5 6 |
SELECT i.indexrelid::regclass AS index, s.index_size, s.avg_leaf_density FROM pg_index AS i CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s WHERE indrelid = 'testtab'::regclass; |
This is what we get for v13:
1 2 3 4 5 6 |
index │ index_size │ avg_leaf_density ═══════════════════════╪════════════╪══════════════════ testtab_pkey │ 319488 │ 66.6 testtab_unchanged_idx │ 4022272 │ 5.33 testtab_changed_idx │ 4505600 │ 13.57 (3 rows) |
For v14, the result is:
1 2 3 4 5 6 |
index │ index_size │ avg_leaf_density ═══════════════════════╪════════════╪══════════════════ testtab_pkey │ 245760 │ 87.91 testtab_unchanged_idx │ 532480 │ 39.23 testtab_changed_idx │ 4038656 │ 14.23 (3 rows) |
We see the biggest improvement in testtab_unchanged_idx
. In v13, the index is bloated out of shape, while in v14 it only has 60% bloat (which is not bad for an index). Here we see the biggest effect of the new feature. The UPDATE
doesn't scan that index, so there are no killed index tuples, and still “bottom-up deletion” could remove enough of them to avoid a page split in most cases.
There is also a measurable improvement in testtab_pkey
. Since the UPDATE
scans that index, dead index tuples will be killed, and the new feature removes those before splitting the page. The difference to v13 is less pronounced here, since v13 already avoids index bloat quite well.
The index testtab_changed_idx
cannot benefit from the new feature, since that only addresses the case where the UPDATE
doesn't modify the indexed value. In case you wonder why the leaf density is so much lower compared to testtab_unchanged_idx
in v13: that is index de-duplication, which can kick in because the index entry is not modified.
pg_upgrade
?The storage format of the index is unchanged, so this will automatically work after a pg_upgrade
of an index created on PostgreSQL v12 or later. If the index was created with an earlier version of PostgreSQL, you will have to REINDEX
the index to benefit from the new feature. Remember that pg_upgrade
simply copies the index files and does not update the internal index version.
PostgreSQL v14 continues to bring improvements to B-tree indexes. While this particular one may not be revolutionary, it promises to provide a solid improvement for many workloads, especially those with lots of updates.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply