CYBERTEC PostgreSQL Logo

B-tree index improvements in PostgreSQL v12

11.2019 / Category: , / Tags: |
After all, a B-tree index is just a tree
© Laurenz Albe 2019

 

If you thought that the B-tree index is a technology that was perfected in the 1980s, you are mostly right. But there is still room for improvement, so PostgreSQL v12 (in the tradition of v11) has added some new features in this field. Thanks, Peter Geoghegan!

In this article, I want to explain some of these improvements with examples.

A B-tree index test case

To demonstrate the changes, I'll create an example table on both PostgreSQL v11 and v12:

Tables like this are typically used to implement many-to-many relationships between other tables (entities).

The primary key creates a unique composite B-tree index on the table. The table serves two purposes:

  • it ensures that there is at most one entry for each combination of aid and bid
  • it can speed up searches for all bids related to a given aid

The second index speeds up searches for all aids related to a given bid.

Adding test data

Now let's insert some rows in ascending numeric order, as is common for artificially generated keys:

Here each bid is related to 10000 aids.

B-tree index improvement 1: INSERT into indexes with many duplicates

The first difference becomes obvious when we compare the size of the index on bid:

In PostgreSQL v11:

In PostgreSQL v12:

The index is 33% bigger in v11.

Explanation

Every bid occurs 10000 times in the index, therefore there will be many leaf pages where all keys are the same (each leaf page can contain a couple of hundred entries).

 

Before v12

Before v12, PostgreSQL would store such entries in no special order in the index. So if a leaf page had to be split, it was sometimes the rightmost leaf page, but sometimes not. The rightmost leaf page was always split towards the right end to optimize for monotonically increasing inserts. In contrast to this, other leaf pages were split in the middle, which wasted space.

Since v12

Starting with v12, the physical address (“tuple ID” or TID) of the table row is part of the index key, so duplicate index entries are stored in table order. This will cause index scans for such entries to access the table in physical order, which can be a significant performance benefit, particularly on spinning disks. In other words, the correlation for duplicate index entries will be perfect. Moreover, pages that consist only of duplicates will be split at the right end, resulting in a densely packed index (that is what we observed above).

A similar optimization was added for multi-column indexes, but it does not apply to our primary key index, because the duplicates are not in the first column. The primary key index is densely packed in both v11 and v12, because the first column is monotonically increasing, so leaf page splits occur always at the rightmost page. As mentioned above, PostgreSQL already had an optimization for that.

B-tree index improvement 2: compressed storage of internal index pages

The improvements for the primary key index are not as obvious, since they are almost equal in size in v11 and v12. We will have to dig deeper here.

First, observe the small difference in an index-only scan in both v11 and v12 (repeat the statement until the blocks are in cache):

In PostgreSQL v11:

In PostgreSQL v12:

In v12, one less (index) block is read, which means that the index has one level less. Since the size of the indexes is almost identical, that must mean that the internal pages can fit more index entries. In v12, the index has a greater fan-out.

Explanation

As described above, PostgreSQL v12 introduced the TID as part of the index key, which would waste an inordinate amount of space in internal index pages. So the same commit introduced truncation of “redundant” index attributes from internal pages. The TID is redundant, as are non-key attributes from an INCLUDE clause (these were also removed from internal index pages in v11). But PostgreSQL v12 can also truncate those index attributes that are not needed for table row identification.

In our primary key index, bid is a redundant column and is truncated from internal index pages, which saves 8 bytes of space per index entry. Let's have a look at an internal index page with the pageinspect extension:

In PostgreSQL v11:

In the data entry we see the bytes from aid and bid. The experiment was conducted on a little-endian machine, so the numbers in row 6 would be 0x09C3E5 and 0x3F or (as decimal numbers) 639973 and 63. Each index entry is 24 bytes wide, of which 8 bytes are the tuple header.

In PostgreSQL v12:

The data contain only aid, since bid has been truncated away. This reduces the index entry size to 16, so that more entries fit on an index page.

Upgrade considerations

Since index storage has been changed in v12, a new B-tree index version 4 has been introduced.

Since upgrading with pg_upgrade does not change the data files, indexes will still be in version 3 after an upgrade. PostgreSQL v12 can use these indexes, but the above optimizations will not be available. You need to REINDEX an index to upgrade it to version 4 (this has been made easier with REINDEX CONCURRENTLY in PostgreSQL v12).

Other B-tree index features introduced in v12

There were some other improvements added in PostgreSQL v12. I won't discuss them in detail, but here is a list:

  • Reduce locking overhead for B-tree index inserts for improved performance.
  • Introduce REINDEX CONCURRENTLY to make it easier to rebuild an index without down-time.
  • Improve performance for index-only scans on indexes with many attributes.
  • Add a view pg_stat_progress_create_index to report progress for CREATE INDEX and REINDEX.

Conclusion

B-tree indexes have become smarter again in PostgreSQL v12, particularly for indexes with many duplicate entries. To fully benefit, you either have to upgrade with dump/restore or you have to REINDEX indexes after pg_upgrade.

6 responses to “B-tree index improvements in PostgreSQL v12”

  1. Hi,

    Do you have any articles on Hash Index. I only need to do read/select on this index , I read that hash is the way to go.

    B Tree is useful if you need it for things like a LIKE statement.

    • Think about what they are and you will see you were mislead.

      A BTree is an ordered tree. A hash is a mapping from one set to another one.
      An example of hash: "iterative sum of digits".
      So, if I want to hash 179 -> 17 -> 8
      Now, I only need 10 buckets for my index.
      Do I have "179" in my table ?
      -> Check the buckets 8, then iterate the list of numbers there.

      Now, if you want to order by number, your hash will not help you! Indeed, 17 and 161 occupies the same buckets ! 18 and 162 occupies the next buckets !

      With BTree that's different, you can go through the tree in prefix order and you get a sorting, in postfix order and you get the reverse sorting.

      Need something between two values ? Right in the BTree alley. For a node, everything on the left is smaller (or greater) and everything on the right is greater (or smaller).

      The only advantage of Hash is access speed. But that depends on collision.

      Create twice the same table and use each index. Try sorting and ">" type of queries.

      Bonus: seek Bitmap index!
      Bonus2: bloom filters.

    • There have been no improvements for hash indexes in PostgreSQL v12.
      I think that hardly anybody uses them.
      The problem is that they were not really usable before v10, and they have not received a lot of love and optimization, unlike B-tree indexes.
      Another problem that harms their adoption is that a B-tree index can do everything that a hash index can (and more), except that the hash index *might* be faster in some cases.

      If you only have conditions like WHERE x = 42, you could consider using a hash index. But you should run realistic tests to see if you can measure an improvement over B-tree indexes.

  2. Hello Laurenz. Thanks for this post !
    In the above lists, is ctid equivalent to TID ?
    In V12 if the TID is part of the key, itemlen 16 accounts for TID aid
    In V11 itemlen is 24 for aid bid: it should rather be 16 ?

    • Yes, ctid is the tuple identifier. I don't know whet the “c” stands for.
      An index entry in v11 also contains the TID (that's there where the index entry points), it just isn't counted as part of the key.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2025
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram