CYBERTEC PostgreSQL Logo

Killed index tuples

11.2018 / Category: / Tags: | |
1000 killed index tuples
 © Laurenz Albe 2018

 

Since I only recently learned about the concept of “killed index tuples”, I thought there might be some others who are not yet familiar with this interesting PostgreSQL concept.

This may give you an explanation the next time you encounter wildly varying execution times for the same execution plan of the same PostgreSQL query.

Before we look more closely at the index, let's review the life cycle of a table row version (“heap tuple”).

Life, death and visibility in the table heap

It is widely known that the visibility of heap tuples is determined by the system columns xmin and xmax (though there is more to xmax than meets the eye). A heap tuple is “dead” if its xmax is less than the xmin of all active transactions.

Now xmin and xmax are only valid if the respective transactions have been marked committed in the “commit log”. Consequently, any transaction that needs to know if it can see a tuple has to consult the commit log. To save future readers that extra work, the first one that consults the commit log will save the information in the tuple's “hint bits”.

Dead tuples are eventually reclaimed by VACUUM.

This is all fairly well known, but how is the situation with index entries?

Life, death and visibility in the index

To avoid redundancy and to keep index tuples small, the visibility information is not stored in the index.
The status of an index tuple is determined by the heap tuple it points to, and both are removed by VACUUM at the same time.

As a consequence, an index scan has to inspect the heap tuple to determine if it can “see” an entry. This is the case even if all the columns needed are in the index tuple itself. Even worse, this “heap access” will result in random I/O, which is not very efficient on spinning disks.

This makes index scans in PostgreSQL more expensive than in other database management systems that use a different architecture. To mitigate that, several features have been introduced over the years:

  • PostgreSQL 8.1 introduced the “bitmp index scan”. This scan method first creates a list of heap blocks to visit and then scans them sequentially. This not only reduces the random I/O, but also avoids that the same block is visited several times during an index scan.
  • PostgreSQL 9.2 introduced the “index-only scan”, which avoids fetching the heap tuple. This requires that all the required columns are in the index and the “visibility map” shows that all tuples in the table block are visible to everybody.

But I want to focus on another feature that was also introduced in 8.1, yet never made it into the release notes.

Killed index tuples

The feature was introduced with this commit:

Whenever an index scan fetches a heap tuple only to find that it is dead (that the entire “HOT chain” of tuples is dead, to be more precise), it marks the index tuple as “killed”. Then future index scans can simply ignore it.

This “hint bit for indexes” can speed up future index scans considerably.

An example

Let's create a table to demonstrate killed index tuples. We have to disable autovacuum so that it does not spoil the effect by removing the dead tuples.

We create a number of dead tuples by deleting rows from the table:

Spot the differences!

Now we run the same query twice:

Discussion of the example

There are a number of things that cannot account for the difference:

  • Often the second execution is faster because the first run has to read more data from secondary storage, which are already cached on the second run. But you can observe from the execution plan that in both cases, all blocks were already in the cache (“shared hit”) and none had to be read from disk.
  • No hint bits had to be written during the first execution (no buffers were “dirtied”), because that was already done during the sequential scan from the DELETE.

What happened is that the first execution had to visit all the table blocks and killed the 799000 index tuples that pointed to dead tuples.
The second execution didn't have to do that, which is the reason why it was ten times faster.

Note that the difference in blocks touched by the two queries is not as big as one might suspect, since each table block contains many dead tuples (there is a high correlation).

It is also worth noting that even though they are not shown in the EXPLAIN output, some index pages must have been dirtied in the process, which will cause some disk writes from this read-only query.

Conclusion

Next time you encounter mysterious query run-time differences that cannot be explained by caching effects, consider killed index tuples as the cause. This may be an indication to configure autovacuum to run more often on the affected table, since that will get rid of the dead tuples and speed up the first query execution.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

7 responses to “Killed index tuples”

  1. zheap behaves similarly?

    btpg12=# create unlogged table zhole
    (id integer PRIMARY KEY,
    val text NOT NULL)
    with (storage_engine='zheap');
    CREATE TABLE

    btpg12=# INSERT INTO zhole
    SELECT i, 'text number ' || i
    FROM generate_series(1, 1000000) AS i;
    INSERT 0 1000000

    Time: 4429.463 ms (00:04.429)

    btpg12=# ANALYZE zhole;
    ANALYZE

    Time: 961.388 ms

    btpg12=# delete from zhole
    btpg12-# WHERE id BETWEEN 501 AND 799500;
    DELETE 799000

    Time: 2437.737 ms (00:02.438)

    btpg12=# ANALYZE zhole;
    ANALYZE

    Time: 2025.126 ms (00:02.025)

    btpg12=# EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
    SELECT * FROM zhole
    WHERE id BETWEEN 1 AND 800000;
    QUERY PLAN
    -----------------------------------------------------------------
    Index Scan using zhole_pkey on zhole (actual rows=1000 loops=1)
    Index Cond: ((id >= 1) AND (id = 1) AND (id <= 800000))
    Buffers: shared hit=79
    Planning Time: 0.349 ms
    Execution Time: 1.133 ms
    (5 rows)

    Time: 2.313 ms

    • Yes, that definitely looks like the same thing is happening.

      This is more a feature of the index than of the heap, so I guess they just didn't modify the behaviour there.

      • the vacuum comparison is interesting

        btpg12=# vacuum zhole;
        VACUUM
        Time: 34.849 ms
        btpg12=# vacuum verbose zhole;
        INFO: vacuuming "public.zhole"
        INFO: "zhole": found 0 removable, 0 nonremovable row versions in 0 out of 3952 pages
        DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 712
        There were 0 unused item pointers.
        0 pages are entirely empty.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
        INFO: vacuuming "pg_toast.pg_toast_16875"
        INFO: "pg_toast_16875": found 0 removable, 0 nonremovable row versions in 0 out of 1 pages
        DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 713
        There were 0 unused item pointers.
        0 pages are entirely empty.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

        btpg12=# vacuum verbose hole;
        INFO: vacuuming "public.hole"
        INFO: scanned index "hole_pkey" to remove 799000 row versions
        DETAIL: CPU: user: 0.27 s, system: 0.00 s, elapsed: 0.27 s
        INFO: "hole": removed 799000 row versions in 5090 pages
        DETAIL: CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.02 s
        INFO: index "hole_pkey" now contains 201000 row versions in 2745 pages
        DETAIL: 799000 index row versions were removed.
        2181 index pages have been deleted, 0 are currently reusable.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
        INFO: "hole": found 0 removable, 201000 nonremovable row versions in 6370 out of 6370 pages
        DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 714
        There were 0 unused item pointers.
        Skipped 0 pages due to buffer pins, 0 frozen pages.
        0 pages are entirely empty.
        CPU: user: 0.36 s, system: 0.00 s, elapsed: 0.36 s.
        INFO: vacuuming "pg_toast.pg_toast_16867"
        INFO: index "pg_toast_16867_index" now contains 0 row versions in 1 pages
        DETAIL: 0 index row versions were removed.
        0 index pages have been deleted, 0 are currently reusable.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
        INFO: "pg_toast_16867": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
        DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 714
        There were 0 unused item pointers.
        Skipped 0 pages due to buffer pins, 0 frozen pages.
        0 pages are entirely empty.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

        thanks - this is a very good article

  2. Thanks for sharing this example. The final thing that was missing was re-running the same query after vacuum. For me, timing goes from 138ms before the delete, 93ms first run after, 6.3ms on second run, and .22ms if I do vacuum and re-run with shared hits reducing from 2196 to 15. Tune autovacuum!

  3. Thanks for the article and examples.

    > some index pages must have been dirtied in the process, which will cause some disk writes from this read-only query.

    How does that work with physical replication where a read only query runs on a read-only replica? Are those hint bits replicated? I imagine what must happen is those hint bits are written on the primary and then replicated. But I had never thought about that before. I'm checking this out. https://wiki.postgresql.org/wiki/Hint_Bits Any other summaries of this would be helpful. Cheers.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2025
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram