zheap UNDO logs discarding in PostgreSQL

12.2021

Some time ago, I posted some information about zheap, a storage engine for PostgreSQL. The idea behind zheap is to make UPDATE statements run faster and keep table bloat under control. Currently, PostgreSQL copies the row on UPDATE and puts the copy into the same data file. While this isn't a bad strategy, it can lead to table bloat in some cases. zheap instead manages old copies of rows in an undo log, which is similar to what Oracle does. Until now, the problem was that this “zheap undo” was never cleaned up. That's why we implemented a “discard worker” to handle the cleanup. By discarding zheap UNDO logs, the worker keeps UPDATE-related table bloat to a minimum.
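To make the difference between the two strategies concrete, here is a small toy model in Python. It is only a sketch of the idea: the class names are made up for illustration, and nothing here reflects the real on-disk format of PostgreSQL or zheap.

```python
# Toy model only: contrasts the two update strategies described above.
# It does not reflect PostgreSQL's or zheap's actual on-disk format.


class HeapTable:
    """Copy-on-update: every UPDATE appends a new row version to the table."""

    def __init__(self):
        self.versions = []  # live and dead row versions share one "file"

    def insert(self, key, value):
        self.versions.append((key, value))

    def update(self, key, value):
        # The old version stays in place until a cleanup pass removes it.
        self.versions.append((key, value))


class UndoTable:
    """In-place update: the old version moves to a separate undo log."""

    def __init__(self):
        self.rows = {}  # exactly one current version per key
        self.undo = []  # old versions, kept outside the table

    def insert(self, key, value):
        self.undo.append((key, None))  # None: the row did not exist before
        self.rows[key] = value

    def update(self, key, value):
        self.undo.append((key, self.rows[key]))
        self.rows[key] = value

    def rollback(self):
        # Apply undo records in reverse order to restore the previous state.
        while self.undo:
            key, old = self.undo.pop()
            if old is None:
                del self.rows[key]
            else:
                self.rows[key] = old

    def discard_undo(self):
        # Roughly what a discard worker does once nobody needs the old versions.
        self.undo.clear()


heap, zheap_like = HeapTable(), UndoTable()
for table in (heap, zheap_like):
    table.insert(1, "a")
    for value in ("b", "c", "d"):
        table.update(1, value)

print(len(heap.versions))    # 4 versions stored in the table itself
print(len(zheap_like.rows))  # 1 row in the table
print(len(zheap_like.undo))  # 4 undo records (1 insert + 3 updates)
```

In the first model the table itself grows with every UPDATE. In the second one the table stays at one row, and the growth happens in the undo log, which can be thrown away separately.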

Getting PostgreSQL and zheap

Before we dig into what the discard worker does, you need to install a zheap-enabled version of PostgreSQL. To make it easier for everybody, we created a container to allow users to give zheap a try out-of-the-box.

The original idea was to have zheap as an extension. However, it is too invasive for that: you can't just install it on top of PostgreSQL, as you can with other extensions. zheap needs core patches.

Also: I want to point out that this is not a production-ready version, but rather a tech preview. We're happy to share it with the community to get feedback and maybe gather some bug reports.

Here is the freely available docker container for you to try out. Installation is simple and very straightforward:

To connect to the newly created instance, just run psql as you normally would. The default username and password are both postgres.

Username: postgres
Password: postgres

Testing zheap UNDO in PostgreSQL

Once your zheap-enabled version of PostgreSQL is up and running, we can inspect the process table:

The discard worker is launched automatically. Let's see what it does, and how the undo log is stored.

Before we create a table and fill it with data, we've got to adjust a variable:

This setting (default_table_access_method) tells PostgreSQL which storage engine to use by default when creating a new table. Usually that is heap. In this case, however, zheap is the desired choice. We could add the zheap option to each CREATE TABLE statement instead, but in many cases you will want to set it for the entire session, as I just did:

The layout of the table doesn't really matter. We can use any layout.
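As a rough illustration of the two options, the following Python sketch only builds the SQL text for each variant. The table name t_demo and its columns are hypothetical, and the helper functions are made up for this example; the statements themselves use the standard default_table_access_method setting and the USING clause of CREATE TABLE.

```python
def set_default_access_method(method):
    """Session-wide choice: applies to every CREATE TABLE that follows."""
    return f"SET default_table_access_method TO {method};"


def create_table(name, columns, access_method=None):
    """Per-table choice: append USING <method> to a single CREATE TABLE.

    Identifiers are inserted as-is (no quoting), so this is for trusted
    input only.
    """
    column_list = ", ".join(f"{col} {typ}" for col, typ in columns)
    using = f" USING {access_method}" if access_method else ""
    return f"CREATE TABLE {name} ({column_list}){using};"


# Hypothetical table layout, chosen only for the example.
columns = [("id", "int"), ("name", "text")]

print(set_default_access_method("zheap"))
print(create_table("t_demo", columns))
print(create_table("t_demo", columns, access_method="zheap"))
```

The second statement relies on the session default, while the third one names the storage engine explicitly and works regardless of the session setting.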

Testing zheap UNDO with INSERT load

Let's load some data:

The important thing is that zheap must be able to roll back in case of an error. Therefore, a lot of undo has to be written. Let's take a look at the undo directories before we commit the transaction:

You can see that we've created a fairly large number of files in our base/undo directory. Let's commit and see what happens:

PostgreSQL will recycle (= delete) these logs when the new discard worker kicks in. Let's take a look and see:

The files are gone, which is what we expected.
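If you want to repeat this before/after check without listing the directory by hand, a small helper along these lines could do it. This is a sketch under assumptions: the function name is made up, and it expects the path of the data directory as input (here taken from the PGDATA environment variable, which may not be set in your environment). The base/undo location is the one mentioned above.

```python
import os


def undo_usage(data_dir):
    """Return (file_count, total_bytes) for <data_dir>/base/undo."""
    undo_dir = os.path.join(data_dir, "base", "undo")
    count = total = 0
    if not os.path.isdir(undo_dir):
        return count, total
    for root, _dirs, files in os.walk(undo_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except FileNotFoundError:
                # The file was removed while we were scanning; skip it.
                continue
            count += 1
    return count, total


if __name__ == "__main__":
    # Assumption: PGDATA points to the data directory of the instance.
    files, size = undo_usage(os.environ.get("PGDATA", "."))
    print(f"{files} undo files, {size / 1024 / 1024:.1f} MB")
```

Running it once before the COMMIT and again a little later should show the numbers dropping once the discard worker has done its job.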

Testing zheap UNDO with UPDATE load

The same will happen when we run UPDATE statements:

The important aspect is that there is actually a lot of WAL created. As you can see, we are talking about around 600 MB:

Now, let's commit the transaction:

What's interesting is what happens immediately after the commit statement:

So that's it: the zheap UNDO logs are gone. The cleanup is NOT part of the commit. It is done by the discard worker, which is in charge of making sure that undo is not discarded too early. Keep in mind: yours might not be the only transaction at work here, so the cleanup has to be asynchronous (just like VACUUM).
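Why the cleanup cannot simply happen at commit time can be shown with another toy model. The rule used here is a deliberate simplification and an assumption of this sketch, not the actual zheap logic: undo may be removed only if the transaction that wrote it has committed and no other running transaction may still need to read it. All names and transaction ids are made up.

```python
from dataclasses import dataclass


@dataclass
class UndoLog:
    xid: int                 # hypothetical id of the transaction that wrote the undo
    committed: bool = False


def discard_pass(undo_logs, oldest_needed_xid):
    """Return the undo logs that must be kept after one discard pass.

    Assumed rule: a log may go only if its transaction has committed and no
    running transaction can still need it. oldest_needed_xid is the smallest
    xid whose undo some running transaction may still read, or None if no
    such transaction exists.
    """
    kept = []
    for log in undo_logs:
        still_needed = oldest_needed_xid is not None and log.xid >= oldest_needed_xid
        if not log.committed or still_needed:
            kept.append(log)
    return kept


logs = [UndoLog(100, committed=True), UndoLog(101, committed=True), UndoLog(102)]

# Another session may still read the undo written by transaction 101.
logs = discard_pass(logs, oldest_needed_xid=101)
print([log.xid for log in logs])   # [101, 102]

# That session has finished; only the uncommitted transaction keeps its undo.
logs = discard_pass(logs, oldest_needed_xid=None)
print([log.xid for log in logs])   # [102]
```

The committing transaction cannot know on its own when the other sessions are done, which is why a background process that checks again later is a natural fit.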

Finally...

Check out our Table Bloat Archive for more important information about zheap!
