CYBERTEC PostgreSQL Logo

Table bloat revisited: Making tables shrink in PostgreSQL

08.2013 / Category: / Tags:

UPDATED August 2023 - Table bloat in PostgreSQL: Many people wonder why deleting data from a table in a PostgreSQL database does not shrink files on disk. You would expect storage consumption to go down when data is deleted. This is not always the case.

To show how this works, I have compiled some examples.

Let's get started with a basic table and add 100k rows to it:

The size of the table will be around 3.5 MB on disk. We can find out about the size of a table using pg_relation_size:

Now let's delete half of the rows:

Many users would expect this table to shrink to half the size. This is not the case:

The reason why the storage space doesn't decrease is that when you delete some rows, it does not necessarily mean that nobody else can see the deleted rows anymore. There might still be transactions around which might at some point need those row versions you are about to delete. In other words: PostgreSQL does not delete a row instantly, but marks it invisible, so that certain transactions cannot see it.

To clean up those rows which are really dead, we can run VACUUM:

In our example the table has been reduced in size:

How shrinking works

VACUUM is able to reclaim space and return it to the filesystem, but this is not always possible. It is only possible to shrink a file by cutting off stuff at the end. Cutting out data at the end of a data file is only possible if (and only if) there are no valid rows anymore AFTER a certain position in the file. In our example we have loaded data in ascending order into our PostgreSQL table (1, 2, 3, etc.). If we delete everything larger than 50.000 it means that the second half of the table will be gone. VACUUM will go through the table and figure out that the second half of the data file contains nothing but chunk and truncate the data file to return space to the filesystem.

When shrinking does not reduce table bloat

In the first example, we arranged things in a way that we can see VACUUM returning space to the file system.

Now, let's construct an example where this is not possible:

First of all we drop the table and create a new one:

Then we load 100.000 rows just like before:

In this example we will delete the first half of the data:

Just like before we can run VACUUM:

This time the size of the underlying table is the same size:

Remember what we said about cleanup and VACUUM? VACUUM can only shrink a file when free space is at the end. However, our example has been set up in a way that this is not the case.

CTIDs and their role in understanding PostgreSQL table bloat

In PostgreSQL there is a thing called CTID, which says where a certain row is. ctid=(0, 14) would mean that we are talking about the 14th row in the first data block of the table.

Take a look at our table and check out the highest 5 CTIDs:

As you can see, the last row is in block 442. Do the math: 442 * 8192 = 3.5 MB. This is why the table has not changed in size.

Let's check the lowest CTID:

We can see that the first valid row is somewhere in the middle of the table. Therefore the table has not been truncated by VACUUM.

UPDATE AUGUST 2023:

Postgres version 12 introduced a new table storage parameter vacuum_truncate that controls the functionality to truncate off any empty pages at the end of a table. For example, you might want to avoid vacuum truncation for the following reasons:


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

Comments are closed.

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram