
VACUUM: Managing and detecting bloat

09.2024

In this introduction, we will explore the concept of VACUUM in PostgreSQL, its importance, and best practices for implementing effective vacuuming strategies to ensure your database remains efficient, performant, and reliable. Before we dive into strategies, problem detection and so on, it is important to understand why VACUUM is needed in the first place.

VACUUM and database transactions

VACUUM has always existed in PostgreSQL, and many myths have formed around it. Therefore, it is important to shed some light on why VACUUM is necessary. The reason why VACUUM is so relevant can be summed up in one word and one word only: "transactions".

Why is that the case? Consider the following example:

Transaction 1                      Transaction 2
---------------------------------  ---------------------------------
SELECT count(*) FROM tab;
... returns 0 ...
BEGIN;
INSERT INTO tab
    VALUES (1), (2), (3);
SELECT count(*) FROM tab;          SELECT count(*) FROM tab;
... returns 3 ...                  ... returns 0 ...
ROLLBACK;                          <- 3 rows are now eliminated

What is important to understand is that our two sessions will see different data. The left session will actually see 3 rows, while the right session, happening at the same time, won’t see any rows because the left transaction has not been committed yet. 

The question naturally arising is: How can we get rid of those 3 rows that are subject to our ROLLBACK? In contrast to MS SQL and Oracle, PostgreSQL does NOT remove those rows from disk during a ROLLBACK. That is the reason why ROLLBACK is lightning fast in PostgreSQL and usually a real pain in MS SQL (in case of large transactions). 

The job of VACUUM is to clean out those rows as soon as a row cannot be seen by any active transaction anymore. Here is a picture that shows how this works:

[Figure: VACUUM vs. a long-running concurrent transaction]

In this case (as outlined in our blog about VACUUM and hot_standby_feedback), the first VACUUM will NOT remove rows from disk because the transaction on the left side can still see the data. There are some key takeaways in this example one should be aware of:

Running VACUUM is not enough - it is also necessary to give VACUUM a chance to actually do something. Getting rid of long transactions is therefore key to success.
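Since long transactions hold back VACUUM, it helps to be able to spot them. A query along these lines against pg_stat_activity can serve as a starting point (a sketch; the 5-minute threshold is an arbitrary choice for this example):

```sql
-- list sessions whose current transaction has been open for a while
-- (5 minutes is an arbitrary threshold chosen for this sketch)
SELECT pid, usename, state, xact_start,
       now() - xact_start AS xact_age,
       query
FROM   pg_stat_activity
WHERE  xact_start IS NOT NULL
       AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```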

Why VACUUM in PostgreSQL does not return space to the OS

The classic questions people ask our PostgreSQL support teams are:

  • “Why does VACUUM not make my table smaller?”
  • “How can VACUUM return space to the filesystem?”
  • “Is it necessary to run VACUUM FULL?”

The following example will try to shed some light here:

We have created a table which consists of 30 really large rows (each of them > 1500 bytes). This means that only 5 rows will fit into an 8k block in PostgreSQL. To see which row is where, we can make use of a hidden column called “ctid”. It contains the number of the block as well as the position of the row inside the block (ctid = ‘(2, 4)’ means the 4th row in the third block - block numbering starts at zero). What happens if we delete a row and run VACUUM? Before we do that, we can enable some extensions to actually see what is going on inside our table:
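A setup along these lines matches that description (t_test and its columns are made-up names for illustration; pg_freespacemap, which ships with the contrib package, exposes the freespace map - note that the FSM is only populated once VACUUM has run):

```sql
-- a sample table: ~1500 bytes per row, so only 5 rows fit into one 8k block
-- (rows of this size stay below the TOAST threshold and are stored inline)
CREATE TABLE t_test (id int, payload text);
INSERT INTO t_test
    SELECT id, repeat('x', 1500)
    FROM generate_series(1, 30) AS id;

CREATE EXTENSION pg_freespacemap;

VACUUM t_test;                         -- populate the freespace map
SELECT * FROM pg_freespace('t_test');  -- free bytes per block
```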

This extension allows us to find out how much free space happens to be in each of our 8k blocks - in this case, 448 bytes per block. The reason is that a complete row has to fit into a single block, so a sixth row cannot be squeezed into the remaining space.

Now that we understand what has happened, we can delete some data and see what is going on:
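For instance (using the hypothetical t_test table from before; the ids are chosen so that the deleted rows happen to sit in the second block, i.e. at ctid (1,2) and (1,3)):

```sql
-- delete two rows that happen to live in block 1
DELETE FROM t_test WHERE id IN (7, 8);
```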

What we see here is a gap in the data file:
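The gap becomes visible when we list the ctids (a sketch against the hypothetical t_test table; the exact layout depends on the data):

```sql
SELECT ctid, id FROM t_test WHERE id BETWEEN 6 AND 10;
-- ctids (1,2) and (1,3) no longer show up in the result
```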

Rows (1, 2) and (1, 3) are gone (= marked as dead). What we need to understand is that the data is still physically present on disk - it is merely marked as dead. If we take a look at the content of the freespace map, we can see that it has not changed:
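Querying the freespace map again (same hypothetical table as before) demonstrates this:

```sql
SELECT * FROM pg_freespace('t_test');
-- block 1 still reports the same amount of free space as before the DELETE
```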

This is an important takeaway: space is NOT freed when data is deleted - space is reclaimed by VACUUM:

Once VACUUM has been executed (assuming that no long-running transaction is blocking it from doing its work), the freespace map (= FSM) is going to reflect the new reality on disk:
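Sticking with the hypothetical t_test table:

```sql
VACUUM t_test;
SELECT * FROM pg_freespace('t_test');
-- block 1 now shows roughly 3000 additional free bytes
-- (the space of the two dead ~1500-byte rows)
```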

Note that the number of blocks has not changed - the difference is that the second block now has much more free space. A normal VACUUM will not move data between blocks - that is simply the way it is. Therefore, space is not returned to the operating system (filesystem). But this rarely matters, because in the vast majority of cases we want to reuse that space later anyway:
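A quick sketch (again using the hypothetical t_test table):

```sql
-- insert a new large row after VACUUM has run
INSERT INTO t_test VALUES (7, repeat('x', 1500));

SELECT ctid, id FROM t_test WHERE id = 7;
-- the new row typically lands inside block 1, in the gap
-- left behind by the deleted rows, instead of extending the file
```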

As we can see, the next write will fill the gaps and re-allocate the space for us. This is important because it means that returning space to the filesystem (operating system) via VACUUM FULL would only be temporary anyway: the next write operation would immediately ask for more filesystem space again. In reality, VACUUM FULL is way less important than people think. Quite the contrary: It can cause issues because it needs a harsh table lock. I’m not saying that VACUUM FULL is pointless - what I am saying here is that it is used way too often.

pg_squeeze: Shrinking tables more efficiently

As mentioned already, the problem with VACUUM FULL is that it needs a table lock. In many cases, this essentially means “downtime”. If you cannot write data to a table, it basically means that your application is no longer usable.

An alternative to this is pg_squeeze, an Open Source tool that can be installed via package manager. It reorganizes tables without excessive locking, which means writes are not blocked during the process.

[Figure: how pg_squeeze works]

The way it works is to take an initial (small) copy of the data in the table (excluding bloat) and then apply the changes that happened during that initial copy. At the end of the process, the data files are swapped in the system catalog, leading to minimal locking (just milliseconds).
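Once the extension is installed, a table can be squeezed manually; the call below is a sketch (table name as in the earlier hypothetical example; the exact arguments of squeeze.squeeze_table depend on the pg_squeeze version, and the tool also supports scheduled, automatic processing):

```sql
CREATE EXTENSION pg_squeeze;

-- rebuild public.t_test in the background;
-- only a very brief lock is taken at the end to swap the files
SELECT squeeze.squeeze_table('public', 't_test');
```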

Detecting bloat in an organized way

Before we start to fight bloat, it is a good idea to find out which tables are actually bloated and which ones are not. Doing this for a single table can be done using the pgstattuple extension, which is part of the PostgreSQL contrib package. Here’s how it works:

First of all, we have to enable the extension. Then, we will again create some sample data:
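A sketch of the setup (t_bloat and its columns are made-up names; the row size again keeps 5 rows per 8k block):

```sql
CREATE EXTENSION pgstattuple;

-- sample data: ~1500 bytes per row, as in the earlier example
CREATE TABLE t_bloat (id int, payload text);
INSERT INTO t_bloat
    SELECT id, repeat('x', 1500)
    FROM generate_series(1, 10000) AS id;
```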

The table has been fully populated, but let’s see what happens when some of the data is deleted:
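For example (same hypothetical table), deleting every second row produces plenty of dead tuples:

```sql
-- remove half of the data to produce dead rows
DELETE FROM t_bloat WHERE id % 2 = 0;
```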

By calling pgstattuple, we can inspect the table and find out what our data files consist of. The output of pgstattuple is a fairly broad table, so we can use \x to make the output more digestible:
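A sketch of the call (\x is a psql meta-command that toggles expanded display):

```sql
\x
SELECT * FROM pgstattuple('t_bloat');
-- returns table_len, tuple_count, tuple_len, tuple_percent,
-- dead_tuple_count, dead_tuple_len, dead_tuple_percent,
-- free_space and free_percent
```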

What can we see here? First of all we can see the size of our data file (table_len) and the number of rows in the table (tuple_count). We will also see the amount of dead space (dead_tuple_len) as well as the amount of free space in the table. Remember: “valid + dead + free = table size”. The important aspect is that those values should be in a useful ratio. Depending on the type of application, this may of course vary and therefore it is not useful to dogmatically stick to a fixed threshold of “what is good” and “what is bad”.

However, we have to keep one thing in mind: The pgstattuple function scans the entire table. In other words, if your table is 25 TB you will read 25 TB - just to get those numbers. In real life this can be a huge problem. To solve the problem, PostgreSQL offers the pgstattuple_approx function which does not read the entire table but instead takes a sample. Here’s how it works:
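The call looks almost identical (same hypothetical table as before):

```sql
SELECT * FROM pgstattuple_approx('t_bloat');
-- columns such as approx_free_space, approx_tuple_count and
-- dead_tuple_percent are estimates based on a sample of the table
```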

The function executes a lot faster and still gives us a good estimate.

Running pgstattuple “at scale”

So far you have learned to run pgstattuple for a single table. But how can we do that for all tables in our database?

Here is a query that might be useful:
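The following sketch inspects every ordinary (non-system) table, assuming the pgstattuple extension is installed:

```sql
SELECT c.relname,
       pg_relation_size(c.oid) AS size,
       s.tuple_percent,
       s.dead_tuple_percent,
       s.free_percent
FROM   pg_class AS c,
       LATERAL pgstattuple(c.oid) AS s
WHERE  c.relkind = 'r'                   -- ordinary tables only
       AND c.relnamespace NOT IN ('pg_catalog'::regnamespace,
                                  'information_schema'::regnamespace)
ORDER BY pg_relation_size(c.oid) DESC;   -- largest tables first
```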

Without going into the details of why LATERAL is needed here, let's look at what the query does.

The query will return all non-system tables in our database and inspect them one by one. The list is ordered (largest tables first). This will give us valuable insights into the structure and content of our data files associated with those tables.

Why controlled bloat can be useful

There is an important aspect related to VACUUM that is often overlooked: many people think that shrinking tables is beneficial under all circumstances. However, this is not necessarily true. In case your tables are facing many UPDATE statements, it might be beneficial to use the FILLFACTOR storage parameter. The idea is not to fill up your 8k blocks completely, but to leave some space for UPDATEs to put the new copy of a row into the same block. Using this strategy, we can avoid some I/O and make things more efficient. In the past, we have written some blog posts on those issues.

We recommend taking a look at these articles to gain more understanding regarding this topic. 

Still, let’s return to the first table we created and see why this matters:

The listing empties the table and adds 10 new large rows:
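A sketch of that step (using the hypothetical t_test table from the earlier examples):

```sql
TRUNCATE t_test;
INSERT INTO t_test
    SELECT id, repeat('x', 1500)
    FROM generate_series(1, 10) AS id;
-- 10 rows at ~1500 bytes each fill blocks 0 and 1 completely
```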

What we see is that two blocks have been tightly packed. This means that an UPDATE cannot keep the new row version it has to create within the same block. Therefore, we have to touch the block containing the updated row and write a new one containing the new version:
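For instance (same hypothetical table; the resulting ctids are illustrative):

```sql
-- the updated row cannot stay in the completely full block 0;
-- the new version has to be written to a freshly allocated block
UPDATE t_test SET id = id WHERE id = 1;

SELECT ctid, id FROM t_test WHERE id = 1;
-- e.g. the row moves from (0,1) to (2,1): two blocks were touched
```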

In other words, we had to touch two blocks just to update a single row. The situation changes when there is still some space left inside the block as we can see in the next example:
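Updating the same row again illustrates the difference (same hypothetical table; ctids are illustrative):

```sql
-- block 2 holds only one row, so there is plenty of free space:
-- the new row version can be placed in the very same block
UPDATE t_test SET id = id WHERE id = 1;

SELECT ctid, id FROM t_test WHERE id = 1;
-- e.g. (2,1) is now dead and the new version sits at (2,2)
```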

We notice that only the third block has changed. ctid ‘(2,1)’ is gone, and the copy has been placed into ‘(2,2)’, which is the same block. Only half of the disk I/O was necessary. By setting a clever FILLFACTOR (usually somewhere between 60 and 80), we can make this type of UPDATE (within the same block) more likely:
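A sketch of how to set it (table names as in the earlier hypothetical examples):

```sql
-- leave 30% of every block empty for future row versions
CREATE TABLE t_fill (id int, payload text) WITH (fillfactor = 70);

-- for an existing table, the changed setting applies to newly
-- written blocks; a table rewrite applies it to existing data
ALTER TABLE t_test SET (fillfactor = 70);
```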

As you can see, the size of a table does matter in many cases - but sometimes, aggressively shrinking it can be counterproductive (which is counterintuitive for many people). Still, some basic understanding is needed to tune those things.

Finally …

If you want to learn more about PostgreSQL and COMMIT in particular, I recommend checking out our post on PostgreSQL COMMIT performance.
