If your PostgreSQL database is purely maintained (lack of VACUUM) or badly structured, you might face some table bloat.
UPDATE Sept. 2023 - See this blog for a more recent update on this topic: Table bloat revisited.
Table bloat is somewhat nasty because it slows down your database and eats up more storage than needed. Therefore it can be important to figure out if a table is bloated or not. The PostgreSQL contrib package provides a package called pgstattuple to diagnose table bloat.
To use the module we have to activate it:
1 2 |
test=# CREATE EXTENSION pgstattuple; CREATE EXTENSION |
For a test we create a table and add some 10.000 rows to it on the fly:
1 2 |
test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1, 10000); SELECT 10000 |
We can already see, how pgstattuple works. To make sure the psql output stays readable, we have called ...
1 2 |
test=# x Expanded display is on. |
In our example pgstattuple
will return the following data:
1 2 3 4 5 6 7 8 9 10 11 |
test=# SELECT * FROM pgstattuple('t_test'); -[ RECORD 1 ]------+------- table_len | 368640 tuple_count | 10000 tuple_len | 280000 tuple_percent | 75.95 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 7380 free_percent | 2 |
As you can see the size of the table is somewhat over 368k. Our table has a fill grade of around 76%. Note that those numbers don't add up to 100% completely. This is due to some overhead. In reality the fill grade of a freshly loaded table will be a lot higher than in our trivial single column case.
To demonstrate table bloat we can delete some data. In this example we delete one third of those rows:
1 2 |
test=# DELETE FROM t_test WHERE generate_series % 3 = 0; DELETE 3333 |
Note, the size of the table on disk is still the same. Some of those valid rows have just been moved to the “dead tuple” section of the output:
1 2 3 4 5 6 7 8 9 10 11 |
test=# SELECT * FROM pgstattuple('t_test'); -[ RECORD 1 ]------+------- table_len | 368640 tuple_count | 6667 tuple_len | 186676 tuple_percent | 50.64 dead_tuple_count | 3333 dead_tuple_len | 93324 dead_tuple_percent | 25.32 free_space | 7380 free_percent | 2 |
The first lesson here is that DELETE does not shrink a table on disk. It merely marks rows as dead. This is highly important – many people are misled by this behavior.
To reclaim the space occupied by those dead rows we can call VACUUM:
1 2 |
test=# VACUUM t_test; VACUUM |
What we see here is that those dead rows have been converted to free space. And yes, the size of the table is still the same:
1 2 3 4 5 6 7 8 9 10 11 |
test=# SELECT * FROM pgstattuple('t_test'); -[ RECORD 1 ]------+------- table_len | 368640 tuple_count | 6667 tuple_len | 186676 tuple_percent | 50.64 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 114036 free_percent | 30.93 |
This free space can now be used to store new rows inside your table.
In our example we have checked one table. But what if you want to check your entire database to see who is to blame?
In this case, you must make use of the PostgreSQL type system and a system catalog. Let us discuss the type system first: In the previous example we have seen that pgstattuple can be called in the FROM-clause because it returns an entire row. But we have to do this for all tables – not just for some. To make sure that we can put the list of objects we want to check into the FROM-clause (in the form of the pg_class system table) we have to move the function call to the SELECT clause. The call will return a composite type consisting of all fields:
1 2 3 4 5 6 7 |
test=# SELECT relname, pgstattuple(oid) FROM pg_class WHERE relkind = 'r' LIMIT 3; relname | pgstattuple --------------+---------------------------------------------------- pg_statistic | (147456,387,115800,78.53,27,8161,5.53,19840,13.45) pg_type | (73728,357,61965,84.05,20,3440,4.67,4828,6.55) pg_authid | (8192,3,324,3.96,0,0,0,7816,95.41) (3 rows) |
This is not really readable so we have to break up those fields again. Some brackets and a star will do the job:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
test=# SELECT relname, (pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' LIMIT 3; -[ RECORD 1 ]-------+------------- relname | pg_statistic table_len | 147456 tuple_count | 387 tuple_len | 115800 tuple_percent | 78.53 dead_tuple_count | 27 dead_tuple_len | 8161 dead_tuple_percent | 5.53 free_space | 19840 free_percent | 13.45 -[ RECORD 2 ]-------+------------- relname | pg_type table_len | 73728 tuple_count | 357 tuple_len | 61965 tuple_percent | 84.05 dead_tuple_count | 20 dead_tuple_len | 3440 dead_tuple_percent | 4.67 free_space | 4828 free_percent | 6.55 -[ RECORD 3 ]-------+------------- relname | pg_authid table_len | 8192 tuple_count | 3 tuple_len | 324 tuple_percent | 3.96 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 7816 free_percent | 95.41 |
We can now sort and filter at will to figure out which table is to blame.
See also Laurenz Albe's post about "How a bad network connection can cause table bloat."
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply