Table of Contents
After you ANALYZE
a PostgreSQL table to collect value distribution statistics, you will find the gathered statistics for each column in the pg_stats
system view. This article will explain the meaning of the correlation
column and its impact on index scans.
Most common PostgreSQL data types have an ordering: they support the operators <
, <=
, =
, >=
and >
.
Such data types can be used with a B-tree index (the “standard” index type).
The values in a column of such a type provide a logical ordering of the table rows. An index on this column will be sorted according to that ordering.
A PostgreSQL table consists of one or more files of 8KB blocks. The order in which the rows are stored in the file is the physical ordering.
You can examine the physical ordering of the rows by selecting the ctid
system column: it contains the block number and the item number inside the block, which describe the physical location of the table row.
The correlation for a column is a value between -1 and 1. It tells how good the match between logical and physical ordering is.
You will create indexes on your tables for faster access (but not too many!).
The correlation of a column has an impact on the performance of an index scan.
During an index scan, the whole index or part of it are read in index sequential order. For each entry that is found, the corresponding row is fetched from the table (this is skipped in an “index only scan”, but that is a different story).
If the correlation of the indexed column is close to zero, the fetched rows will be from all over the table. This will result in many randomly distributed reads of many different table blocks.
However, if the correlation is close to 1 or -1, the next row fetched during the index scan tends to be in the same or the next table block as the previous row.
High correlation has two advantages:
Let's create two tables with identical content, but different correlation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
CREATE TABLE corr (id, val) AS SELECT i, 'some text ' || i FROM generate_series(1, 100000) AS i; CREATE INDEX corr_idx ON corr (id); VACUUM (ANALYZE) corr; SELECT correlation FROM pg_stats WHERE tablename = 'corr' AND attname = 'id'; correlation ------------- 1 (1 row) CREATE TABLE uncorr AS SELECT * FROM corr ORDER BY random(); CREATE INDEX uncorr_idx ON uncorr (id); VACUUM (ANALYZE) uncorr; SELECT correlation FROM pg_stats WHERE tablename = 'uncorr' AND attname = 'id'; correlation ------------- -0.00522369 (1 row) |
We disable bitmap index scans so that we can compare index scans on both tables.
Then we check how index scans perform:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
SET enable_bitmapscan = off; EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM corr WHERE id BETWEEN 1001 AND 1300; QUERY PLAN --------------------------------------------------- Index Scan using corr_idx on corr (cost=0.29..15.23 rows=297 width=19) (actual time=0.108..0.732 rows=300 loops=1) Index Cond: ((id >= 1001) AND (id <= 1300)) Buffers: shared hit=6 Planning time: 0.456 ms Execution time: 1.049 ms (5 rows) EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM uncorr WHERE id BETWEEN 1001 AND 1300; QUERY PLAN --------------------------------------------------- Index Scan using uncorr_idx on uncorr (cost=0.29..978.15 rows=298 width=19) (actual time=0.105..2.352 rows=300 loops=1) Index Cond: ((id >= 1001) AND (id <= 1300)) Buffers: shared hit=303 Planning time: 0.548 ms Execution time: 2.736 ms (5 rows) |
Now 2.7 milliseconds is not so bad, but that is only because all blocks were already in shared buffers.
If a part of these blocks has to be read from disk, the 303 blocks from the second query will do much worse than the 6 from the first!
In the second query, each result row was found in a different table block. This caused 300 blocks to be touched. The remaining three blocks are index blocks.
The first query touches only three table blocks:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
SELECT ctid, id FROM corr WHERE id BETWEEN 1001 AND 1300; ctid | id ---------+------ (6,58) | 1001 (6,59) | 1002 (6,60) | 1003 (6,61) | 1004 (6,62) | 1005 (6,63) | 1006 (6,64) | 1007 ... (8,37) | 1294 (8,38) | 1295 (8,39) | 1296 (8,40) | 1297 (8,41) | 1298 (8,42) | 1299 (8,43) | 1300 (300 rows) |
Indeed, all rows are contained in the table blocks 6, 7 and 8!
The PostgreSQL optimizer estimates the cost of the possible ways to execute an SQL statement.
With the use of the correlation it can give better estimates of the cost of an index scan, leading to better plan choices.
The PostgreSQL optimizer will prefer index scans if the correlation is close to 1 or -1.
PostgreSQL 9.5 introduced the BRIN index (block range index).
This index works be storing the minimum and maximum of all values for ranges of table blocks. It is only useful for columns with perfect correlation. Its advantage over the B-tree index is its much smaller size, which makes it an interesting option for large tables.
If you need to efficiently scan bigger portions of an index, it is good to keep the table in index order.
There are no “index ordered tables” in PostgreSQL.
Still, high correlation for a column can be maintained in two ways:
Since correlation is always perfect in this case, a BRIN index can be an interesting option.
If you want to remove old data from a table without disrupting the physical ordering, you can use table partitioning.
CLUSTER
can be used to rewrite a table so that the physical ordering is identical to the logical ordering of an index.
However, subsequent modifications of the table will reduce the correlation again. Because of that, you need to re-cluster the table regularly to maintain high correlation. This is annoying, because CLUSTER
blocks all concurrent access to the table.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply