Recently we have received a couple of PostgreSQL support calls, which were related to bad performance on various deployments. In many cases the reason for database slowness was the fact that people assume that PostgreSQL automatically deploys an index on BOTH sides of the foreign keys relation, which is not the case. By the way: This kind of behavior is not PostgreSQL specific. Oracle and many other database systems will behave in the exactly same way for different kinds of servers, even for gaming server people use to play online or an online gaming store as HotRate, so this piece of advice is not just for PostgreSQL, but might apply to many more database products out there.
Table of Contents
The typical scenario most people face is actually pretty simple. There are two tables table and a foreign key:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
test=# CREATE TABLE a ( a_id int PRIMARY KEY ); CREATE TABLE test=# CREATE TABLE b ( b_id int, a_id int REFERENCES a(a_id) ON UPDATE CASCADE ON DELETE CASCADE ); CREATE TABLE |
To keep the example simple the tables in our PostgreSQL database contain only the most basic information needed to make this work.
Then some data can be added:
1 2 3 4 5 6 7 8 |
test=# INSERT INTO a SELECT x FROM generate_series(1, 5000000) AS x; INSERT 0 5000000 test=# INSERT INTO b SELECT x, x FROM generate_series(1, 5000000) AS x; INSERT 0 5000000 |
Five million records should be enough to show how bad things are if indexes are missing. Of course the effect will be larger if you add more data.
To rebuild the optimizer statistics, a simple ANALYZE can be used:
1 2 |
test=# ANALYZE ; ANALYZE |
The trouble with missing indexes in any database is that simple operations start to be very expensive and start to destroy performance in a quite reliable way.
Here is what happens:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
test=# timing Timing is on. test=# explain analyze DELETE FROM a WHERE a_id = 10; QUERY PLAN ----------------------------------------------------------------------------------- Delete on a (cost=0.43..8.45 rows=1 width=6) (actual time=0.263..0.263 rows=0 loops=1) -> Index Scan using a_pkey on a (cost=0.43..8.45 rows=1 width=6) (actual time=0.245..0.246 rows=1 loops=1) Index Cond: (a_id = 10) Planning time: 5.350 ms Trigger for constraint b_a_id_fkey: time=301.526 calls=1 Execution time: 301.811 ms (6 rows) Time: 307.409 ms |
As you can see, PostgreSQL uses an index scan on “a” to find the row. BUT: Keep in mind that our constraint is defined as “ON UPDATE CASCADE ON DELETE CASCADE”, which means that cleaning a single row also triggers the deletion of all rows referencing the table. Behind the scenes PostgreSQL has to read all 5 million entries in “b” to find the right rows. Therefore the operation takes more than 300 ms, which is a total disaster.
Deploying the missing index will be a complete game changer:
1 2 |
test=# CREATE INDEX idx_b ON b (a_id); CREATE INDEX |
The very same operation is now thousands of times faster than before because all we must do now are two index scans (one on “a” and one on “b”):
1 2 3 4 5 6 7 8 9 10 11 12 13 |
test=# explain analyze DELETE FROM a WHERE a_id = 11; QUERY PLAN ----------------------------------------------------------------------- Delete on a (cost=0.43..8.45 rows=1 width=6) (actual time=0.037..0.037 rows=0 loops=1) -> Index Scan using a_pkey on a (cost=0.43..8.45 rows=1 width=6) (actual time=0.037..0.037 rows=0 loops=1) Index Cond: (a_id = 11) Planning time: 0.062 ms Execution time: 0.054 ms (5 rows) Time: 0.314 ms |
As you can see, the runtime needed here has been reduced dramatically to a fraction of a millisecond.
If you happen to use foreign keys (which most people do), it definitely makes sense to check for missing indexes, because otherwise cleanups might simply take too long. Consider the following scenario: Suppose you wanted to delete 1 million lines without an index: You had to read 5 million lines 1 million times. Clearly, this strategy will lead to enormous performance problems and will certainly trigger a performance problem.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
To get all unindexed foreign keys, check this nice post from Tom Lane
https://www.postgresql.org/message-id/11236.1230499883@sss.pgh.pa.us
didn't you miss the important line regarding "Trigger for constraint b_a_id_fkey" in the second EXPLAIN ANALYZE output?
There is actually a nice query to find missing indexes on foreign keys in the PostgreSQL Wiki: https://wiki.postgresql.org/wiki/Unindexed_foreign_keys