UPDATED July 2023: The PostgreSQL optimizer is really a wonderful piece of software which is capable of doing great things. One of those great things is so-called “join pruning”. In case PostgreSQL detects a join which is actually not needed for execution, it will automatically remove it from the plan. Removing joins from the plan can result in significantly better performance and provide end users with much simpler plans.
Table of Contents
Two tables are needed:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
test=# CREATE TABLE a (aid int PRIMARY KEY); CREATE TABLE test=# INSERT INTO a VALUES (1), (2), (3); INSERT 0 3 test=# CREATE TABLE b (bid int PRIMARY KEY); CREATE TABLE test=# INSERT INTO b VALUES (2), (3), (4); INSERT 0 3 |
Mind that both tables have a primary key - which will play a major role later on.
To see what PostgreSQL does, we can write a basic query:
1 2 3 4 5 6 7 8 |
test=# SELECT * FROM a LEFT JOIN b ON (aid = bid); aid | bid -----+----- 1 | 2 | 2 3 | 3 (3 rows) |
This is a left-join, which takes all values from the left and matches values on the right. Here's the plan:
1 2 3 4 5 6 7 8 9 10 |
test=# explain SELECT * FROM a LEFT JOIN b ON (aid = bid); QUERY PLAN ----------------------------------------------------------------- Hash Left Join (cost=67.38..137.94 rows=2550 width=8) Hash Cond: (a.aid = b.bid) -> Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) -> Hash (cost=35.50..35.50 rows=2550 width=4) -> Seq Scan on b (cost=0.00..35.50 rows=2550 width=4) (5 rows) |
PostgreSQL will perform a hash join to produce the final result. In the SELECT-clause, a star will ensure that all columns are returned.
However, what happens if only columns from the first table are needed?
1 2 3 4 5 6 |
test=# explain SELECT a.* FROM a LEFT JOIN b ON (aid = bid); QUERY PLAN ----------------------------------------------------- Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (1 row) |
In this case, PostgreSQL can prune the plan and just do a sequential scan on “a”.
Why is that possible? Well, first of all no information from “b” is needed to satisfy the SELECT-clause. However, this is not a sufficient condition to justify eliminating the table from the execution plan. Both sides must be unique. Remember, in a 1:n relationship the join could potentially return more data than a query without the join would. So, only if both sides are unique is it logically possible to just skip the second table. If you can prove that a simpler plan will produce the same result, you can use the simpler plan.
Read more about joins and performance:
There have been two patches since this blog post was originally written that use the table definition to improve row count estimates or take shortcuts:
PostgreSQL 9.6 has added the following:
Use foreign key relationships to infer selectivity for join predicates (Tomas Vondra, David Rowley)
Here also is the improved version of this commit.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply