Some time ago I wrote about joins and especially about outer joins. However, people repeatedly make the same errors over and over again, so I thought that it might be worth it to address the topic again - maybe I can rescue some souls and prevent some bugs.
Table of Contents
1 2 3 4 |
test=# CREATE TABLE a (aid int4); CREATE TABLE test=# CREATE TABLE b (bid int4); CREATE TABLE |
We can populate the tables nicely:
1 2 3 4 |
test=# INSERT INTO a VALUES (1), (2), (3); INSERT 0 3 test=# INSERT INTO b VALUES (2), (3), (4); INSERT 0 3 |
The most simplistic case is a normal inner join. This is not going to cause any
surprises to most users:
1 2 3 4 5 6 |
test=# SELECT * FROM a, b WHERE a.aid = b.bid; aid | bid -----+----- 2 | 2 3 | 3 (2 rows) |
People often ask if the way the join is written will make any difference:
1 2 3 4 5 6 7 8 9 10 11 12 |
test=# explain SELECT * FROM a, b WHERE a.aid = b.bid; QUERY PLAN ----------------------------------------------------------------- Merge Join (cost=337.49..781.49 rows=28800 width=8) Merge Cond: (a.aid = b.bid) -> Sort (cost=168.75..174.75 rows=2400 width=4) Sort Key: a.aid -> Seq Scan on a (cost=0.00..34.00 rows=2400 width=4) -> Sort (cost=168.75..174.75 rows=2400 width=4) Sort Key: b.bid -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) (8 rows) |
1 2 3 4 5 6 7 8 9 10 11 12 |
test=# explain SELECT * FROM a JOIN b ON (a.aid = b.bid); QUERY PLAN ----------------------------------------------------------------- Merge Join (cost=337.49..781.49 rows=28800 width=8) Merge Cond: (a.aid = b.bid) -> Sort (cost=168.75..174.75 rows=2400 width=4) Sort Key: a.aid -> Seq Scan on a (cost=0.00..34.00 rows=2400 width=4) -> Sort (cost=168.75..174.75 rows=2400 width=4) Sort Key: b.bid -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) (8 rows) |
There is absolutely no difference between an explicit and an implicit join. It
is simply a matter of style but it has no impact on the way the planner works
(unless you exceed join_collapse_limit).
Inner joins should be a pretty common thing. Outer joins cause a lot more problems. Here is a simple case:
1 2 3 4 5 6 7 |
est=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid); aid | bid -----+----- 1 | 2 | 2 3 | 3 (3 rows) |
The situation starts to be tricky if you start to add filters to the query. People might expect to reduce the number of rows by a filter like that but this is NOT the case - keep in mind: A filter in the ON-clause WILL NOT reduce the number of rows on either side of the join. This is not a bug - this is totally desired and logical behavior:
1 2 3 4 5 6 7 |
test=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid AND aid = 2); aid | bid -----+----- 1 | 2 | 2 3 | (3 rows) |
Most people would think that we should get just one row instead of three BUT all
we do is to limit the data which PostgreSQL will use to find common rows.
If you want to filter data - use a WHERE-clause instead:
1 2 3 4 5 |
test=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid) WHERE a.aid = 2; aid | bid -----+----- 2 | 2 (1 row) |
Please think twice when you are writing outer joins - otherwise your results
will simply be plain wrong.
Related blog topics:
Enforcing join orders in PostgreSQL
Time in PostgreSQL: Outer joins
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply