Outer joins revisited

Some time ago I wrote about joins and especially about outer joins. However, people repeatedly make the same errors over and over again, so I thought that it might be worth it to address the topic again - maybe I can rescue some souls and prevent some bugs.

Table of Contents

Basic example for outer joins

test=# CREATE TABLE a (aid int4);
CREATE TABLE
test=# CREATE TABLE b (bid int4);
CREATE TABLE

test=# CREATE TABLE a (aid int4);

CREATE TABLE

test=# CREATE TABLE b (bid int4);

CREATE TABLE

We can populate the tables nicely:

test=# INSERT INTO a VALUES (1), (2), (3);
INSERT 0 3
test=# INSERT INTO b VALUES (2), (3), (4);
INSERT 0 3

test=# INSERT INTO a VALUES (1), (2), (3);

INSERT 0 3

test=# INSERT INTO b VALUES (2), (3), (4);

INSERT 0 3

The most simplistic case is a normal inner join. This is not going to cause any
surprises to most users:

test=# SELECT * FROM a, b WHERE a.aid = b.bid;
 aid | bid 
-----+-----
   2 |   2
   3 |   3
(2 rows)

test=# SELECT * FROM a, b WHERE a.aid = b.bid;

aid | bid

-----+-----

2 | 2

3 | 3

(2 rows)

The differences

People often ask if the way the join is written will make any difference:

test=# explain SELECT * FROM a, b WHERE a.aid = b.bid;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Merge Join  (cost=337.49..781.49 rows=28800 width=8)
   Merge Cond: (a.aid = b.bid)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: a.aid
         ->  Seq Scan on a  (cost=0.00..34.00 rows=2400 width=4)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: b.bid
         ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(8 rows)

test=# explain SELECT * FROM a, b WHERE a.aid = b.bid;

QUERY PLAN

-----------------------------------------------------------------

Merge Join (cost=337.49..781.49 rows=28800 width=8)

Merge Cond: (a.aid = b.bid)

-> Sort (cost=168.75..174.75 rows=2400 width=4)

Sort Key: a.aid

-> Seq Scan on a (cost=0.00..34.00 rows=2400 width=4)

-> Sort (cost=168.75..174.75 rows=2400 width=4)

Sort Key: b.bid

-> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4)

(8 rows)

test=# explain SELECT * FROM a JOIN b ON (a.aid = b.bid);
                           QUERY PLAN                            
-----------------------------------------------------------------
 Merge Join  (cost=337.49..781.49 rows=28800 width=8)
   Merge Cond: (a.aid = b.bid)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: a.aid
         ->  Seq Scan on a  (cost=0.00..34.00 rows=2400 width=4)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: b.bid
         ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(8 rows)

test=# explain SELECT * FROM a JOIN b ON (a.aid = b.bid);

QUERY PLAN

-----------------------------------------------------------------

Merge Join (cost=337.49..781.49 rows=28800 width=8)

Merge Cond: (a.aid = b.bid)

-> Sort (cost=168.75..174.75 rows=2400 width=4)

Sort Key: a.aid

-> Seq Scan on a (cost=0.00..34.00 rows=2400 width=4)

-> Sort (cost=168.75..174.75 rows=2400 width=4)

Sort Key: b.bid

-> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4)

(8 rows)

There is absolutely no difference between an explicit and an implicit join. It
is simply a matter of style but it has no impact on the way the planner works
(unless you exceed join_collapse_limit).

Inner joins vs. outer joins

Inner joins should be a pretty common thing. Outer joins cause a lot more problems. Here is a simple case:

est=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid);
 aid | bid 
-----+-----
   1 |    
   2 |   2
   3 |   3
(3 rows)

est=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid);

aid | bid

-----+-----

1 |

2 | 2

3 | 3

(3 rows)

Caution!

The situation starts to be tricky if you start to add filters to the query. People might expect to reduce the number of rows by a filter like that but this is NOT the case - keep in mind: A filter in the ON-clause WILL NOT reduce the number of rows on either side of the join. This is not a bug - this is totally desired and logical behavior:

test=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid AND aid = 2);
 aid | bid 
-----+-----
   1 |    
   2 |   2
   3 |    
(3 rows)

test=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid AND aid = 2);

aid | bid

-----+-----

1 |

2 | 2

3 |

(3 rows)

Most people would think that we should get just one row instead of three BUT all
we do is to limit the data which PostgreSQL will use to find common rows.

If you want to filter data - use a WHERE-clause instead:

test=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid) WHERE a.aid = 2;
 aid | bid 
-----+-----
   2 |   2
(1 row)

test=# SELECT * FROM a LEFT JOIN b ON (a.aid = b.bid) WHERE a.aid = 2;

aid | bid

-----+-----

2 | 2

(1 row)

Please think twice when you are writing outer joins - otherwise your results
will simply be plain wrong.

Outer joins revisited

Basic example for outer joins

The differences

Inner joins vs. outer joins

Caution!

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts