If you happen to be an SQL developer, you will know that joins are really at the core of the language. Joins come in various flavors: Inner joins, left joins, full joins, natural joins, self joins, semi-joins, lateral joins, and so on. However, one of the most important distinctions is the difference between implicit and explicit joins. Over the years, flame wars have been fought over this issue. Still, not many people know what's really going on. I hope this post helps to shed some light on the situation.
Table of Contents
Before I dig into practical examples, let's create some tables that we can later use to perform our joins:
1 2 3 4 |
test=# CREATE TABLE a (id int, aid int); CREATE TABLE test=# CREATE TABLE b (id int, bid int); CREATE TABLE |
In the next step, some rows are added to the tables:
1 2 3 4 5 6 |
test=# INSERT INTO a VALUES (1, 1), (2, 2), (3, 3); INSERT 0 3 test=# INSERT INTO b VALUES (2, 2), (3, 3), (4, 4); INSERT 0 3 |
An implicit join is the simplest way to join data. The following example shows an implicit join:
1 2 3 4 5 6 7 8 |
test=# SELECT * FROM a, b WHERE a.id = b.id; id | aid | id | bid ----+-----+----+----- 2 | 2 | 2 | 2 3 | 3 | 3 | 3 (2 rows) |
In this case, all tables are listed in the FROM clause and are later connected in the WHERE clause. In my experience, an implicit join is the most common way to connect two tables. However, my observation might be heavily biased, because an implicit join is the way I tend to write things in my daily work.
Some people prefer the explicit join syntax over implicit joins due to its readability.
The following example shows an explicit join.
1 2 3 4 5 6 7 8 |
test=# SELECT * FROM a JOIN b ON (aid = bid); id | aid | id | bid ----+-----+----+----- 2 | 2 | 2 | 2 3 | 3 | 3 | 3 (2 rows) |
In this case, tables are connected directly using an ON-clause. The ON-clause simply contains the conditions we want to use to join the tables together.
Explicit joins support two types of syntax constructs: ON-clauses and USING-clauses. An ON-clause is perfect in case you want to connect different columns with each other. A using clause is different: It has the same meaning, but it can only be used if the columns on both sides have the same name. Otherwise, a syntax error is issued:
1 2 3 4 5 |
test=# SELECT * FROM a JOIN b USING (aid = bid); ERROR: syntax error at or near '=' LINE 1: SELECT * FROM a JOIN b USING (aid = bid); |
USING is often implemented to connect keys with each other, as shown in the next example:
1 2 3 4 5 6 7 |
test=# SELECT * FROM a JOIN b USING (id); id | aid | bid ----+-----+----- 2 | 2 | 2 3 | 3 | 3 (2 rows) |
In my tables, both column have a column called “id”, which makes it possible to implement USING here. Keep in mind: USING is mostly syntactic sugar – there is no deeper meaning.
Often, an explicit join is used to join more than two tables. To show how that works, I have added another table:
1 2 |
test=# CREATE TABLE c (id int, cid int); CREATE TABLE |
Let's add some data to this table:
1 2 |
test=# INSERT INTO c VALUES (3, 3), (4, 4), (5, 5); INSERT 0 2 |
To perform an explicit join, just add additional JOIN and USING clauses (respectively ON clauses) to the statement.
1 2 3 4 5 6 7 |
test=# SELECT * FROM a INNER JOIN b USING (id) JOIN c USING (id); id | aid | bid | cid ----+-----+-----+----- 3 | 3 | 3 | 3 (1 row) |
The same can be done with an implicit join:
1 2 3 4 5 6 7 8 |
test=# SELECT * FROM a, b, c WHERE a.id = b.id AND b.id = c.id; id | aid | id | bid | id | cid ----+-----+----+-----+----+----- 3 | 3 | 3 | 3 | 3 | 3 (1 row) |
However, as you can see, there is a small difference. Check the number of columns returned by the query. You will notice that the implicit join returns more. The “id” column will show up more frequently in this case, because the implicit join handles the column list in a slightly different way.
The column list is a nasty detail, because in a real application it is always better to explicitly list all columns. This little detail should be kept in mind.
When I am on the road working as PostgreSQL consultant or PostgreSQL support guy, people often ask if there is a performance difference between implicit and explicit joins. The answer is: “Usually not”. Let's take a look at the following statement:
1 2 3 4 5 6 7 8 9 10 11 12 |
test=# explain SELECT * FROM a INNER JOIN b USING (id); QUERY PLAN ----------------------------------------------------------------- Merge Join (cost=317.01..711.38 rows=25538 width=12) Merge Cond: (a.id = b.id) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: a.id -> Seq Scan on a (cost=0.00..32.60 rows=2260 width=8) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: b.id -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=8) (8 rows) |
The explicit join produces exactly the same plan as the implicit plan shown below:
1 2 3 4 5 6 7 8 9 10 11 12 |
test=# explain SELECT * FROM a, b WHERE a.id = b.id; QUERY PLAN ----------------------------------------------------------------- Merge Join (cost=317.01..711.38 rows=25538 width=16) Merge Cond: (a.id = b.id) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: a.id -> Seq Scan on a (cost=0.00..32.60 rows=2260 width=8) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: b.id -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=8) (8 rows) |
So in the majority of all cases, an implicit join does exactly the same thing as an explicit join.
join_collapse_limit
However, this is not always the case. In PostgreSQL there is a variable called join_collapse_limit
:
1 2 3 4 5 |
test=# SHOW join_collapse_limit; join_collapse_limit --------------------- 8 (1 row) |
What does it all mean? If you prefer explicit over implicit joins, the planner will always plan the first couple of joins automatically – regardless of which join order you have used inside the query. The optimizer will simply reorder joins the way they seem to be most promising. But if you keep adding joins, the ones exceeding join_collapse_limit
will be planned the way you have put them into the query. As you can easily imagine, we are already talking about fairly complicated queries. Joining 9 or more tables is quite a lot and beyond the typical operation in most cases.
from_collapse_limit
There is another parameter called from_collapse_limit
that does the same thing for implicit joins and has the same default value. If a query lists more than from_collapse_limit
tables in its FROM clause, the ones exceeding the limit will not be re-ordered, but joined in the order they appear in the statement.
However, for the typical, “normal” query, the performance and the execution plans stay the same: it makes no difference which type of join you prefer.
If you want to read more about joins, consider reading some of our other blog posts:
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
"join_collpase_limit"
Thanks a lot for explanation.
I would also mention one mnore important parameter, which type is more suitable for developer.
As in my case I'm able naturally use only implicit join and I have to think about realizing explicit join too long.
And I'm pretty much sure that there are many people who are able to use one type much easier and faster than another one.