CYBERTEC PostgreSQL

PostgreSQL: Implicit vs. explicit joins

11.2018

If you happen to be an SQL developer, you will know that joins are really at the core of the language. Joins come in various flavors: Inner joins, left joins, full joins, natural joins, self joins, semi-joins, lateral joins, and so on. However, one of the most important distinctions is the difference between implicit and explicit joins. Over the years, flame wars have been fought over this issue. Still, not many people know what's really going on. I hope this post helps to shed some light on the situation.

Using implicit joins

Before I dig into practical examples, let's create some tables that we can later use to perform our joins:

In the next step, some rows are added to the tables:
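A minimal sketch of such a setup, assuming two small tables named a and b that share an "id" column. The table names, the extra columns aid and bid, and the sample values are illustrative assumptions:

```sql
-- Hypothetical schema: both tables have a column called "id".
CREATE TABLE a (id int, aid int);
CREATE TABLE b (id int, bid int);

-- Sample rows: ids 2 and 3 exist in both tables, 1 and 4 in only one.
INSERT INTO a VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO b VALUES (2, 2), (3, 3), (4, 4);
```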

An implicit join is the simplest way to join data. The following example shows an implicit join:
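As a sketch, assuming hypothetical tables a(id, aid) and b(id, bid), an implicit join could be written like this:

```sql
-- Assumes tables a(id, aid) and b(id, bid) already exist.
-- Both tables are listed in FROM; the join condition lives in WHERE.
SELECT *
FROM   a, b
WHERE  a.id = b.id;
```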

In this case, all tables are listed in the FROM clause and are later connected in the WHERE clause. In my experience, an implicit join is the most common way to connect two tables. However, my observation might be heavily biased, because an implicit join is the way I tend to write things in my daily work.

Using explicit joins

Some people prefer the explicit join syntax over implicit joins due to its readability.

The following example shows an explicit join.
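A sketch of the explicit form, again assuming the hypothetical tables a(id, aid) and b(id, bid):

```sql
-- Assumes tables a(id, aid) and b(id, bid) already exist.
-- The join condition moves out of WHERE and into the ON-clause.
SELECT *
FROM   a JOIN b
       ON (a.id = b.id);
```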

In this case, tables are connected directly using an ON-clause. The ON-clause simply contains the conditions we want to use to join the tables together.

Explicit joins support two types of syntax constructs: ON-clauses and USING-clauses. An ON-clause is perfect if you want to join on columns that have different names. A USING-clause is different: it has the same meaning, but it can only be used if the columns on both sides have the same name. Otherwise, the query fails with an error:
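To illustrate the failure case, here is a sketch that is expected to fail. It assumes hypothetical tables a(id, aid) and b(id, bid), where the column aid exists only in table a:

```sql
-- Assumes tables a(id, aid) and b(id, bid) already exist.
-- Expected to fail: USING (aid) needs a column named "aid" in BOTH
-- tables, but table b has no such column.
SELECT *
FROM   a JOIN b
       USING (aid);
```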

USING is often used to join on key columns, as shown in the next example:
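A sketch of a join on the shared key column, assuming the hypothetical tables a(id, aid) and b(id, bid):

```sql
-- Assumes tables a(id, aid) and b(id, bid) already exist.
-- Same join condition as ON (a.id = b.id), but SELECT * returns
-- the shared "id" column only once.
SELECT *
FROM   a JOIN b
       USING (id);
```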

In my example, both tables have a column called “id”, which makes it possible to use USING here. Keep in mind: USING is mostly syntactic sugar; there is no deeper meaning.

Often, an explicit join is used to join more than two tables. To show how that works, I have added another table:

Let's add some data to this table:
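A sketch of the third table and its rows. The name c, the column cid, and the values are illustrative assumptions; the only requirement is that it also has an "id" column:

```sql
-- Hypothetical third table; it shares the "id" column with a and b.
CREATE TABLE c (id int, cid int);

-- Only id 3 also appears in both of the other sample tables.
INSERT INTO c VALUES (3, 3), (4, 4), (5, 5);
```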

To perform an explicit join, just add additional JOIN clauses, each with its own USING clause (or ON clause), to the statement.

Here's an example of an explicit join:
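A sketch of a three-table explicit join, assuming hypothetical tables a(id, aid), b(id, bid) and c(id, cid):

```sql
-- Assumes tables a(id, aid), b(id, bid) and c(id, cid) already exist.
-- Each additional table gets its own JOIN ... USING clause.
SELECT *
FROM   a
       JOIN b USING (id)
       JOIN c USING (id);
-- Columns returned: id, aid, bid, cid
```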

The same can be done with an implicit join:
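The implicit counterpart, sketched for the same hypothetical tables a(id, aid), b(id, bid) and c(id, cid):

```sql
-- Assumes tables a(id, aid), b(id, bid) and c(id, cid) already exist.
-- All tables go into FROM; all join conditions go into WHERE.
SELECT *
FROM   a, b, c
WHERE  a.id = b.id
       AND b.id = c.id;
-- Columns returned: id, aid, id, bid, id, cid
```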

However, as you can see, there is a small difference. Check the number of columns returned by each query: the implicit join returns more. With SELECT *, the implicit join returns every column of every table, so the “id” column shows up once per table. The explicit join with USING merges the join columns into a single “id” column. (An explicit join written with ON instead of USING would also return every “id” column.)

The column list is a nasty detail; in a real application, it is always better to list the columns you need explicitly. Keep this in mind.

join_collapse_limit: What the optimizer does

When I am on the road working as a PostgreSQL consultant or PostgreSQL support guy, people often ask if there is a performance difference between implicit and explicit joins. The answer is: “Usually not”. Let's take a look at the following statement:
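One way to check this yourself is EXPLAIN. The sketch below assumes hypothetical tables a(id, aid), b(id, bid) and c(id, cid). The plan output is not reproduced here, because it depends on the PostgreSQL version, the data, and the table statistics:

```sql
-- Assumes tables a(id, aid), b(id, bid) and c(id, cid) already exist.
-- Shows the plan the optimizer picks for the explicit join.
EXPLAIN
SELECT *
FROM   a
       JOIN b ON (a.id = b.id)
       JOIN c ON (b.id = c.id);
```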

The explicit join produces exactly the same plan as the implicit join shown below:
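A sketch of the implicit variant for comparison, assuming the same hypothetical tables a(id, aid), b(id, bid) and c(id, cid):

```sql
-- Assumes tables a(id, aid), b(id, bid) and c(id, cid) already exist.
-- Shows the plan the optimizer picks for the implicit join.
EXPLAIN
SELECT *
FROM   a, b, c
WHERE  a.id = b.id
       AND b.id = c.id;
```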

So in the majority of all cases, an implicit join does exactly the same thing as an explicit join.

join_collapse_limit

However, this is not always the case. In PostgreSQL there is a variable called join_collapse_limit:
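The current value can be inspected with SHOW. On an installation that has not changed the setting, PostgreSQL reports its default of 8:

```sql
-- Displays the value in effect for the current session.
SHOW join_collapse_limit;
-- Default: 8
```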

What does it all mean? If you prefer explicit over implicit joins, the planner will still plan the first couple of joins automatically, regardless of the join order you have used inside the query: the optimizer simply reorders the joins in whatever way seems most promising. But if you keep adding joins, the ones exceeding join_collapse_limit will be planned in the order in which you wrote them. As you can easily imagine, we are already talking about fairly complicated queries here. Joining 9 or more tables is quite a lot and beyond what a typical query does.
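The effect is easiest to observe at the other extreme. As a sketch, setting join_collapse_limit to 1 for the session stops the planner from reordering explicit joins, so they are joined in the order written. The tables a(id, aid), b(id, bid) and c(id, cid) are hypothetical:

```sql
-- Assumes tables a(id, aid), b(id, bid) and c(id, cid) already exist.
-- With a value of 1, explicit JOINs are not reordered by the planner.
SET join_collapse_limit = 1;

EXPLAIN
SELECT *
FROM   a
       JOIN b ON (a.id = b.id)
       JOIN c ON (b.id = c.id);

-- Restore the configured default for the rest of the session.
RESET join_collapse_limit;
```

Comparing this plan with the one produced under the default setting shows whether the planner would have chosen a different join order.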

from_collapse_limit

There is another parameter called from_collapse_limit that plays a similar role for subqueries in the FROM clause and has the same default value. The planner merges a subquery into the outer query only if the resulting FROM list has no more than from_collapse_limit items. Beyond that limit, the subquery is planned separately, which restricts the join orders the planner can consider.
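This setting can be inspected in the same way; its default is also 8:

```sql
-- Displays the value in effect for the current session.
SHOW from_collapse_limit;
-- Default: 8
```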

However, for the typical, “normal” query, the performance and the execution plans stay the same: it makes no difference which type of join you prefer.


2 responses to “PostgreSQL: Implicit vs. explicit joins”

  1. Thanks a lot for explanation.
    I would also mention one more important parameter: which type is more suitable for the developer.
    As in my case I'm able naturally use only implicit join and I have to think about realizing explicit join too long.
    And I'm pretty much sure that there are many people who are able to use one type much easier and faster than another one.
