CYBERTEC PostgreSQL Logo

Time in PostgreSQL: Outer joins

08.2013 / Category: / Tags:

UPDATED AUGUST 2023: After my previous post, I received mails asking about what was meant by using “generate_series” in an outer join.

Let me try to address this issue with an example: Many people use PostgreSQL to aggregate data. In this example, we assume that we want to aggregate data on an hourly basis.

Sample data 

Here's some sample data:

We create a table with just two fields:

We have data for 3am, 4am and 6am ... we've intentionally left 5am out. If we want to start to aggregate the data to see how many rows there are per hour, we can use the following query:

The SQL challenge

A simple GROUP BY will reveal that there were no entries for the 5am group. However, the zero is not displayed here, because GROUP BY will only return data for valid groups.

In order to create a list of times we want to see in our final report, we can execute generate_series. Let's assume we want to have a complete table from midnight to 6am:

This will return a complete list – exactly what we need to display in the final results. Remember, we also want to display hours with no rows.

Joining things together with an outer join

Let's now join things together using a standard PostgreSQL outer join:

We'll run date_trunc to make sure that the precision of our dates is actually in hours. Then we can use the date column as the join key. Note that the non-existing fields on the “data” side of the join will stay empty.

Watch out for trouble – be precise

Once we have joined the data we have to aggregate to fetch the final result. When doing this aggregation, watch out for a common mistake:

Never EVER use count(*) along with an outer join.

The problem is, count(*) will count ROWS – we don't want to count rows because we're not allowed to count NULL values. As you can see, the result is just plain wrong.

To correct the problem, I advise you to count non-NULL values. In SQL, this means that you have to use count(column) rather than count(*). count(column) will only count values which are not NULL. That's exactly what we want:

Voilà, the result is exactly as desired.


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

Comments are closed.

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram