LATERAL joins are one of the lesser-known features of PostgreSQL and other relational databases such as Oracle, DB2 and MS SQL. However, LATERAL joins are a really useful feature, and it makes sense to take a look at what you can accomplish with them.
Table of Contents
Before we dive into LATERAL, it makes sense to sit back and think about SELECT and FROM clauses in SQL on a more philosophical level. Here is an example:
1 |
SELECT whatever FROM tab; |
Basically, we could see this statement as a loop. Writing this SQL statement in pseudo code would look somewhat like the following snippet:
1 2 3 4 |
for x in tab loop “do whatever” end loop |
For each entry in the table, we do what the SELECT clause says. Usually data is simply returned as it is. A SELECT statement can be seen as a loop. But what if we need a “nested” loop? This is exactly what LATERAL is good for.
Let’s imagine a simple example. Imagine we have a line of products, and we’ve also got customer wishlists. The goal now is to find the best 3 products for each wishlist. The following SQL snippet creates some sample data:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
CREATE TABLE t_product AS SELECT id AS product_id, id * 10 * random() AS price, 'product ' || id AS product FROM generate_series(1, 1000) AS id; CREATE TABLE t_wishlist ( wishlist_id int, username text, desired_price numeric ); INSERT INTO t_wishlist VALUES (1, 'hans', '450'), (2, 'joe', '60'), (3, 'jane', '1500') ; |
The product table is populated with 1000 products. The price is random, and we used a pretty creative name to name the products:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
test=# SELECT * FROM t_product LIMIT 10; product_id | price | product ------------+--------------------+------------ 1 | 6.756567642432323 | product 1 2 | 5.284467408540081 | product 2 3 | 28.284196164210904 | product 3 4 | 13.543868035690423 | product 4 5 | 30.576923884383156 | product 5 6 | 26.572431211361902 | product 6 7 | 64.84599396020204 | product 7 8 | 21.550701384168747 | product 8 9 | 28.995584553969174 | product 9 10 | 17.31335004787411 | product 10 (10 rows) |
Next, we have a list of wishes.
1 2 3 4 5 6 7 |
test=# SELECT * FROM t_wishlist; wishlist_id | username | desired_price -------------+----------+--------------- 1 | hans | 450 2 | joe | 60 3 | jane | 1500 (3 rows) |
As you can see, the wishlist belongs to a user and there is a desired price for those three products we want to suggest.
After providing some sample data and loading it into our PostgreSQL database, we can approach the problem and try to come up with a solution.
Suppose we wanted to find the top three products for every wish, in pseudo-code:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
for x in wishlist loop for y in products order by price desc loop found++ if found <= 3 then return row else jump to next wish end end loop end loop |
The important thing is that we need two loops. First, we need to iterate through the list of wishes and then we take a look at the sorted list of products, pick 3 and move on to the next wishlist.
Let’s see how this can be done using a LATERAL-join:
1 2 3 4 5 6 7 8 9 |
SELECT * FROM t_wishlist AS w, LATERAL (SELECT * FROM t_product AS p WHERE p.price < w.desired_price ORDER BY p.price DESC LIMIT 3 ) AS x ORDER BY wishlist_id, price DESC; |
We’ll go through it step by step. The first thing you see in the FROM clause is the t_wishlist table. What LATERAL can do now is to use entries from the wishlist to do its magic. So for each entry in the wishlist, we pick three products. To figure out which products we need, we can make use of w.desired_price. In other words: It is like a “join with parameters”. The FROM-clause is the “outer loop” in our pseudo code and the LATERAL can be seen as the “inner loop”.
The result set looks as follows:
1 2 3 4 5 6 7 8 9 10 11 12 |
wishlist_id | username | desired_price | product_id | price | product -------------+----------+---------------+------------+--------------------+------------- 1 | hans | 450 | 708 | 447.0511375753179 | product 708 1 | hans | 450 | 126 | 443.6560873146138 | product 126 1 | hans | 450 | 655 | 438.0566432022443 | product 655 2 | joe | 60 | 40 | 59.32252841190291 | product 40 2 | joe | 60 | 19 | 59.2142714048882 | product 19 2 | joe | 60 | 87 | 58.78014573804254 | product 87 3 | jane | 1500 | 687 | 1495.8794483743645 | product 687 3 | jane | 1500 | 297 | 1494.4586352980593 | product 297 3 | jane | 1500 | 520 | 1490.7849437550085 | product 520 (9 rows) |
PostgreSQL returned three entries for each wishlist, which is exactly what we wanted. The important part here is that the LIMIT-clause is inside the SELECT fed to LATERAL. Thus it limits the number of rows per wishlist, and not the overall number of rows.
PostgreSQL is doing a pretty good job optimizing LATERAL joins. In our case, the execution plan is going to look pretty straightforward:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
test=# explain SELECT * FROM t_wishlist AS w, LATERAL (SELECT * FROM t_product AS p WHERE p.price < w.desired_price ORDER BY p.price DESC LIMIT 3 ) AS x ORDER BY wishlist_id, price DESC; QUERY PLAN --------------------------------------------------------------------------------------- Sort (cost=23428.53..23434.90 rows=2550 width=91) Sort Key: w.wishlist_id, p.price DESC -> Nested Loop (cost=27.30..23284.24 rows=2550 width=91) -> Seq Scan on t_wishlist w (cost=0.00..18.50 rows=850 width=68) -> Limit (cost=27.30..27.31 rows=3 width=23) -> Sort (cost=27.30..28.14 rows=333 width=23) Sort Key: p.price DESC -> Seq Scan on t_product p (cost=0.00..23.00 rows=333 width=23) Filter: (price < (w.desired_price)::double precision) (9 rows) |
LATERAL joins are extremely useful, and can be utilized in many cases to speed up operations, or to simply make code a lot easier to understand.
If you want to learn more about joins in general and if you want to read more about PostgreSQL right now consider checking out Laurenz Albe’s excellent post about join strategies in PostgreSQL.
If you want to learn more about the PostgreSQL optimizer in general, and if you want to find out more about optimization and other important topics related to PostgreSQL query optimization, check out my blog post about the optimizer.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
nice example and explanation
thanks for the feedback. new ideas are always welcome :).
Great explanation. So it's also very useful when there's no primary key to JOIN on. As I was reading this, I thought, "Why not just JOIN on id and add a WHERE statement after (prob more expensive)?" But there is no primary key shared by the tables. The only relation you have is price. Also, I often hear LATERAL can be done with a CTE but I just tried and could not reproduce the result with a CTE
Winning! thanks for the blog post: clear, concise, lucid.
btw would you know any university courses where one can learn postgres / querying to an expert / advanced level?
Best explanation of LATERAL. Thx
Thanks, great explanation! Lateral join seems to be very dangerous and should only be used as a last-ditch effort. Even in this case which looks to be a very rare problem you are calculating the result in O(n*m) time for an O(n) problem and only works because you have 3 customers instead of say 30000. But it's still much better than cross joins and window functions to solve the same problem.
Is there a declarative way to force postgresql to solve this in O(n)? In pl/SQL I could put together a simple function that solved this in less than a second for a 1000000 x 100000 dataset which was obviously impossible for the lateral join.
A lateral join always forces a nested loop join. That is not dangerous at all, but can perform badly, particularly if both result sets are large. With the top three customers it shouldn't matter, and I don't think there is a better way.
If
n
andm
are the row counts being joined, this always has to beO(n*m)
. "Less than a second" has nothing to do with the big O notation. PL/pgSQL is not particularly fast, and I doubt that a lateral join implemented in PL/pgSQL will beat a lateral join in SQL.I may have completely misunderstood, true. The comment was not very clear.
Anyway, the algorithm you describe is known as "nested loops":
No, it's not. Please take your time to understand it.
I edited my comment above and added my SQLs.