How the PostgreSQL query optimizer works

Just like any advanced relational database, PostgreSQL uses a cost-based query optimizer that tries to turn your SQL queries into something efficient that executes in as little time as possible. For many people, the workings of the optimizer itself remain a mystery, so we have decided to give users some insight into what is really going on behind the scenes.

Table of Contents

So let’s take a tour through the PostgreSQL optimizer and get an overview of some of the most important techniques the optimizer uses to speed up queries. Note that the techniques listed here are in no way complete. There is a lot more going on, but it makes sense to take a look at the most basic things in order to gain a good understanding of the process.

PostgreSQL constant folding

Constant folding is one of the easier processes to understand. Nonetheless, it's extremely important.

Let’s see what happens during the constant folding process:

demo=# SELECT *
           FROM generate_series(1, 10) AS x
           WHERE x = 7 + 1;
 x
---
 8
(1 row)

demo=# explain SELECT *
           FROM generate_series(1, 10) AS x
           WHERE x = 7 + 1;
                            QUERY PLAN
----------------------------------------------------------------------
 Function Scan on generate_series x (cost=0.00..0.13 rows=1 width=4)
    Filter: (x = 8)
(2 rows)

demo=# SELECT *

FROM generate_series(1, 10) AS x

WHERE x = 7 + 1;

---

(1 row)

demo=# explain SELECT *

FROM generate_series(1, 10) AS x

WHERE x = 7 + 1;

QUERY PLAN

----------------------------------------------------------------------

Function Scan on generate_series x (cost=0.00..0.13 rows=1 width=4)

Filter: (x = 8)

(2 rows)

Here we add a filter to the query: x = 7 + 1. What the system does is to “fold” the constant and instead do “x = 8”. Why is that important? In case “x” is indexed (assuming it is a table), we can easily look up 8 in the index.

Note the following:

demo=# explain SELECT *
           FROM  generate_series(1, 10) AS x
           WHERE x - 1 = 7 ;
                            QUERY PLAN
----------------------------------------------------------------------
 Function Scan on generate_series x (cost=0.00..0.15 rows=1 width=4)
    Filter: ((x - 1) = 7)
(2 rows)

demo=# explain SELECT *

FROM generate_series(1, 10) AS x

WHERE x - 1 = 7 ;

QUERY PLAN

----------------------------------------------------------------------

Function Scan on generate_series x (cost=0.00..0.15 rows=1 width=4)

Filter: ((x - 1) = 7)

(2 rows)

PostgreSQL does not transform the expression to “x = 8” in this case. That’s why you should try to make sure that the filter is on the right side, and not on the column you might want to index.

PostgreSQL query optimizer: function inlining

One more important technique is the idea of function inlining. The goal is to reduce function calls as much as possible and thus speed up the query.

Let’s create a function to calculate a logarithm:

demo=# CREATE OR REPLACE FUNCTION ld(int)
           RETURNS numeric AS
$
      SELECT log(2, $1);
$ LANGUAGE 'sql';
CREATE FUNCTION

demo=# SELECT ld(1024);
         ld
---------------------
 10.0000000000000000
(1 row)

demo=# CREATE OR REPLACE FUNCTION ld(int)

RETURNS numeric AS

SELECT log(2, $1);

$ LANGUAGE 'sql';

CREATE FUNCTION

demo=# SELECT ld(1024);

---------------------

10.0000000000000000

(1 row)

2^10 = 1024. This looks right.

Now, let’s see what happens in a real query that uses function inlining:

demo=# explain SELECT *
           FROM generate_series(1, 10) AS x
           WHERE ld(x) = 1000;
                            QUERY PLAN
----------------------------------------------------------------------
Function Scan on generate_series x (cost=0.00..0.18 rows=1 width=4)
   Filter: (log('2'::numeric, (x)::numeric) = '1000'::numeric)
(2 rows)

demo=# explain SELECT *

FROM generate_series(1, 10) AS x

WHERE ld(x) = 1000;

QUERY PLAN

----------------------------------------------------------------------

Function Scan on generate_series x (cost=0.00..0.18 rows=1 width=4)

Filter: (log('2'::numeric, (x)::numeric) = '1000'::numeric)

(2 rows)

Look in the WHERE clause. The ld function has been replaced with the underlying log function. Note that this is only possible in the case of SQL functions. PL/pgSQL and other stored procedure languages are black boxes to the optimizer, so whether these things are possible or not depends on the type of language used.

Here is an example using PL/pgSQL:

demo=# CREATE OR REPLACE FUNCTION pl_ld(int)
           RETURNS numeric AS
$
     BEGIN
                 RETURN log(2, $1);
     END;
$ LANGUAGE 'plpgsql';
CREATE FUNCTION
demo=# explain SELECT *
           FROM generate_series(1, 10) AS x
           WHERE pl_ld(x) = 1000;
                            QUERY PLAN
----------------------------------------------------------------------
  Function Scan on generate_series x (cost=0.00..2.63 rows=1 width=4)
   Filter: (pl_ld(x) = '1000'::numeric)
(2 rows)

demo=# CREATE OR REPLACE FUNCTION pl_ld(int)

RETURNS numeric AS

BEGIN

RETURN log(2, $1);

END;

$ LANGUAGE 'plpgsql';

CREATE FUNCTION

demo=# explain SELECT *

FROM generate_series(1, 10) AS x

WHERE pl_ld(x) = 1000;

QUERY PLAN

----------------------------------------------------------------------

Function Scan on generate_series x (cost=0.00..2.63 rows=1 width=4)

Filter: (pl_ld(x) = '1000'::numeric)

(2 rows)

In this case, inlining is not possible. While the code is basically the same, the programming language does make a major difference.

Function stability: VOLATILE vs. STABLE vs. IMMUTABLE

Something that is often overlooked is the concept of function stability. When creating a function, it makes a difference if a function is created as VOLATILE (default), STABLE, or as IMMUTABLE. It can even make a major difference - especially if you are using indexes. Let’s create some sample data and sort these differences out:

VOLATILE means that a function is not guaranteed to return the same result within the same transaction given the same input parameters. In other words, the PostgreSQL optimizer cannot see the function as a constant, and has to execute it for every row.

Here's an example where the optimizer has to execute the function for every row:

demo=# CREATE TABLE t_date AS
           SELECT *
           FROM   generate_series('1900-01-01'::timestamptz,
                    '2021-12-31'::timestamptz, '1 minute') AS x;
SELECT 64164961
demo=# CREATE INDEX idx_date ON t_date (x);
CREATE INDEX

demo=# CREATE TABLE t_date AS

SELECT *

FROM generate_series('1900-01-01'::timestamptz,

'2021-12-31'::timestamptz, '1 minute') AS x;

SELECT 64164961

demo=# CREATE INDEX idx_date ON t_date (x);

CREATE INDEX

We have generated a list of 64 million entries containing 1 row per minute since January 1900, which produces 64 million entries.

Let’s run the query using a `VOLATILE` function:

demo=# explain analyze SELECT *
           FROM t_date
           WHERE x = clock_timestamp();
                          QUERY PLAN 
-------------------------------------------------------------------
 Gather (cost=1000.00..685947.45 rows=1 width=8)
        (actual time=2656.961..2658.547 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Parallel Seq Scan on t_date
         (cost=0.00..684947.35 rows=1 width=8)
         (actual time=2653.009..2653.009 rows=0 loops=3)
     Filter: (x = clock_timestamp())
     Rows Removed by Filter: 21388320
 Planning Time: 0.056 ms
 Execution Time: 2658.562 ms
(8 rows)

demo=# explain analyze SELECT *

FROM t_date

WHERE x = clock_timestamp();

QUERY PLAN

-------------------------------------------------------------------

Gather (cost=1000.00..685947.45 rows=1 width=8)

(actual time=2656.961..2658.547 rows=0 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Parallel Seq Scan on t_date

(cost=0.00..684947.35 rows=1 width=8)

(actual time=2653.009..2653.009 rows=0 loops=3)

Filter: (x = clock_timestamp())

Rows Removed by Filter: 21388320

Planning Time: 0.056 ms

Execution Time: 2658.562 ms

(8 rows)

In this case, the query needs a whopping 2.6 seconds and eats up a ton of resources. The reason is that clock_timestamp() is VOLATILE.

Now, what if we try to do the same thing using a `STABLE` function?

demo=# explain analyze SELECT *
           FROM t_date
           WHERE x = now();
                        QUERY PLAN 
---------------------------------------------------------------
 Index Only Scan using idx_date on t_date
      (cost=0.57..4.59 rows=1 width=8)
      (actual time=0.014..0.015 rows=0 loops=1)
    Index Cond: (x = now())
    Heap Fetches: 0
 Planning Time: 0.060 ms
 Execution Time: 0.026 ms
(5 rows)

demo=# explain analyze SELECT *

FROM t_date

WHERE x = now();

QUERY PLAN

---------------------------------------------------------------

Index Only Scan using idx_date on t_date

(cost=0.57..4.59 rows=1 width=8)

(actual time=0.014..0.015 rows=0 loops=1)

Index Cond: (x = now())

Heap Fetches: 0

Planning Time: 0.060 ms

Execution Time: 0.026 ms

(5 rows)

The query is many thousands of times faster, because now PostgreSQL can turn it into a constant and thus use the index. If you want to learn more about function stability in PostgreSQL, here is more information.

Equality constraints

The next optimization on our list is the concept of equality constraints. What PostgreSQL tries to do here is to derive implicit knowledge about the query.

First, let’s create some sample data for our equality constraints:

demo=# CREATE TABLE t_demo AS
           SELECT x, x AS y
           FROM generate_series(1, 1000) AS x;
SELECT 1000
demo=# SELECT * FROM t_demo LIMIT 5;
x  | y
---+---
1  | 1
2  | 2
3  | 3
4  | 4
5  | 5
(5 rows)

demo=# CREATE TABLE t_demo AS

SELECT x, x AS y

FROM generate_series(1, 1000) AS x;

SELECT 1000

demo=# SELECT * FROM t_demo LIMIT 5;

x | y

---+---

1 | 1

2 | 2

3 | 3

4 | 4

5 | 5

(5 rows)

What we have here are 1000 rows. We’ll run a simple query:

demo=# explain SELECT *
               FROM t_demo
               WHERE x = y
                     AND y = 4;
                   QUERY PLAN
-------------------------------------------------------
 Seq Scan on t_demo (cost=0.00..20.00 rows=1 width=8)
   Filter: ((x = 4) AND (y = 4))
(2 rows)

demo=# explain SELECT *

FROM t_demo

WHERE x = y

AND y = 4;

QUERY PLAN

-------------------------------------------------------

Seq Scan on t_demo (cost=0.00..20.00 rows=1 width=8)

Filter: ((x = 4) AND (y = 4))

(2 rows)

Again, the magic is in the execution plan. You can see that PostgreSQL has figured that x and y are 4.

That opens the door for important optimizations:

demo=# CREATE INDEX idx_x ON t_demo (x);
CREATE INDEX
demo=# explain SELECT *
                FROM t_demo
                WHERE x = y
                      AND y = 4;
                           QUERY PLAN
--------------------------------------------------------------------
 Index Scan using idx_x on t_demo (cost=0.28..8.29 rows=1 width=8)
    Index Cond: (x = 4)
    Filter: (y = 4)
(3 rows)

demo=# CREATE INDEX idx_x ON t_demo (x);

CREATE INDEX

demo=# explain SELECT *

FROM t_demo

WHERE x = y

AND y = 4;

QUERY PLAN

--------------------------------------------------------------------

Index Scan using idx_x on t_demo (cost=0.28..8.29 rows=1 width=8)

Index Cond: (x = 4)

Filter: (y = 4)

(3 rows)

Without this optimization, it would be absolutely impossible to use the index we have just created. In order to optimize the query, PostgreSQL will automatically figure out that we can use an index here.

Indexes are important!

View inlining and subselect flattening

When talking about the PostgreSQL optimizer and query optimization there is no way to ignore views and subselect handling.

Let’s create a view and query it:

demo=# CREATE VIEW v1 AS
           SELECT *
           FROM   generate_series(1, 10) AS x;
CREATE VIEW
demo=# explain SELECT *
           FROM v1
           ORDER BY x DESC;
                QUERY PLAN
----------------------------------------------
 Sort (cost=0.27..0.29 rows=10 width=4)
    Sort Key: x.x DESC
    -> Function Scan on generate_series x
       (cost=0.00..0.10 rows=10 width=4)
(3 rows)

demo=# CREATE VIEW v1 AS

SELECT *

FROM generate_series(1, 10) AS x;

CREATE VIEW

demo=# explain SELECT *

FROM v1

ORDER BY x DESC;

QUERY PLAN

----------------------------------------------

Sort (cost=0.27..0.29 rows=10 width=4)

Sort Key: x.x DESC

-> Function Scan on generate_series x

(cost=0.00..0.10 rows=10 width=4)

(3 rows)

When we look at the execution plan, the view is nowhere to be seen.

The reason is that PostgreSQL has inlined the view as a subselect and flattened it out. How does that work?

SELECT *
          FROM v1
          ORDER BY x DESC;

SELECT *

FROM v1

ORDER BY x DESC;

This query is turned into …

SELECT *
          FROM (SELECT *
                FROM generate_series(1, 10) AS x
               ) AS v1
         ORDER BY x DESC;

SELECT *

FROM (SELECT *

FROM generate_series(1, 10) AS x

) AS v1

ORDER BY x DESC;

Then the subselect is flattened out which leaves us with …

SELECT *
FROM   generate_series(1, 10) AS x
ORDER BY x DESC;

SELECT *

FROM generate_series(1, 10) AS x

ORDER BY x DESC;

There is a parameter in the PostgreSQL query optimizer, `from_collapse_limit`, which controls this behavior:

demo=# SHOW from_collapse_limit;
 from_collapse_limit
---------------------
 8
(1 row)

demo=# SHOW from_collapse_limit;

from_collapse_limit

---------------------

(1 row)

The meaning of this parameter is that only up to 8 subselects in the FROM clause will be flattened out. If there are more than 8 subselects, they will be executed without being flattened out. In most real-world use cases, this is not a problem. It can only become an issue if the SQL statements used are very complex. More information about joins and join_collapse_limit can be found in our blog.

Keep in mind that inlining is not always possible. Developers are aware of that.

Optimizing joins in PostgreSQL

Joins are used in most queries and are therefore of incredible importance to good performance. We’ll now focus on some of the techniques relevant to joins in general.

Optimizing join orders

The next important thing on our list is the way the PostgreSQL optimizer handles join orders. In a PostgreSQL database, joins are not necessarily done in the order proposed by the end user - quite the opposite: The query optimizer tries to figure out as many join options as possible.

Let’s create some sample data and figure out how the optimizer works when it comes to join orders:

demo=# CREATE TABLE a AS
     SELECT x, x % 10 AS y
     FROM generate_series(1, 100000) AS x
     ORDER BY random();
SELECT 100000
demo=# CREATE TABLE b AS
     SELECT x, x % 10 AS y
     FROM generate_series(1, 1000000) AS x
     ORDER BY random();
SELECT 1000000
demo=# CREATE TABLE c AS
     SELECT x, x % 10 AS y
     FROM generate_series(1, 10000000) AS x
     ORDER BY random();
SELECT 10000000

demo=# CREATE TABLE a AS

SELECT x, x % 10 AS y

FROM generate_series(1, 100000) AS x

ORDER BY random();

SELECT 100000

demo=# CREATE TABLE b AS

SELECT x, x % 10 AS y

FROM generate_series(1, 1000000) AS x

ORDER BY random();

SELECT 1000000

demo=# CREATE TABLE c AS

SELECT x, x % 10 AS y

FROM generate_series(1, 10000000) AS x

ORDER BY random();

SELECT 10000000

In the next steps, a couple of indexes are created:

demo=# CREATE INDEX a_x ON a(x);
CREATE INDEX
demo=# CREATE INDEX b_x ON b(x);
CREATE INDEX
demo=# CREATE INDEX c_x ON c(x);
CREATE INDEX
demo=# ANALYZE;
ANALYZE

demo=# CREATE INDEX a_x ON a(x);

CREATE INDEX

demo=# CREATE INDEX b_x ON b(x);

CREATE INDEX

demo=# CREATE INDEX c_x ON c(x);

CREATE INDEX

demo=# ANALYZE;

ANALYZE

We now have three tables. We’ll query them and see what happens:

demo=# explain SELECT *
               FROM  a, b, c
               WHERE c.x = a.x
                     AND a.x = b.x
                     AND a.x = 10;
                        QUERY PLAN
-------------------------------------------------------------------- 
Nested Loop (cost=1.15..25.23 rows=1 width=24)
  -> Nested Loop (cost=0.72..16.76 rows=1 width=16)
    -> Index Scan using a_x on a
         (cost=0.29..8.31 rows=1 width=8)
         Index Cond: (x = 10)
    -> Index Scan using b_x on b
         (cost=0.42..8.44 rows=1 width=8)
         Index Cond: (x = 10)
    -> Index Scan using c_x on c
         (cost=0.43..8.45 rows=1 width=8)
     Index Cond: (x = 10)
(8 rows)

demo=# explain SELECT *

FROM a, b, c

WHERE c.x = a.x

AND a.x = b.x

AND a.x = 10;

QUERY PLAN

--------------------------------------------------------------------

Nested Loop (cost=1.15..25.23 rows=1 width=24)

-> Nested Loop (cost=0.72..16.76 rows=1 width=16)

-> Index Scan using a_x on a

(cost=0.29..8.31 rows=1 width=8)

Index Cond: (x = 10)

-> Index Scan using b_x on b

(cost=0.42..8.44 rows=1 width=8)

Index Cond: (x = 10)

-> Index Scan using c_x on c

(cost=0.43..8.45 rows=1 width=8)

Index Cond: (x = 10)

(8 rows)

Note that the query joins “c and a” and then “a and b”. However, let’s look at the plan more closely. PostgreSQL starts with index scans on a and b. The result is then joined with c. Three indexes are used. This happens because of the equality constraints we discussed before. To find out about forcing the join order, see this blog.

Implicit vs. explicit joins in PostgreSQL

Many people keep asking about explicit versus implicit joins. Basically, both variants are the same.

Let’s check out two queries, one with an explicit and one with an implicit join:

SELECT * FROM a, b WHERE a.id = b.id;
vs.
SELECT * FROM a JOIN b ON a.id = b.id;

SELECT * FROM a, b WHERE a.id = b.id;

vs.

SELECT * FROM a JOIN b ON a.id = b.id;

Both queries are identical and the planner will treat them the same way for most commonly seen queries. Mind that the explicit joins work with and without parenthesis.

However, there is one parameter that is of great importance here, `join_collapse_limit`:

demo=# SHOW join_collapse_limit;
 join_collapse_limit
---------------------
 8
(1 row)

demo=# SHOW join_collapse_limit;

join_collapse_limit

---------------------

(1 row)

The join_collapse_limit parameter controls how many explicit joins are planned implicitly. In other words, an implicit join is just like an explicit join, but only up to a certain number of joins controlled by this parameter. See this blog for more information. It is also possible to use join_collapse_limit to force the join order, as explained in this blog.

For the sake of simplicity, we can assume that it makes no difference for 95% of all queries and for most customers.

Determine the join strategy

PostgreSQL offers various join strategies. These strategies include hash joins, merge joins, nested loops, and a lot more. We have already shared some of this information in previous posts. More on PostgreSQL join strategies can be found here.

Optimizing outer joins (LEFT JOIN, etc.)

Optimizing outer joins (LEFT JOIN, RIGHT JOIN, etc.) is an important topic. Usually, the planner has fewer options here than in the case of inner joins. The following optimizations are possible:

(A leftjoin B on (Pab)) innerjoin C on (Pac)
     = (A innerjoin C on (Pac)) leftjoin B on (Pab)

1 2	(A leftjoin B on (Pab)) innerjoin C on (Pac) = (A innerjoin C on (Pac)) leftjoin B on (Pab)

where Pac is a predicate referencing A and C, etc (in this case, clearly
Pac cannot reference B, or the transformation is nonsensical).

(A leftjoin B on (Pab)) leftjoin C on (Pac)
     = (A leftjoin C on (Pac)) leftjoin B on (Pab)

(A leftjoin B on (Pab)) leftjoin C on (Pbc)
     = A leftjoin (B leftjoin C on (Pbc)) on (Pab)

(A leftjoin B on (Pab)) leftjoin C on (Pac)

= (A leftjoin C on (Pac)) leftjoin B on (Pab)

(A leftjoin B on (Pab)) leftjoin C on (Pbc)

= A leftjoin (B leftjoin C on (Pbc)) on (Pab)

While this theoretical explanation is correct, most people will have no clue what it means in real life.

Therefore I have compiled a real-world example showing how PostgreSQL actually reorders a real join:

demo=# explain SELECT *
      FROM generate_series(1, 10) AS x
                 LEFT JOIN generate_series(1, 100) AS y
                       ON (x = y)
                 JOIN generate_series(1, 10000) AS z
                       ON (y = z)
;
                           QUERY PLAN 
-------------------------------------------------------------------- 
Hash Join (cost=1.83..144.33 rows=500 width=12)
    Hash Cond: (z.z = x.x)
    -> Function Scan on generate_series z
          (cost=0.00..100.00 rows=10000 width=4)
    -> Hash (cost=1.71..1.71 rows=10 width=8)
      -> Hash Join (cost=0.23..1.71 rows=10 width=8)
           Hash Cond: (y.y = x.x)
           -> Function Scan on generate_series y
                (cost=0.00..1.00 rows=100 width=4)
           -> Hash (cost=0.10..0.10 rows=10 width=4)
                 -> Function Scan on generate_series x
                      (cost=0.00..0.10 rows=10 width=4)
(9 rows)

demo=# explain SELECT *

FROM generate_series(1, 10) AS x

LEFT JOIN generate_series(1, 100) AS y

ON (x = y)

JOIN generate_series(1, 10000) AS z

ON (y = z)

;

QUERY PLAN

--------------------------------------------------------------------

Hash Join (cost=1.83..144.33 rows=500 width=12)

Hash Cond: (z.z = x.x)

-> Function Scan on generate_series z

(cost=0.00..100.00 rows=10000 width=4)

-> Hash (cost=1.71..1.71 rows=10 width=8)

-> Hash Join (cost=0.23..1.71 rows=10 width=8)

Hash Cond: (y.y = x.x)

-> Function Scan on generate_series y

(cost=0.00..1.00 rows=100 width=4)

-> Hash (cost=0.10..0.10 rows=10 width=4)

-> Function Scan on generate_series x

(cost=0.00..0.10 rows=10 width=4)

(9 rows)

What we see here is that the PostgreSQL optimizer decides on joining x with y and then with z. In other words, the PostgreSQL optimizer has simply followed the join order as used in the SQL statement.

But what happens if we decide to tweak the parameters a bit?

demo=# explain SELECT *
     FROM generate_series(1, 10) AS x
                LEFT JOIN generate_series(1, 100000) AS y
                      ON (x = y)
                JOIN generate_series(1, 100) AS z
                      ON (y = z)
;
                           QUERY PLAN 
-------------------------------------------------------------------- 
Hash Join (cost=1.83..1426.83 rows=5000 width=12)
   Hash Cond: (y.y = x.x)
   -> Function Scan on generate_series y
          (cost=0.00..1000.00 rows=100000 width=4)
   -> Hash (cost=1.71..1.71 rows=10 width=8)
     -> Hash Join (cost=0.23..1.71 rows=10 width=8)
               Hash Cond: (z.z = x.x)
               -> Function Scan on generate_series z
                    (cost=0.00..1.00 rows=100 width=4)
               -> Hash (cost=0.10..0.10 rows=10 width=4)
                    -> Function Scan on generate_series x
                         (cost=0.00..0.10 rows=10 width=4)
(9 rows)

demo=# explain SELECT *

FROM generate_series(1, 10) AS x

LEFT JOIN generate_series(1, 100000) AS y

ON (x = y)

JOIN generate_series(1, 100) AS z

ON (y = z)

;

QUERY PLAN

--------------------------------------------------------------------

Hash Join (cost=1.83..1426.83 rows=5000 width=12)

Hash Cond: (y.y = x.x)

-> Function Scan on generate_series y

(cost=0.00..1000.00 rows=100000 width=4)

-> Hash (cost=1.71..1.71 rows=10 width=8)

-> Hash Join (cost=0.23..1.71 rows=10 width=8)

Hash Cond: (z.z = x.x)

-> Function Scan on generate_series z

(cost=0.00..1.00 rows=100 width=4)

-> Hash (cost=0.10..0.10 rows=10 width=4)

-> Function Scan on generate_series x

(cost=0.00..0.10 rows=10 width=4)

(9 rows)

This is the same query, but with slightly altered parameters.

The difference is that PostgreSQL again starts with x but then joins z first, before adding y.

Note that this optimization happens automatically. One reason why the optimizer can make this decision is because of the existence of optimizer support functions which were added to PostgreSQL a while ago. The reason why the reordering works is that support functions offer the planner a chance to figure out how many rows are returned from which part. If you use tables instead of set returning functions, support functions are irrelevant. PostgreSQL v16 has added support for "anti-joins" in RIGHT and OUTER queries.

Automatic join pruning in PostgreSQL

Not every join in a query is actually executed by PostgreSQL. The optimizer knows the concept of join pruning and is able to get rid of pointless joins quite efficiently. The main question is: When is that possible, and how can we figure out what’s going on?

The next listing shows how some suitable sample data can be created:

demo=# CREATE TABLE t_left AS SELECT *
           FROM generate_series(1, 1000) AS id;
SELECT 1000
demo=# CREATE UNIQUE INDEX idx_left ON t_left (id);
CREATE INDEX
demo=# CREATE TABLE t_right AS SELECT *
           FROM generate_series(1, 100) AS id;
SELECT 100
demo=# CREATE UNIQUE INDEX idx_right ON t_right (id);
CREATE INDEX

demo=# CREATE TABLE t_left AS SELECT *

FROM generate_series(1, 1000) AS id;

SELECT 1000

demo=# CREATE UNIQUE INDEX idx_left ON t_left (id);

CREATE INDEX

demo=# CREATE TABLE t_right AS SELECT *

FROM generate_series(1, 100) AS id;

SELECT 100

demo=# CREATE UNIQUE INDEX idx_right ON t_right (id);

CREATE INDEX

In this case, we need to make sure that both sides actually have primary keys, or some kind of unique constraint:

demo=# d t_left
Table 'public.t_left'
Column  | Type    | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id      | integer |           |          |
Indexes:
     'idx_left' UNIQUE, btree (id)

demo=# d t_right
Table 'public.t_right'
Column  | Type    | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id      | integer |           |          |
Indexes:
     'idx_right' UNIQUE, btree (id)

demo=# d t_left

Table 'public.t_left'

Column | Type | Collation | Nullable | Default

--------+---------+-----------+----------+---------

id | integer | | |

Indexes:

'idx_left' UNIQUE, btree (id)

demo=# d t_right

Table 'public.t_right'

Column | Type | Collation | Nullable | Default

--------+---------+-----------+----------+---------

id | integer | | |

Indexes:

'idx_right' UNIQUE, btree (id)

To show how the PostgreSQL query planner handles join pruning, we’ll take a look at two different SQL statements:

demo=# explain SELECT *
           FROM t_left AS a LEFT JOIN t_right AS b
                ON (a.id = b.id);
                          QUERY PLAN
--------------------------------------------------------------------
Hash Left Join (cost=3.25..20.89 rows=1000 width=8)
  Hash Cond: (a.id = b.id)
  -> Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4)
  -> Hash (cost=2.00..2.00 rows=100 width=4)
    -> Seq Scan on t_right b (cost=0.00..2.00 rows=100 width=4)
(5 rows)

demo=# explain SELECT *

FROM t_left AS a LEFT JOIN t_right AS b

ON (a.id = b.id);

QUERY PLAN

--------------------------------------------------------------------

Hash Left Join (cost=3.25..20.89 rows=1000 width=8)

Hash Cond: (a.id = b.id)

-> Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4)

-> Hash (cost=2.00..2.00 rows=100 width=4)

-> Seq Scan on t_right b (cost=0.00..2.00 rows=100 width=4)

(5 rows)

In this case, the join has to be executed. As you can see, PostgreSQL has decided on a hash join.

The next example contains only a small variation of the query:

demo=# explain SELECT a.*
           FROM t_left AS a LEFT JOIN t_right AS b
                ON (a.id = b.id);
                      QUERY PLAN
------------------------------------------------------------
 Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4)
(1 row)

demo=# explain SELECT a.*

FROM t_left AS a LEFT JOIN t_right AS b

ON (a.id = b.id);

QUERY PLAN

------------------------------------------------------------

Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4)

(1 row)

Join pruning can happen if we DO NOT read data from the right side, and if the right side is unique. If the right side is not unique, the join might actually increase the number of rows returned; so pruning is only possible in case the right side is unique.

Let’s try out join pruning:

demo=# DROP INDEX idx_right ;
DROP INDEX

demo=# explain SELECT a.*
           FROM t_left AS a LEFT JOIN t_right AS b
                ON (a.id = b.id);
                         QUERY PLAN
-------------------------------------------------------------------- 
Hash Left Join (cost=3.25..23.00 rows=1000 width=4)
   Hash Cond: (a.id = b.id)
   -> Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4)
   -> Hash (cost=2.00..2.00 rows=100 width=4)
     -> Seq Scan on t_right b (cost=0.00..2.00 rows=100 width=4)
(5 rows)

demo=# DROP INDEX idx_right ;

DROP INDEX

demo=# explain SELECT a.*

FROM t_left AS a LEFT JOIN t_right AS b

ON (a.id = b.id);

QUERY PLAN

--------------------------------------------------------------------

Hash Left Join (cost=3.25..23.00 rows=1000 width=4)

Hash Cond: (a.id = b.id)

-> Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4)

-> Hash (cost=2.00..2.00 rows=100 width=4)

-> Seq Scan on t_right b (cost=0.00..2.00 rows=100 width=4)

(5 rows)

While it is certainly a good thing to have join pruning in the PostgreSQL optimizer, you have to be aware of the fact that the planner is basically fixing something which should not exist anyway. Write queries efficiently in the first place; don’t add pointless joins.

EXISTS and anti-joins

There is also something common you will see everywhere in the code within SQL EXISTS. Here’s an example:

demo=# explain SELECT * FROM a
           WHERE NOT EXISTS (SELECT 1 FROM b
                 WHERE a.x = b.x
                 AND b.x = 42);
                            QUERY PLAN
-------------------------------------------------------------------
Hash Anti Join (cost=4.46..2709.95 rows=100000 width=8)
 Hash Cond: (a.x = b.x)
 -> Seq Scan on a (cost=0.00..1443.00 rows=100000 width=8)
 -> Hash (cost=4.44..4.44 rows=1 width=4)
      -> Index Only Scan using b_x on b
         (cost=0.42..4.44 rows=1 width=4)
            Index Cond: (x = 42)
(6 rows)

demo=# explain SELECT * FROM a

WHERE NOT EXISTS (SELECT 1 FROM b

WHERE a.x = b.x

AND b.x = 42);

QUERY PLAN

-------------------------------------------------------------------

Hash Anti Join (cost=4.46..2709.95 rows=100000 width=8)

Hash Cond: (a.x = b.x)

-> Seq Scan on a (cost=0.00..1443.00 rows=100000 width=8)

-> Hash (cost=4.44..4.44 rows=1 width=4)

-> Index Only Scan using b_x on b

(cost=0.42..4.44 rows=1 width=4)

Index Cond: (x = 42)

(6 rows)

This might not look like a big deal, but consider the alternatives: What PostgreSQL does here is to create a “hash anti-join”. This is way more efficient than some sort of nested loop. In short: The nested loop is replaced with a join which can yield significant performance gains.

Making use of sorted output

Every database relies heavily on sorting, which is necessary to handle many different types of queries and optimize various workloads. One of the key optimizations in this area is that PostgreSQL can use indexes to optimize ORDER BY in a very clever way. See more about how PostgreSQL can optimize subqueries in the blog Subqueries and Performance in PostgreSQL.

Let’s see how optimizing ORDER BY looks:

demo=# CREATE TABLE t_sample AS
SELECT *
FROM generate_series(1, 1000000) AS id
ORDER BY random();
SELECT 1000000
demo=# VACUUM ANALYZE t_sample ;
VACUUM

demo=# CREATE TABLE t_sample AS

SELECT *

FROM generate_series(1, 1000000) AS id

ORDER BY random();

SELECT 1000000

demo=# VACUUM ANALYZE t_sample ;

VACUUM

The listing created a list of 1 million entries that’s been stored on disk in random order. Subsequently, VACUUM was called to ensure that all PostgreSQL hint bit-related issues were sorted out before the test is executed. If you want to know what hint bits are and how they operate, check out our post about hint bits in PostgreSQL.

Let's run explain analyze:

demo=# explain analyze SELECT *
           FROM t_sample
           ORDER BY id DESC
           LIMIT 100;
                        QUERY PLAN 
-------------------------------------------------------------------
 Limit (cost=25516.40..25528.07 rows=100 width=4)
     (actual time=84.806..86.252 rows=100 loops=1)
   -> Gather Merge (cost=25516.40..122745.49 rows=833334 width=4)
          (actual time=84.805..86.232 rows=100 loops=1)
     Workers Planned: 2
     Workers Launched: 2
     -> Sort (cost=24516.38..25558.05 rows=416667 width=4)
              (actual time=65.576..65.586 rows=82 loops=3)
              Sort Key: id DESC
              Sort Method: top-N heapsort Memory: 33kB
              Worker 0: Sort Method: top-N heapsort Memory: 32kB
              Worker 1: Sort Method: top-N heapsort Memory: 33kB
              -> Parallel Seq Scan on t_sample
                   (cost=0.00..8591.67 rows=416667 width=4)
                   (actual time=0.428..33.305 rows=333333 loops=3)
 Planning Time: 0.078 ms
 Execution Time: 86.286 ms
(12 rows)

demo=# explain analyze SELECT *

FROM t_sample

ORDER BY id DESC

LIMIT 100;

QUERY PLAN

-------------------------------------------------------------------

Limit (cost=25516.40..25528.07 rows=100 width=4)

(actual time=84.806..86.252 rows=100 loops=1)

-> Gather Merge (cost=25516.40..122745.49 rows=833334 width=4)

(actual time=84.805..86.232 rows=100 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Sort (cost=24516.38..25558.05 rows=416667 width=4)

(actual time=65.576..65.586 rows=82 loops=3)

Sort Key: id DESC

Sort Method: top-N heapsort Memory: 33kB

Worker 0: Sort Method: top-N heapsort Memory: 32kB

Worker 1: Sort Method: top-N heapsort Memory: 33kB

-> Parallel Seq Scan on t_sample

(cost=0.00..8591.67 rows=416667 width=4)

(actual time=0.428..33.305 rows=333333 loops=3)

Planning Time: 0.078 ms

Execution Time: 86.286 ms

(12 rows)

PostgreSQL needs more than one core to process the query in 86 milliseconds, which is a lot.

Now, what happens if we create an index?

demo=# CREATE INDEX idx_sample ON t_sample (id);
CREATE INDEX
demo=# explain analyze SELECT *
           FROM t_sample
           ORDER BY id DESC
           LIMIT 100;
                           QUERY PLAN 
------------------------------------------------------------------- 
 Limit (cost=0.42..3.02 rows=100 width=4)
       (actual time=0.071..0.125 rows=100 loops=1)
   -> Index Only Scan Backward using idx_sample on t_sample
          (cost=0.42..25980.42 rows=1000000 width=4)
          (actual time=0.070..0.113 rows=100 loops=1)
     Heap Fetches: 0
 Planning Time: 0.183 ms
 Execution Time: 0.142 ms
(5 rows)

demo=# CREATE INDEX idx_sample ON t_sample (id);

CREATE INDEX

demo=# explain analyze SELECT *

FROM t_sample

ORDER BY id DESC

LIMIT 100;

QUERY PLAN

-------------------------------------------------------------------

Limit (cost=0.42..3.02 rows=100 width=4)

(actual time=0.071..0.125 rows=100 loops=1)

-> Index Only Scan Backward using idx_sample on t_sample

(cost=0.42..25980.42 rows=1000000 width=4)

(actual time=0.070..0.113 rows=100 loops=1)

Heap Fetches: 0

Planning Time: 0.183 ms

Execution Time: 0.142 ms

(5 rows)

After adding an index, the query executed in a fraction of a millisecond. However, what is most important here is that we do NOT see a sort-step. PostgreSQL knows that the index returns data in sorted order (sorted by id) and thus there is no need to sort the data all over again. PostgreSQL consults the index and can simply take the data as it is and feed it to the client until enough rows have been found. In this special case, even an index-only scan is possible, because we are only looking for columns which actually exist in the index.

Reducing the amount of data to be sorted is vital to performance, and thus important to the user experience.

`min` and `max` can be used to reduce the amount of data to be sorted:

demo=# explain SELECT min(id), max(id) FROM t_sample;
                        QUERY PLAN 
------------------------------------------------------------------- 
Result (cost=0.91..0.92 rows=1 width=8)
   InitPlan 1 (returns $0)
      -> Limit (cost=0.42..0.45 rows=1 width=4)
           -> Index Only Scan using idx_sample on t_sample
               (cost=0.42..28480.42 rows=1000000 width=4)
               Index Cond: (id IS NOT NULL)
   InitPlan 2 (returns $1)
      -> Limit (cost=0.42..0.45 rows=1 width=4)
           -> Index Only Scan Backward using
               idx_sample on t_sample t_sample_1
                (cost=0.42..28480.42 rows=1000000 width=4)
                Index Cond: (id IS NOT NULL)
(9 rows)

demo=# explain SELECT min(id), max(id) FROM t_sample;

QUERY PLAN

-------------------------------------------------------------------

Result (cost=0.91..0.92 rows=1 width=8)

InitPlan 1 (returns $0)

-> Limit (cost=0.42..0.45 rows=1 width=4)

-> Index Only Scan using idx_sample on t_sample

(cost=0.42..28480.42 rows=1000000 width=4)

Index Cond: (id IS NOT NULL)

InitPlan 2 (returns $1)

-> Limit (cost=0.42..0.45 rows=1 width=4)

-> Index Only Scan Backward using

idx_sample on t_sample t_sample_1

(cost=0.42..28480.42 rows=1000000 width=4)

Index Cond: (id IS NOT NULL)

(9 rows)

The minimal value is the first value in a sorted list that is not NULL. The max value is the last value in a sequence of sorted values that is not NULL. PostgreSQL can take this into consideration and replace the standard way of processing aggregates with a subplan that simply consults the index.

In a way, calling “min” is the same as …

demo=# SELECT id
           FROM t_sample
           WHERE id IS NOT NULL
           ORDER BY id
           LIMIT 1;
 id
----
 1
(1 row)

demo=# explain SELECT id
           FROM t_sample
           WHERE id IS NOT NULL
           ORDER BY id
           LIMIT 1;
                         QUERY PLAN 
-------------------------------------------------------------------
 Limit (cost=0.42..0.45 rows=1 width=4)
   -> Index Only Scan using idx_sample on t_sample
          (cost=0.42..28480.42 rows=1000000 width=4)
     Index Cond: (id IS NOT NULL)
(3 rows)

demo=# SELECT id

FROM t_sample

WHERE id IS NOT NULL

ORDER BY id

LIMIT 1;

----

(1 row)

demo=# explain SELECT id

FROM t_sample

WHERE id IS NOT NULL

ORDER BY id

LIMIT 1;

QUERY PLAN

-------------------------------------------------------------------

Limit (cost=0.42..0.45 rows=1 width=4)

-> Index Only Scan using idx_sample on t_sample

(cost=0.42..28480.42 rows=1000000 width=4)

Index Cond: (id IS NOT NULL)

(3 rows)

Fortunately, PostgreSQL does this for you and optimizes the query perfectly.

Partition pruning and constraint exclusion at work

Partitioning is one of the favorite features of many PostgreSQL users. It offers you the ability to reduce table sizes and break up data into smaller chunks, which are easier to handle and in many cases (not all of them) faster to query. However, keep in mind that SOME queries might be faster - increased planning time can, however, backfire - this is not a rare corner case. We have seen partitioning decrease speed countless times in cases when partitioning wasn’t suitable.

Logically, the PostgreSQL optimizer has to take care of partitioning in a clever way and make sure that only the partitions are touched which might actually contain some of the data.

The following is an example of how the optimizer manages partitioning issues:

demo=# CREATE TABLE mynumbers (id int)
           PARTITION BY RANGE (id);
CREATE TABLE
demo=# CREATE TABLE negatives
           PARTITION OF mynumbers FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE
demo=# CREATE TABLE positives
           PARTITION OF mynumbers FOR VALUES FROM (1) TO (MAXVALUE);
CREATE TABLE

demo=# CREATE TABLE mynumbers (id int)

PARTITION BY RANGE (id);

CREATE TABLE

demo=# CREATE TABLE negatives

PARTITION OF mynumbers FOR VALUES FROM (MINVALUE) TO (0);

CREATE TABLE

demo=# CREATE TABLE positives

PARTITION OF mynumbers FOR VALUES FROM (1) TO (MAXVALUE);

CREATE TABLE

In this case, a table with two partitions has been created.

What is the planner going to do if we are only looking for positive values?

demo=# explain SELECT * FROM mynumbers WHERE id > 1000;
             QUERY PLAN
------------------------------------------
 Seq Scan on positives mynumbers
      (cost=0.00..41.88 rows=850 width=4)
   Filter: (id > 1000)
(2 rows)

demo=# explain SELECT * FROM mynumbers WHERE id > 1000;

QUERY PLAN

------------------------------------------

Seq Scan on positives mynumbers

(cost=0.00..41.88 rows=850 width=4)

Filter: (id > 1000)

(2 rows)

The PostgreSQL optimizer correctly figured out that the data cannot be in one of the partitions and removed it from the execution plan.

If we want to query all values below 1000, it's not possible, and all partitions are correctly queried:

demo=# explain SELECT * FROM mynumbers WHERE id < 1000; 
QUERY PLAN 
-------------------------------------------------------- 
Append (cost=0.00..92.25 rows=1700 width=4) 
   -> Seq Scan on negatives mynumbers_1
          (cost=0.00..41.88 rows=850 width=4)
     Filter: (id < 1000) 
   > Seq Scan on positives mynumbers_2
          (cost=0.00..41.88 rows=850 width=4)
     Filter: (id < 1000)
(5 rows)

demo=# explain SELECT * FROM mynumbers WHERE id < 1000;

QUERY PLAN

--------------------------------------------------------

Append (cost=0.00..92.25 rows=1700 width=4)

-> Seq Scan on negatives mynumbers_1

(cost=0.00..41.88 rows=850 width=4)

Filter: (id < 1000)

> Seq Scan on positives mynumbers_2

(cost=0.00..41.88 rows=850 width=4)

Filter: (id < 1000)

(5 rows)

Conclusion…

The optimizations you have seen on this page are only the beginning of everything that PostgreSQL can do for you. It’s a good start on getting an impression of what is going on behind the scenes. Usually, database optimizers are some sort of black box and people rarely know which optimizations really happen. The goal of this page is to shed some light on the mathematical transformations going on.

To read about a use case involving the query optimzer, see Christoph Berg's Reproducible Builds blog post.
For other up-to-date information about the PostgreSQL query optimizer, check out our blogs tagged "optimizer".

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on X, Facebook, or LinkedIn.

One response to “How the PostgreSQL query optimizer works”

Marc Rechté says:

April 13, 2021 at 3:31 pm

Many thanks. Just noticed an indentation problem on line 16, 3rd listing of
"Optimizing join orders"§

Reply

How the PostgreSQL query optimizer works

PostgreSQL constant folding

Let’s see what happens during the constant folding process:

Note the following:

PostgreSQL query optimizer: function inlining

Let’s create a function to calculate a logarithm:

Now, let’s see what happens in a real query that uses function inlining:

Here is an example using PL/pgSQL:

Function stability: VOLATILE vs. STABLE vs. IMMUTABLE

Here's an example where the optimizer has to execute the function for every row:

Let’s run the query using a VOLATILE function:

Now, what if we try to do the same thing using a STABLE function?

Equality constraints

First, let’s create some sample data for our equality constraints:

What we have here are 1000 rows. We’ll run a simple query:

That opens the door for important optimizations:

View inlining and subselect flattening

Let’s create a view and query it:

The reason is that PostgreSQL has inlined the view as a subselect and flattened it out. How does that work?

This query is turned into …

Then the subselect is flattened out which leaves us with …

There is a parameter in the PostgreSQL query optimizer, from_collapse_limit, which controls this behavior:

Optimizing joins in PostgreSQL

Optimizing join orders

Let’s create some sample data and figure out how the optimizer works when it comes to join orders:

In the next steps, a couple of indexes are created:

We now have three tables. We’ll query them and see what happens:

Implicit vs. explicit joins in PostgreSQL

Let’s check out two queries, one with an explicit and one with an implicit join:

However, there is one parameter that is of great importance here, join_collapse_limit:

Determine the join strategy

Optimizing outer joins (LEFT JOIN, etc.)

Therefore I have compiled a real-world example showing how PostgreSQL actually reorders a real join:

But what happens if we decide to tweak the parameters a bit?

This is the same query, but with slightly altered parameters.

Automatic join pruning in PostgreSQL

In this case, we need to make sure that both sides actually have primary keys, or some kind of unique constraint:

To show how the PostgreSQL query planner handles join pruning, we’ll take a look at two different SQL statements:

The next example contains only a small variation of the query:

Let’s try out join pruning:

EXISTS and anti-joins

Making use of sorted output

Let’s see how optimizing ORDER BY looks:

Let's run explain analyze:

Now, what happens if we create an index?

min and max can be used to reduce the amount of data to be sorted:

In a way, calling “min” is the same as …

Partition pruning and constraint exclusion at work

The following is an example of how the optimizer manages partitioning issues:

What is the planner going to do if we are only looking for positive values?

If we want to query all values below 1000, it's not possible, and all partitions are correctly queried:

Conclusion…

One response to “How the PostgreSQL query optimizer works”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Let’s run the query using a `VOLATILE` function:

Now, what if we try to do the same thing using a `STABLE` function?

There is a parameter in the PostgreSQL query optimizer, `from_collapse_limit`, which controls this behavior:

However, there is one parameter that is of great importance here, `join_collapse_limit`:

`min` and `max` can be used to reduce the amount of data to be sorted: