Just like any advanced relational database, PostgreSQL uses a cost-based query optimizer that tries to turn your SQL queries into something efficient that executes in as little time as possible. For many people, the workings of the optimizer itself remain a mystery, so we have decided to give users some insight into what is really going on behind the scenes.
Table of Contents
So let’s take a tour through the PostgreSQL optimizer and get an overview of some of the most important techniques the optimizer uses to speed up queries. Note that the techniques listed here are in no way complete. There is a lot more going on, but it makes sense to take a look at the most basic things in order to gain a good understanding of the process.
Constant folding is one of the easier processes to understand. Nonetheless, it's extremely important.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
demo=# SELECT * FROM generate_series(1, 10) AS x WHERE x = 7 + 1; x --- 8 (1 row) demo=# explain SELECT * FROM generate_series(1, 10) AS x WHERE x = 7 + 1; QUERY PLAN ---------------------------------------------------------------------- Function Scan on generate_series x (cost=0.00..0.13 rows=1 width=4) Filter: (x = 8) (2 rows) |
Here we add a filter to the query: x = 7 + 1
. What the system does is to “fold” the constant and instead do “x = 8”
. Why is that important? In case “x” is indexed (assuming it is a table), we can easily look up 8 in the index.
1 2 3 4 5 6 7 8 |
demo=# explain SELECT * FROM generate_series(1, 10) AS x WHERE x - 1 = 7 ; QUERY PLAN ---------------------------------------------------------------------- Function Scan on generate_series x (cost=0.00..0.15 rows=1 width=4) Filter: ((x - 1) = 7) (2 rows) |
PostgreSQL does not transform the expression to “x = 8” in this case. That’s why you should try to make sure that the filter is on the right side, and not on the column you might want to index.
One more important technique is the idea of function inlining. The goal is to reduce function calls as much as possible and thus speed up the query.
1 2 3 4 5 6 7 8 9 10 11 12 |
demo=# CREATE OR REPLACE FUNCTION ld(int) RETURNS numeric AS $ SELECT log(2, $1); $ LANGUAGE 'sql'; CREATE FUNCTION demo=# SELECT ld(1024); ld --------------------- 10.0000000000000000 (1 row) |
2^10 = 1024. This looks right.
1 2 3 4 5 6 7 8 |
demo=# explain SELECT * FROM generate_series(1, 10) AS x WHERE ld(x) = 1000; QUERY PLAN ---------------------------------------------------------------------- Function Scan on generate_series x (cost=0.00..0.18 rows=1 width=4) Filter: (log('2'::numeric, (x)::numeric) = '1000'::numeric) (2 rows) |
Look in the WHERE clause. The ld
function has been replaced with the underlying log
function. Note that this is only possible in the case of SQL functions. PL/pgSQL and other stored procedure languages are black boxes to the optimizer, so whether these things are possible or not depends on the type of language used.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
demo=# CREATE OR REPLACE FUNCTION pl_ld(int) RETURNS numeric AS $ BEGIN RETURN log(2, $1); END; $ LANGUAGE 'plpgsql'; CREATE FUNCTION demo=# explain SELECT * FROM generate_series(1, 10) AS x WHERE pl_ld(x) = 1000; QUERY PLAN ---------------------------------------------------------------------- Function Scan on generate_series x (cost=0.00..2.63 rows=1 width=4) Filter: (pl_ld(x) = '1000'::numeric) (2 rows) |
In this case, inlining is not possible. While the code is basically the same, the programming language does make a major difference.
Something that is often overlooked is the concept of function stability. When creating a function, it makes a difference if a function is created as VOLATILE
(default), STABLE
, or as IMMUTABLE
. It can even make a major difference - especially if you are using indexes. Let’s create some sample data and sort these differences out:
VOLATILE
means that a function is not guaranteed to return the same result within the same transaction given the same input parameters. In other words, the PostgreSQL optimizer cannot see the function as a constant, and has to execute it for every row.
1 2 3 4 5 6 7 |
demo=# CREATE TABLE t_date AS SELECT * FROM generate_series('1900-01-01'::timestamptz, '2021-12-31'::timestamptz, '1 minute') AS x; SELECT 64164961 demo=# CREATE INDEX idx_date ON t_date (x); CREATE INDEX |
We have generated a list of 64 million entries containing 1 row per minute since January 1900, which produces 64 million entries.
VOLATILE
function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
demo=# explain analyze SELECT * FROM t_date WHERE x = clock_timestamp(); QUERY PLAN ------------------------------------------------------------------- Gather (cost=1000.00..685947.45 rows=1 width=8) (actual time=2656.961..2658.547 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t_date (cost=0.00..684947.35 rows=1 width=8) (actual time=2653.009..2653.009 rows=0 loops=3) Filter: (x = clock_timestamp()) Rows Removed by Filter: 21388320 Planning Time: 0.056 ms Execution Time: 2658.562 ms (8 rows) |
In this case, the query needs a whopping 2.6 seconds and eats up a ton of resources. The reason is that clock_timestamp()
is VOLATILE
.
STABLE
function?
1 2 3 4 5 6 7 8 9 10 11 12 13 |
demo=# explain analyze SELECT * FROM t_date WHERE x = now(); QUERY PLAN --------------------------------------------------------------- Index Only Scan using idx_date on t_date (cost=0.57..4.59 rows=1 width=8) (actual time=0.014..0.015 rows=0 loops=1) Index Cond: (x = now()) Heap Fetches: 0 Planning Time: 0.060 ms Execution Time: 0.026 ms (5 rows) |
The query is many thousands of times faster, because now PostgreSQL can turn it into a constant and thus use the index. If you want to learn more about function stability in PostgreSQL, here is more information.
The next optimization on our list is the concept of equality constraints. What PostgreSQL tries to do here is to derive implicit knowledge about the query.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
demo=# CREATE TABLE t_demo AS SELECT x, x AS y FROM generate_series(1, 1000) AS x; SELECT 1000 demo=# SELECT * FROM t_demo LIMIT 5; x | y ---+--- 1 | 1 2 | 2 3 | 3 4 | 4 5 | 5 (5 rows) |
1 2 3 4 5 6 7 8 9 |
demo=# explain SELECT * FROM t_demo WHERE x = y AND y = 4; QUERY PLAN ------------------------------------------------------- Seq Scan on t_demo (cost=0.00..20.00 rows=1 width=8) Filter: ((x = 4) AND (y = 4)) (2 rows) |
Again, the magic is in the execution plan. You can see that PostgreSQL has figured that x and y are 4.
1 2 3 4 5 6 7 8 9 10 11 12 |
demo=# CREATE INDEX idx_x ON t_demo (x); CREATE INDEX demo=# explain SELECT * FROM t_demo WHERE x = y AND y = 4; QUERY PLAN -------------------------------------------------------------------- Index Scan using idx_x on t_demo (cost=0.28..8.29 rows=1 width=8) Index Cond: (x = 4) Filter: (y = 4) (3 rows) |
Without this optimization, it would be absolutely impossible to use the index we have just created. In order to optimize the query, PostgreSQL will automatically figure out that we can use an index here.
Indexes are important!
When talking about the PostgreSQL optimizer and query optimization there is no way to ignore views and subselect handling.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
demo=# CREATE VIEW v1 AS SELECT * FROM generate_series(1, 10) AS x; CREATE VIEW demo=# explain SELECT * FROM v1 ORDER BY x DESC; QUERY PLAN ---------------------------------------------- Sort (cost=0.27..0.29 rows=10 width=4) Sort Key: x.x DESC -> Function Scan on generate_series x (cost=0.00..0.10 rows=10 width=4) (3 rows) |
When we look at the execution plan, the view is nowhere to be seen.
1 2 3 |
SELECT * FROM v1 ORDER BY x DESC; |
1 2 3 4 5 |
SELECT * FROM (SELECT * FROM generate_series(1, 10) AS x ) AS v1 ORDER BY x DESC; |
1 2 3 |
SELECT * FROM generate_series(1, 10) AS x ORDER BY x DESC; |
from_collapse_limit
, which controls this behavior:
1 2 3 4 5 |
demo=# SHOW from_collapse_limit; from_collapse_limit --------------------- 8 (1 row) |
The meaning of this parameter is that only up to 8 subselects in the FROM clause will be flattened out. If there are more than 8 subselects, they will be executed without being flattened out. In most real-world use cases, this is not a problem. It can only become an issue if the SQL statements used are very complex. More information about joins and join_collapse_limit can be found in our blog.
Keep in mind that inlining is not always possible. Developers are aware of that.
Joins are used in most queries and are therefore of incredible importance to good performance. We’ll now focus on some of the techniques relevant to joins in general.
The next important thing on our list is the way the PostgreSQL optimizer handles join orders. In a PostgreSQL database, joins are not necessarily done in the order proposed by the end user - quite the opposite: The query optimizer tries to figure out as many join options as possible.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
demo=# CREATE TABLE a AS SELECT x, x % 10 AS y FROM generate_series(1, 100000) AS x ORDER BY random(); SELECT 100000 demo=# CREATE TABLE b AS SELECT x, x % 10 AS y FROM generate_series(1, 1000000) AS x ORDER BY random(); SELECT 1000000 demo=# CREATE TABLE c AS SELECT x, x % 10 AS y FROM generate_series(1, 10000000) AS x ORDER BY random(); SELECT 10000000 |
1 2 3 4 5 6 7 8 |
demo=# CREATE INDEX a_x ON a(x); CREATE INDEX demo=# CREATE INDEX b_x ON b(x); CREATE INDEX demo=# CREATE INDEX c_x ON c(x); CREATE INDEX demo=# ANALYZE; ANALYZE |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
demo=# explain SELECT * FROM a, b, c WHERE c.x = a.x AND a.x = b.x AND a.x = 10; QUERY PLAN -------------------------------------------------------------------- Nested Loop (cost=1.15..25.23 rows=1 width=24) -> Nested Loop (cost=0.72..16.76 rows=1 width=16) -> Index Scan using a_x on a (cost=0.29..8.31 rows=1 width=8) Index Cond: (x = 10) -> Index Scan using b_x on b (cost=0.42..8.44 rows=1 width=8) Index Cond: (x = 10) -> Index Scan using c_x on c (cost=0.43..8.45 rows=1 width=8) Index Cond: (x = 10) (8 rows) |
Note that the query joins “c and a”
and then “a and b”
. However, let’s look at the plan more closely. PostgreSQL starts with index scans on a
and b
. The result is then joined with c
. Three indexes are used. This happens because of the equality constraints we discussed before. To find out about forcing the join order, see this blog.
Many people keep asking about explicit versus implicit joins. Basically, both variants are the same.
1 2 3 |
SELECT * FROM a, b WHERE a.id = b.id; vs. SELECT * FROM a JOIN b ON a.id = b.id; |
Both queries are identical and the planner will treat them the same way for most commonly seen queries. Mind that the explicit joins work with and without parenthesis.
join_collapse_limit
:
1 2 3 4 5 |
demo=# SHOW join_collapse_limit; join_collapse_limit --------------------- 8 (1 row) |
The join_collapse_limit
parameter controls how many explicit joins are planned implicitly. In other words, an implicit join is just like an explicit join, but only up to a certain number of joins controlled by this parameter. See this blog for more information. It is also possible to use join_collapse_limit to force the join order, as explained in this blog.
For the sake of simplicity, we can assume that it makes no difference for 95% of all queries and for most customers.
PostgreSQL offers various join strategies. These strategies include hash joins, merge joins, nested loops, and a lot more. We have already shared some of this information in previous posts. More on PostgreSQL join strategies can be found here.
Optimizing outer joins (LEFT JOIN
, RIGHT JOIN
, etc.) is an important topic. Usually, the planner has fewer options here than in the case of inner joins. The following optimizations are possible:
1 2 |
(A leftjoin B on (Pab)) innerjoin C on (Pac) = (A innerjoin C on (Pac)) leftjoin B on (Pab) |
where Pac is a predicate referencing A and C, etc (in this case, clearly
Pac cannot reference B, or the transformation is nonsensical).
1 2 3 4 5 |
(A leftjoin B on (Pab)) leftjoin C on (Pac) = (A leftjoin C on (Pac)) leftjoin B on (Pab) (A leftjoin B on (Pab)) leftjoin C on (Pbc) = A leftjoin (B leftjoin C on (Pbc)) on (Pab) |
While this theoretical explanation is correct, most people will have no clue what it means in real life.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
demo=# explain SELECT * FROM generate_series(1, 10) AS x LEFT JOIN generate_series(1, 100) AS y ON (x = y) JOIN generate_series(1, 10000) AS z ON (y = z) ; QUERY PLAN -------------------------------------------------------------------- Hash Join (cost=1.83..144.33 rows=500 width=12) Hash Cond: (z.z = x.x) -> Function Scan on generate_series z (cost=0.00..100.00 rows=10000 width=4) -> Hash (cost=1.71..1.71 rows=10 width=8) -> Hash Join (cost=0.23..1.71 rows=10 width=8) Hash Cond: (y.y = x.x) -> Function Scan on generate_series y (cost=0.00..1.00 rows=100 width=4) -> Hash (cost=0.10..0.10 rows=10 width=4) -> Function Scan on generate_series x (cost=0.00..0.10 rows=10 width=4) (9 rows) |
What we see here is that the PostgreSQL optimizer decides on joining x with y and then with z. In other words, the PostgreSQL optimizer has simply followed the join order as used in the SQL statement.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
demo=# explain SELECT * FROM generate_series(1, 10) AS x LEFT JOIN generate_series(1, 100000) AS y ON (x = y) JOIN generate_series(1, 100) AS z ON (y = z) ; QUERY PLAN -------------------------------------------------------------------- Hash Join (cost=1.83..1426.83 rows=5000 width=12) Hash Cond: (y.y = x.x) -> Function Scan on generate_series y (cost=0.00..1000.00 rows=100000 width=4) -> Hash (cost=1.71..1.71 rows=10 width=8) -> Hash Join (cost=0.23..1.71 rows=10 width=8) Hash Cond: (z.z = x.x) -> Function Scan on generate_series z (cost=0.00..1.00 rows=100 width=4) -> Hash (cost=0.10..0.10 rows=10 width=4) -> Function Scan on generate_series x (cost=0.00..0.10 rows=10 width=4) (9 rows) |
The difference is that PostgreSQL again starts with x
but then joins z
first, before adding y
.
Note that this optimization happens automatically. One reason why the optimizer can make this decision is because of the existence of optimizer support functions which were added to PostgreSQL a while ago. The reason why the reordering works is that support functions offer the planner a chance to figure out how many rows are returned from which part. If you use tables instead of set returning functions, support functions are irrelevant. PostgreSQL v16 has added support for "anti-joins" in RIGHT
and OUTER
queries.
Not every join in a query is actually executed by PostgreSQL. The optimizer knows the concept of join pruning and is able to get rid of pointless joins quite efficiently. The main question is: When is that possible, and how can we figure out what’s going on?
The next listing shows how some suitable sample data can be created:
1 2 3 4 5 6 7 8 9 10 |
demo=# CREATE TABLE t_left AS SELECT * FROM generate_series(1, 1000) AS id; SELECT 1000 demo=# CREATE UNIQUE INDEX idx_left ON t_left (id); CREATE INDEX demo=# CREATE TABLE t_right AS SELECT * FROM generate_series(1, 100) AS id; SELECT 100 demo=# CREATE UNIQUE INDEX idx_right ON t_right (id); CREATE INDEX |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
demo=# d t_left Table 'public.t_left' Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- id | integer | | | Indexes: 'idx_left' UNIQUE, btree (id) demo=# d t_right Table 'public.t_right' Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- id | integer | | | Indexes: 'idx_right' UNIQUE, btree (id) |
1 2 3 4 5 6 7 8 9 10 11 |
demo=# explain SELECT * FROM t_left AS a LEFT JOIN t_right AS b ON (a.id = b.id); QUERY PLAN -------------------------------------------------------------------- Hash Left Join (cost=3.25..20.89 rows=1000 width=8) Hash Cond: (a.id = b.id) -> Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4) -> Hash (cost=2.00..2.00 rows=100 width=4) -> Seq Scan on t_right b (cost=0.00..2.00 rows=100 width=4) (5 rows) |
In this case, the join has to be executed. As you can see, PostgreSQL has decided on a hash join.
1 2 3 4 5 6 7 |
demo=# explain SELECT a.* FROM t_left AS a LEFT JOIN t_right AS b ON (a.id = b.id); QUERY PLAN ------------------------------------------------------------ Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4) (1 row) |
Join pruning can happen if we DO NOT read data from the right side, and if the right side is unique. If the right side is not unique, the join might actually increase the number of rows returned; so pruning is only possible in case the right side is unique.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
demo=# DROP INDEX idx_right ; DROP INDEX demo=# explain SELECT a.* FROM t_left AS a LEFT JOIN t_right AS b ON (a.id = b.id); QUERY PLAN -------------------------------------------------------------------- Hash Left Join (cost=3.25..23.00 rows=1000 width=4) Hash Cond: (a.id = b.id) -> Seq Scan on t_left a (cost=0.00..15.00 rows=1000 width=4) -> Hash (cost=2.00..2.00 rows=100 width=4) -> Seq Scan on t_right b (cost=0.00..2.00 rows=100 width=4) (5 rows) |
While it is certainly a good thing to have join pruning in the PostgreSQL optimizer, you have to be aware of the fact that the planner is basically fixing something which should not exist anyway. Write queries efficiently in the first place; don’t add pointless joins.
There is also something common you will see everywhere in the code within SQL EXISTS. Here’s an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
demo=# explain SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE a.x = b.x AND b.x = 42); QUERY PLAN ------------------------------------------------------------------- Hash Anti Join (cost=4.46..2709.95 rows=100000 width=8) Hash Cond: (a.x = b.x) -> Seq Scan on a (cost=0.00..1443.00 rows=100000 width=8) -> Hash (cost=4.44..4.44 rows=1 width=4) -> Index Only Scan using b_x on b (cost=0.42..4.44 rows=1 width=4) Index Cond: (x = 42) (6 rows) |
This might not look like a big deal, but consider the alternatives: What PostgreSQL does here is to create a “hash anti-join”. This is way more efficient than some sort of nested loop. In short: The nested loop is replaced with a join which can yield significant performance gains.
Every database relies heavily on sorting, which is necessary to handle many different types of queries and optimize various workloads. One of the key optimizations in this area is that PostgreSQL can use indexes to optimize ORDER BY in a very clever way. See more about how PostgreSQL can optimize subqueries in the blog Subqueries and Performance in PostgreSQL.
1 2 3 4 5 6 7 |
demo=# CREATE TABLE t_sample AS SELECT * FROM generate_series(1, 1000000) AS id ORDER BY random(); SELECT 1000000 demo=# VACUUM ANALYZE t_sample ; VACUUM |
The listing created a list of 1 million entries that’s been stored on disk in random order. Subsequently, VACUUM
was called to ensure that all PostgreSQL hint bit-related issues were sorted out before the test is executed. If you want to know what hint bits are and how they operate, check out our post about hint bits in PostgreSQL.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
demo=# explain analyze SELECT * FROM t_sample ORDER BY id DESC LIMIT 100; QUERY PLAN ------------------------------------------------------------------- Limit (cost=25516.40..25528.07 rows=100 width=4) (actual time=84.806..86.252 rows=100 loops=1) -> Gather Merge (cost=25516.40..122745.49 rows=833334 width=4) (actual time=84.805..86.232 rows=100 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=24516.38..25558.05 rows=416667 width=4) (actual time=65.576..65.586 rows=82 loops=3) Sort Key: id DESC Sort Method: top-N heapsort Memory: 33kB Worker 0: Sort Method: top-N heapsort Memory: 32kB Worker 1: Sort Method: top-N heapsort Memory: 33kB -> Parallel Seq Scan on t_sample (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.428..33.305 rows=333333 loops=3) Planning Time: 0.078 ms Execution Time: 86.286 ms (12 rows) |
PostgreSQL needs more than one core to process the query in 86 milliseconds, which is a lot.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
demo=# CREATE INDEX idx_sample ON t_sample (id); CREATE INDEX demo=# explain analyze SELECT * FROM t_sample ORDER BY id DESC LIMIT 100; QUERY PLAN ------------------------------------------------------------------- Limit (cost=0.42..3.02 rows=100 width=4) (actual time=0.071..0.125 rows=100 loops=1) -> Index Only Scan Backward using idx_sample on t_sample (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.070..0.113 rows=100 loops=1) Heap Fetches: 0 Planning Time: 0.183 ms Execution Time: 0.142 ms (5 rows) |
After adding an index, the query executed in a fraction of a millisecond. However, what is most important here is that we do NOT see a sort-step. PostgreSQL knows that the index returns data in sorted order (sorted by id) and thus there is no need to sort the data all over again. PostgreSQL consults the index and can simply take the data as it is and feed it to the client until enough rows have been found. In this special case, even an index-only scan is possible, because we are only looking for columns which actually exist in the index.
Reducing the amount of data to be sorted is vital to performance, and thus important to the user experience.
min
and max
can be used to reduce the amount of data to be sorted:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
demo=# explain SELECT min(id), max(id) FROM t_sample; QUERY PLAN ------------------------------------------------------------------- Result (cost=0.91..0.92 rows=1 width=8) InitPlan 1 (returns $0) -> Limit (cost=0.42..0.45 rows=1 width=4) -> Index Only Scan using idx_sample on t_sample (cost=0.42..28480.42 rows=1000000 width=4) Index Cond: (id IS NOT NULL) InitPlan 2 (returns $1) -> Limit (cost=0.42..0.45 rows=1 width=4) -> Index Only Scan Backward using idx_sample on t_sample t_sample_1 (cost=0.42..28480.42 rows=1000000 width=4) Index Cond: (id IS NOT NULL) (9 rows) |
The minimal value is the first value in a sorted list that is not NULL. The max value is the last value in a sequence of sorted values that is not NULL. PostgreSQL can take this into consideration and replace the standard way of processing aggregates with a subplan that simply consults the index.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
demo=# SELECT id FROM t_sample WHERE id IS NOT NULL ORDER BY id LIMIT 1; id ---- 1 (1 row) demo=# explain SELECT id FROM t_sample WHERE id IS NOT NULL ORDER BY id LIMIT 1; QUERY PLAN ------------------------------------------------------------------- Limit (cost=0.42..0.45 rows=1 width=4) -> Index Only Scan using idx_sample on t_sample (cost=0.42..28480.42 rows=1000000 width=4) Index Cond: (id IS NOT NULL) (3 rows) |
Fortunately, PostgreSQL does this for you and optimizes the query perfectly.
Partitioning is one of the favorite features of many PostgreSQL users. It offers you the ability to reduce table sizes and break up data into smaller chunks, which are easier to handle and in many cases (not all of them) faster to query. However, keep in mind that SOME queries might be faster - increased planning time can, however, backfire - this is not a rare corner case. We have seen partitioning decrease speed countless times in cases when partitioning wasn’t suitable.
Logically, the PostgreSQL optimizer has to take care of partitioning in a clever way and make sure that only the partitions are touched which might actually contain some of the data.
1 2 3 4 5 6 7 8 9 |
demo=# CREATE TABLE mynumbers (id int) PARTITION BY RANGE (id); CREATE TABLE demo=# CREATE TABLE negatives PARTITION OF mynumbers FOR VALUES FROM (MINVALUE) TO (0); CREATE TABLE demo=# CREATE TABLE positives PARTITION OF mynumbers FOR VALUES FROM (1) TO (MAXVALUE); CREATE TABLE |
In this case, a table with two partitions has been created.
1 2 3 4 5 6 7 |
demo=# explain SELECT * FROM mynumbers WHERE id > 1000; QUERY PLAN ------------------------------------------ Seq Scan on positives mynumbers (cost=0.00..41.88 rows=850 width=4) Filter: (id > 1000) (2 rows) |
The PostgreSQL optimizer correctly figured out that the data cannot be in one of the partitions and removed it from the execution plan.
1 2 3 4 5 6 7 8 9 10 11 |
demo=# explain SELECT * FROM mynumbers WHERE id < 1000; QUERY PLAN -------------------------------------------------------- Append (cost=0.00..92.25 rows=1700 width=4) -> Seq Scan on negatives mynumbers_1 (cost=0.00..41.88 rows=850 width=4) Filter: (id < 1000) > Seq Scan on positives mynumbers_2 (cost=0.00..41.88 rows=850 width=4) Filter: (id < 1000) (5 rows) |
The optimizations you have seen on this page are only the beginning of everything that PostgreSQL can do for you. It’s a good start on getting an impression of what is going on behind the scenes. Usually, database optimizers are some sort of black box and people rarely know which optimizations really happen. The goal of this page is to shed some light on the mathematical transformations going on.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on X, Facebook, or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Many thanks. Just noticed an indentation problem on line 16, 3rd listing of
"Optimizing join orders"§