In version 9.6 PostgreSQL introduced parallel queries. The ability to use more than just one CPU core per query is a giant leap forward and has made PostgreSQL an even more desirable database. With parallel queries many workloads can be sped up considerably. However, in this article, I want to point out a small missing feature, which bugged one of our support clients recently: lack of support for SERIALIZABLE. UPDATE Sept. 2023: This restriction was lifted as of v12, see this note in the documentation.
Table of Contents
Consider the following example:
1 2 3 4 5 |
test=# CREATE TABLE t_demo AS SELECT id FROM generate_series(1, 10000000) AS id; SELECT 10000000 |
The sample table contains 10 million rows.
To tell the optimizer about the content of the table, running ANALYZE seems like a good idea:
1 2 |
test=# ANALYZE t_demo; ANALYZE |
Then users can tell PostgreSQL about the maximum number of cores allowed for the query we are about to run:
1 2 |
test=# SET max_parallel_workers_per_gather TO 4; SET |
After that we can already run a simple aggregation:
1 2 3 4 5 6 7 8 9 |
test=# explain SELECT count(*) FROM t_demo; QUERY PLAN ------------------------------------------------------------------------------------------- Finalize Aggregate (cost=76498.57..76498.58 rows=1 width=8) -> Gather (cost=76498.15..76498.56 rows=4 width=8) Workers Planned: 4 -> Partial Aggregate (cost=75498.15..75498.16 rows=1 width=8) -> Parallel Seq Scan on t_demo (cost=0.00..69248.12 rows=2500012 width=0) (5 rows) |
PostgreSQL will execute a so called parallel sequential scan using 4 CPU cores (= worker processes). The system will scale nicely and significantly better execution times can be observed.
We have seen a couple of people use SERIALIZABLE transactions for analytical requests. While this might not be a good idea for other reasons, there are also implications when it comes to the use of parallel queries. At this point (as of PostgreSQL 9.6) there is some code in PostgreSQL which is not yet fully parallel safe. Therefore parallel queries are not yet supported if you happen to use SERIALIZABLE. If you are affected by this, consider using REPEATABLE READ.
Here is what happens:
1 2 3 4 5 6 7 8 9 10 11 |
test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; BEGIN test=# explain SELECT count(*) FROM t_demo; QUERY PLAN ------------------------------------------------------------------------ Aggregate (cost=169248.60..169248.61 rows=1 width=8) -> Seq Scan on t_demo (cost=0.00..144248.48 rows=10000048 width=0) (2 rows) test=# COMMIT; COMMIT |
As you can see, only a single core is used.
The code inside the query planner (planner.c) is very clear about this:
We can't use parallelism in serializable mode because the predicate locking code is not parallel-aware. It's not catastrophic if someone tries to run a parallel plan in serializable mode; it just won't get any workers and will run serially. But it seems like a good heuristic to assume that the same serialization level will be in effect at plan time and execution time, so don't generate a parallel plan if we're in serializable mode.
Future versions of PostgreSQL will surely relax this at some point so that all isolation levels can benefit from parallel queries.
This limitation was removed as of PostgreSQL v12: see the documentation on "Allow parallelized queries when in SERIALIZABLE isolation mode (Thomas Munro)" for more information.
See also this note about the limitation to parallel query (v15).
In case you need any assistance, please feel free to contact us.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
What's involved with making predicate locking parallel-aware?