For detailed query performance analysis, you need EXPLAIN (ANALYZE, BUFFERS) output for an SQL statement. With a parameterized statement, it can be difficult to construct a runnable statement for EXPLAIN (ANALYZE). Sometimes, you don't even know the parameter values. I'll show you how you can get at least plain EXPLAIN output for the generic plan in this case. Such an execution plan is better than nothing, and maybe even enough to guess what the problem is.
With the extended query protocol, PostgreSQL allows you to separate the SQL statement and the constants used in the statement. This enhances security, since it makes SQL injection impossible, but it is mostly a performance feature. Such parameterized statements can be named and reused with different parameter values, thereby saving the database engine the work of parsing the same statement over and over. What is more, using this feature PostgreSQL can sometimes avoid the much greater overhead of generating execution plans for each execution of the statement.
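At the SQL level, the same mechanism is exposed through PREPARE and EXECUTE (application drivers normally use the equivalent protocol messages behind the scenes). Here is a minimal illustration; the statement name get_rel is just an example:

PREPARE get_rel(text) AS              -- parsed and analyzed once
   SELECT oid FROM pg_class WHERE relname = $1;

EXECUTE get_rel('pg_proc');           -- run with a parameter value
EXECUTE get_rel('pg_attribute');      -- reuse the statement with another value

DEALLOCATE get_rel;                   -- drop the prepared statement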
You typically encounter parameterized statements in two situations:

- in SQL statements run inside PL/pgSQL functions, where references to PL/pgSQL variables are passed as parameters
- in statements sent by applications that use prepared statements or protocol-level parameter binding
The placeholders for the parameters are $1, $2 and so on. In the case of PL/pgSQL, you won't see them; the PL/pgSQL call handler substitutes these placeholders for the variable names. Note that parameterized statements can only be SELECT, INSERT, UPDATE, DELETE and VALUES statements.
Normally, PostgreSQL generates an execution plan whenever a query is executed. However, PostgreSQL can cache execution plans for named parameterized statements. Plans are only ever cached inside a single database session; there is no shared memory plan cache.
If PostgreSQL thinks it can do so without hurting performance, it will start using a generic plan for the statement. Such an execution plan does not take the parameter values into account and can be reused. The decision whether to switch to a generic plan or not is based on a heuristic and is typically made when the statement executes for the sixth time. You can tell such a plan from the placeholders that it uses instead of the parameters. Here is an example using a prepared statement:
PREPARE stmt(text) AS SELECT oid FROM pg_class WHERE relname = $1;

EXPLAIN (COSTS OFF) EXECUTE stmt('pg_proc');

                        QUERY PLAN
═══════════════════════════════════════════════════════════
 Index Scan using pg_class_relname_nsp_index on pg_class
   Index Cond: (relname = 'pg_proc'::text)
(2 rows)
The next four executions of EXPLAIN look the same, but then we see:
EXPLAIN (COSTS OFF) EXECUTE stmt('pg_attribute');

                        QUERY PLAN
═══════════════════════════════════════════════════════════
 Index Scan using pg_class_relname_nsp_index on pg_class
   Index Cond: (relname = $1)
(2 rows)
PostgreSQL has started using a generic plan! From that point on, the query planning time will become much shorter.
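If you want to watch the switch happen, you can query the pg_prepared_statements view in the same session (assuming PostgreSQL v14 or later, where the view has plan counters):

SELECT name, generic_plans, custom_plans
FROM pg_prepared_statements
WHERE name = 'stmt';

-- after six executions you would see something like
--  name │ generic_plans │ custom_plans
-- ══════╪═══════════════╪══════════════
--  stmt │             1 │            5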
You can use the PostgreSQL parameter plan_cache_mode to influence the behavior described in the previous section. The default setting “auto” chooses the heuristic described above, where PostgreSQL decides after a few executions whether a generic plan would be beneficial or not.

With the setting “force_custom_plan” you can tell PostgreSQL to never use a generic plan. That is a good idea if the generic plan turns out to be not quite as good as PostgreSQL thought. It is also a good setting for data warehouses, where you normally run expensive analytical queries and saving on planning time is less important than getting the best possible execution plan.

Finally, the setting “force_generic_plan” makes PostgreSQL use a generic plan immediately. We will make use of that setting later.
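The parameter can be set at different levels, for example per session or as a default for a whole database (the database name below is just a placeholder):

-- for the current session only
SET plan_cache_mode = force_custom_plan;

-- back to the default heuristic
RESET plan_cache_mode;

-- persistent default for one database (takes effect for new sessions)
ALTER DATABASE mydb SET plan_cache_mode = force_generic_plan;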
A parameterized statement looks like this in the log:
LOG:  duration: 0.012 ms  execute stmt: SELECT oid FROM pg_class WHERE relname = $1
DETAIL:  parameters: $1 = 'pg_proc'
Usually, the parameters are logged as a detail message, but if there are many of them, it can be a considerable effort to replace all the placeholders with the parameter values. You also don't see the parameter data type in the log, so you may have to look up the table definition to know whether you should write 42 or '42'. If the statement caused an error and you didn't set log_parameter_max_length_on_error to a nonzero value, you won't get the parameters logged at all:
ERROR:  canceling statement due to statement timeout
STATEMENT:  SELECT oid FROM pg_class WHERE relname = $1
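If you want parameter values for failing statements, you can enable that logging (available from PostgreSQL v13 on; changing the server configuration requires the appropriate privileges, and -1 means “log the parameter values in full”):

ALTER SYSTEM SET log_parameter_max_length_on_error = -1;  -- log parameters of failing statements in full
SELECT pg_reload_conf();                                  -- make the change effective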
pg_stat_statements is the Swiss army knife for analyzing database workloads. One of its features is that it ignores the values of constants, so that statements that differ only in their constants are aggregated together. So if you query the pg_stat_statements view, you will see placeholders even in statements that were not originally parameterized. Also, since pg_stat_statements collects statistics over many executions of a statement, it doesn't record the actual parameter values for any of them.
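For example, a query like this surfaces candidates worth a closer look (assuming PostgreSQL v13 or later, where the columns are called mean_exec_time and total_exec_time):

SELECT query,
       calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;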
If you find a problem statement in the log or in pg_stat_statements, you want to analyze its performance. In order to do that, you have to guess appropriate parameter values, so that you can get an execution plan with EXPLAIN (ANALYZE, BUFFERS). This can be tedious and take a long time.
For a first analysis, it is helpful to see the execution plan as generated by EXPLAIN (without ANALYZE). Since “plain” EXPLAIN does not execute the query, it should not depend on the actual parameter values, as long as we are happy with the generic plan. Unfortunately, EXPLAIN refuses to generate a generic plan for a parameterized statement:
EXPLAIN SELECT oid FROM pg_class WHERE relname = $1;

ERROR:  there is no parameter $1
LINE 1: EXPLAIN SELECT oid FROM pg_class WHERE relname = $1;
                                                          ^
We get that error even if we set plan_cache_mode = force_generic_plan.
Using PREPARE

We want to do better than that, and we can. Using PREPARE, we can create a prepared statement with a placeholder:
PREPARE stmt(name) AS SELECT oid FROM pg_class WHERE relname = $1;
Now we can force a generic plan and EXPLAIN the prepared statement. We can supply NULL as the parameter value, since NULL exists for every data type, and the parameter value is ignored anyway:
SET plan_cache_mode = force_generic_plan;

EXPLAIN EXECUTE stmt(NULL);

                                         QUERY PLAN
═════════════════════════════════════════════════════════════════════════════════════════
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 rows=1 width=4)
   Index Cond: (relname = $1)
(2 rows)

DEALLOCATE stmt;
The only remaining fly in the ointment is that we have to figure out the appropriate data types for the parameters.
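If you have to do that, the catalog can help. For example, to find the type of the column a parameter is compared with:

SELECT atttypid::regtype
FROM pg_attribute
WHERE attrelid = 'pg_class'::regclass
  AND attname = 'relname';

 atttypid
══════════
 name
(1 row)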
Using unknown with a parameterized statement

Pseudo-types are data types that you cannot use in table definitions. One of these data types is “unknown”: it is used during query parsing for string constants whose data type must be resolved later, based on the context. We can use unknown as the data type for query parameters and let PostgreSQL figure out the appropriate data type itself:
PREPARE stmt(unknown) AS SELECT oid FROM pg_class WHERE relname = $1;

SET plan_cache_mode = force_generic_plan;

EXPLAIN EXECUTE stmt(NULL);

                                         QUERY PLAN
═════════════════════════════════════════════════════════════════════════════════════════
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 rows=1 width=4)
   Index Cond: (relname = $1)
(2 rows)

DEALLOCATE stmt;
Now we have a simple algorithm to get the generic plan of a parameterized statement:

1. prepare a statement with “unknown” parameters
2. set “plan_cache_mode” to “force_generic_plan”
3. EXPLAIN the prepared statement using NULLs as arguments

I wrapped all this into a function and wrote the extension generic_plan. It is written in PL/pgSQL and does not require superuser permissions to install. Here you can see it in action:
CREATE EXTENSION IF NOT EXISTS generic_plan;

SELECT generic_plan('SELECT * FROM pg_sequences WHERE max_value < last_value + $1');

                                         generic_plan
═══════════════════════════════════════════════════════════════════════════════════════════
 Subquery Scan on pg_sequences  (cost=1.09..24.10 rows=1 width=245)
   Filter: (pg_sequences.max_value < (pg_sequences.last_value + $1))
   ->  Nested Loop  (cost=1.09..24.09 rows=1 width=245)
         Join Filter: (c.oid = s.seqrelid)
         ->  Seq Scan on pg_sequence s  (cost=0.00..1.06 rows=6 width=49)
         ->  Materialize  (cost=1.09..22.76 rows=3 width=136)
               ->  Hash Join  (cost=1.09..22.74 rows=3 width=136)
                     Hash Cond: (c.relnamespace = n.oid)
                     ->  Seq Scan on pg_class c  (cost=0.00..21.62 rows=8 width=76)
                           Filter: (relkind = 'S'::"char")
                     ->  Hash  (cost=1.05..1.05 rows=3 width=68)
                           ->  Seq Scan on pg_namespace n  (cost=0.00..1.05 rows=3 width=68)
                                 Filter: (NOT pg_is_other_temp_schema(oid))
(13 rows)
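To give an idea of how the building blocks fit together, here is a simplified sketch of such a function in PL/pgSQL. This is not the actual code of the generic_plan extension; the function name, the internal statement name and the naive parameter counting are only for illustration:

CREATE FUNCTION generic_plan_sketch(sql_stmt text) RETURNS SETOF text
   LANGUAGE plpgsql AS
$$DECLARE
   npar integer;
   plan_line text;
BEGIN
   /* find the highest parameter number ($1, $2, ...) used in the statement */
   SELECT coalesce(max(m[1]::integer), 0) INTO npar
   FROM regexp_matches(sql_stmt, '\$(\d+)', 'g') AS r(m);

   IF npar = 0 THEN
      RAISE EXCEPTION 'the statement contains no parameters';
   END IF;

   /* prepare the statement with all parameters of type "unknown" */
   EXECUTE format(
      'PREPARE generic_plan_sketch_stmt(%s) AS %s',
      array_to_string(array_fill('unknown'::text, ARRAY[npar]), ', '),
      sql_stmt
   );

   /* force a generic plan for the rest of the transaction (like SET LOCAL) */
   PERFORM set_config('plan_cache_mode', 'force_generic_plan', true);

   /* EXPLAIN the prepared statement, passing NULL for every parameter */
   FOR plan_line IN
      EXECUTE format(
         'EXPLAIN EXECUTE generic_plan_sketch_stmt(%s)',
         array_to_string(array_fill('NULL'::text, ARRAY[npar]), ', ')
      )
   LOOP
      RETURN NEXT plan_line;
   END LOOP;

   EXECUTE 'DEALLOCATE generic_plan_sketch_stmt';
END;$$;

-- usage:
-- SELECT * FROM generic_plan_sketch('SELECT oid FROM pg_class WHERE relname = $1');

Unlike this sketch, the real extension takes more care with corner cases, for example cleaning up the prepared statement if something goes wrong.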
It can be complicated to collect parameter values to analyze the execution of a parameterized statement, but using the generic_plan extension we can at least get a generic plan easily. The tricks used are a prepared statement with parameters of type “unknown”, adjusting plan_cache_mode and using NULLs as parameter values.
In case you're interested in learning more about parameters, check out my blog on Query Parameter Data Types and Performance.
But wouldn't the usage of NULL for building the “generic” plan shift the estimation the planner makes? As far as I understood, the generic plan is just the stable result of planning the execution of the first 5 requests with some real data. And because the planner takes those into consideration, plans for NULL can be quite different than for some (or any) non-null parameter values.

For example, suppose a table with 2M rows has 1k values per non-empty key, and 1M rows with nulls. Would the “generic” plan for a null key be the same as the 6th plan after 5 requests with non-null keys?
If I force a generic plan with plan_cache_mode = force_generic_plan, PostgreSQL won't even look at the parameter values. So the NULLs cannot influence the plan that is created. There is only one generic plan for a query, so yes, that would be the same plan that PostgreSQL generates at the sixth execution.