CYBERTEC PostgreSQL Logo

EXPLAIN that parameterized statement in PostgreSQL!

11.2022 / Category: / Tags:
the database consultant needs a plan to get a clue about the parameterized statement
© Laurenz Albe 2022

 

For detailed query performance analysis, you need EXPLAIN (ANALYZE, BUFFERS) output for an SQL statement. With a parameterized statement, it can be difficult to construct a run-able statement for EXPLAIN (ANALYZE). Sometimes, you don't even know the parameter values. I'll show you how you can get at least plain EXPLAIN output for the generic plan in this case. Such an execution plan is better than nothing, and maybe even enough to guess what the problem is.

Parameterized statements

With the extended query protocol, PostgreSQL allows you to separate the SQL statement and the constants used in the statement. This enhances security, since it makes SQL injection impossible, but it is mostly a performance feature. Such parameterized statements can be named and reused with different parameter values, thereby saving the database engine the work of parsing the same statement over and over. What is more, using this feature PostgreSQL can sometimes avoid the much greater overhead of generating execution plans for each execution of the statement.

You typically encounter parameterized statements in two situations:

  • prepared statements with parameters (typically used through a client API)
  • static SQL statements in PL/pgSQL functions that use variables

The placeholders for the parameters are $1, $2 and so on. In the case of PL/pgSQL, you won't see them; the PL/pgSQL call handler will substitute these placeholders instead of the variable names. Note that parameterized statements can only be SELECT, INSERT, UPDATE, DELETE and VALUES statements.

Generic plans for parameterized statements

Normally, PostgreSQL generates an execution plan whenever a query is executed. However, PostgreSQL can cache execution plans for named parameterized statements. Plans are only ever cached inside a single database session; there is no shared memory plan cache.

If PostgreSQL thinks it can do so without hurting performance, it will start using a generic plan for the statement. Such an execution plan does not take the parameter values into account and can be reused. The decision whether to switch to a generic plan or not is based on a heuristic and is typically made when the statement executes for the sixth time. You can tell such a plan from the placeholders that it uses instead of the parameters. Here is an example using a prepared statement:

The next four executions of EXPLAIN look the same, but then we see

PostgreSQL has started using a generic plan! From that point on, the query planning time will become much shorter.

Forcing a generic plan for a parameterized statement

You can use the PostgreSQL parameter plan_cache_mode to influence the behavior described in the previous section. The default setting “auto” chooses the heuristic described above, where PostgreSQL decides after a few executions whether a generic plan would be beneficial or not.

With the setting “force_custom_plan” you can tell PostgreSQL to never use a generic plan. That is a good idea if the generic plan turns out to be not quite as good as PostgreSQL thought. It is also a good setting for data warehouses, where you normally run expensive analytical queries and saving on planning time is less important than getting the best possible execution plan.

Finally, the setting “force_generic_plan” makes PostgreSQL use a generic plan immediately. We will make use of that setting later.

Where can you encounter a parameterized statement?

Parameterized statements in the PostgreSQL log

A parameterized statement looks like this in the log:

Usually, the parameters are logged as a detail message, but if there are many of them, it can be a considerable effort to replace all the placeholders with the parameter values. You also don't see the parameter data type in the log, so you may have to look up the table definition to know whether you should write 42 or '42'. If the statement caused an error and you didn't set log_parameter_max_length_on_error to a nonzero value, you won't get the parameters logged at all:

Parameterized statements in pg_stat_statements

pg_stat_statements is the Swiss army knife for analyzing database workloads. One of its features is that it ignores the values of constants, so that statements that only differ in constants are aggregated together. So if you query the pg_stat_statements view, you will see placeholders even in statements that were not parameterized originally. Also, since pg_stat_statements collects statistics over many executions of a statement, it doesn't collect the actual parameter values for any of them.

The need for a generic plan

If you find a problem statement in the log or in pg_stat_statements, you want to analyze its performance. In order to do that, you have to guess appropriate parameter values, so that you can get an execution plan with EXPLAIN (ANALYZE, BUFFERS). This can be tedious and take a long time.

For a first analysis, it is helpful to see the execution plan as generated by EXPLAIN (without ANALYZE). Since “plain” EXPLAIN does not execute the query, it should not depend on the actual parameter values, as long as we are happy with the generic plan. Unfortunately, EXPLAIN refuses to generate a generic plan for a parameterized statement:

We get that error even if we set plan_cache_mode = force_generic_plan.

Generating a generic plan for a parameterized statement using PREPARE

We want to do better than that, and we can. Using PREPARE, we can create a prepared statement with a placeholder:

Now we can force a generic plan and EXPLAIN the prepared statement. We can supply NULL as parameter value, since NULL exists for every data type, and the parameter value is ignored anyway:

The only remaining fly in the ointment is that we have to figure out the appropriate data types for the parameters.

Using the pseudo-type unknown with a parameterized statement

Pseudo-types are data types that you cannot use in table definitions. One of these data types is “unknown”: It is used during query parsing for string constants whose data type must be resolved later, based on the context. We can use unknown as a data type for query parameters and let PostgreSQL figure out the appropriate data type itself:

Putting it all together in an extension

Now we have a simple algorithm to get the generic plan of a parameterized statement:

  • count the number of parameters
  • create a prepared statement with that many “unknown” parameters
  • set “plan_cache_mode” to “force_generic_plan
  • EXPLAIN the prepared statements using NULLs as arguments

I wrapped all this into a function and wrote the extension generic_plan. It is written in PL/pgSQL and does not require superuser permissions to install. Here you can see it in action:

Conclusion

It can be complicated to collect parameter values to analyze the execution of a parameterized statement, but using the generic_plan extension we can at least get a generic plan easily. The tricks used are a prepared statement with parameters of type “unknown”, adjusting plan_cache_mode and using NULLs as parameter values.

In case you're interested in learning more about parameters, check out my blog on Query Parameter Data Types and Performance.

2 responses to “EXPLAIN that parameterized statement in PostgreSQL!”

  1. But wouldn't the usage of NULL for building the "generic" plan shift the estimation the planner makes?

    As far as I understood, the generic plan is just the stable result of planning execution of the first 5 requests with some real data. And because the planner takes those in consideration, plans for NULL can be quite different than for some (or any) non-null parameter values.

    For example, suppose the table with 2M rows has 1k values per non-empty key, and 1M rows with nulls. Would the "generic" plans for null key the same as for the 6-th plan after 5 requests with non-null keys?

    • If I force a generic plan with plan_cache_mode = force_generic_plan, PostgreSQL won't even look at the parameter values. So the NULLs cannot influence the plan that is created. There is only one generic plan for a query, so yes, that would be the same plan that PostgreSQL generates at the sixth execution.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram