CYBERTEC PostgreSQL Logo

Query parameter data types and performance

03.2022 / Category: / Tags: |
PostgreSQL gets a query parameter of type 'unknown'
© Laurenz Albe 2022

 

Recently, I could help solve a “mysterious” performance problem for a customer. The problem turned out to be a badly chosen query parameter type, which led to poor query performance. After working on this customer's case, I realized that the meaning of the data type of a query parameter is not universally understood, which is why I decided to write this article.

What are query parameters?

In SQL, we often encounter queries that only differ in constant values, for example

It makes sense to consider them as different instances of the same query:

Here, $1 is a placeholder for an arbitrary value. Such a placeholder is called a query parameter (also known as a bind variable). When you execute the query, you have to supply an actual parameter value as an argument for the parameter.

Using a placeholder has several advantages:

  • the database can parse and optimize the statement once and reuse the execution plan several times with different parameter values to speed up processing (read this article to understand how that is done in PostgreSQL).
  • seperating query and parameters prevents the possibility of SQL injection, which is a major threat for database applications.

Support for query parameters in PostgreSQL

The PostgreSQL frontend/backend protocol provides two ways to send an SQL statement:

  • the simple query protocol sends the complete SQL statement as a single string
  • the extended query protocol separates the SQL statement and its parameters:
    • the “Prepare” message sends the statement with the placeholders $1, $1 etc.
    • the “Bind” message sends the parameter values
    • the “Execute” message executes the query

Note that you can use query parameters only for constants. It is not possible to use parameters for identifiers like table, column or schema names. Also, parameters are only supported in SELECT, INSERT, UPDATE and DELETE.

Prepared statements

When you prepare a statement, you can assign it a name, so that the same statement can be executed with different parameter values. This is called a prepared statement, and every database client API provides support for that.

Using JDBC in Java, that would work as follows:

The placeholder is always ?, and the first argument of the set* methods specifies the parameter number.

In PostgreSQL, you can also use prepared statements in SQL. PREPARE creates a prepared statement, and EXECUTE executes it:

Query parameter data types and type conversion

You may have noticed that in both of the above cases, there was a data type associated with the parameter: long in Java and bigint in SQL. However, you don't have to specify a data type. Then PostgreSQL will try to infer the data type itself. Such an “untyped value” is represented by the data type “unknown” in PostgreSQL.

But PostgreSQL will also consider type conversions for data types other than unknown. For example, there is no equality operator to compare bigint and numeric in PostgreSQL:

Still, you can compare a bigint with a numeric, because PostgreSQL will implicitly convert the bigint to a numeric, as the latter is the preferred data type for numbers. You can find the exact rules that govern these conversions in the documentation. These rules are quite complicated, because PostgreSQL supports overloading for functions and operators.

An example of the performance impact of query parameter types

To illustrate that, let's consider this simple example:

Then we write a Java program that queries the database with the code snippet from above, but we set the parameter in three different ways. Using the auto_explain extension, we capture the execution plan on the server in each case.

Using integer as the parameter type

The parameter is set with

and the resulting execution plan is

This works fine, since setInt marks the parameter as type integer, and there is an equality operator to compare integer and bigint.

Using unknown as the parameter type

The parameter is set as type java.sql.Types.OTHER with

and the resulting execution plan is

This works just as well, because a parameter of type unknown is inferred to have the same type as id, which is bigint.

Using numeric as the parameter type

The parameter is set with

and the resulting execution plan is

Surprisingly, PostgreSQL used a sequential scan, and the performance is much worse. The reason is that the JDBC driver maps java.math.BigDecimal to numeric, so the parameter has the data type numeric. As we have seen above, there is no equality operator for bigint and numeric, so PostgreSQL converts both to numeric. This is clearly visible in the execution plan above. Since the primary key index is defined on “id” and not on “id::numeric”, it cannot be used in this case.

How to avoid problems with query parameter data types

From the above, we can learn the following lesson:

  • Ideally, choose the correct data type for query parameters.
    In the above example, that would have meant using setLong, since the JDBC driver maps long to bigint.
  • In case of doubt, choose an untyped parameter (type unknown) and let PostgreSQL infer the correct data type.

Sometimes both of these strategies won't work, because there is no data type in your host language that matches the PostgreSQL data type - and PostgreSQL guesses wrong. An example would be this query:

We want to supply an interval like “1 year” or “1-0”. Java does not have a data type for intervals, so we would have to send an unknown parameter. However, PostgreSQL infers that we mean the operator

rather than

so we will end up with error messages complaining that “1 year” is not a valid timestamp. In this case, the solution is to add an explicit type cast to the statement:

So we have to amend the above list with a third point:

  • If automatic type resolution fails, add an explicit type cast to the statement.

Conclusion

It is easy to overlook the fact that the data type of a query parameter can matter for query performance. To avoid problems, either don't specify a type and let PostgreSQL do the right thing, or explicitly specify the correct type.

If you want to read more about PostgreSQL query protocol and performance, perhaps you would be interested in my article on pipeline mode.

2 responses to “Query parameter data types and performance”

  1. in some RTLs, e.g. VCL in Delphi and CBuilder, double colon ":" might mark the beginning of the parameter name. If you have problems with this syntax, you may use parallel form:


    SELECT id, name FROM person WHERE birthday <= current_timestamp - CAST($1 AS interval)

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram