Table of Contents
Recently, I could help solve a “mysterious” performance problem for a customer. The problem turned out to be a badly chosen query parameter type, which led to poor query performance. After working on this customer's case, I realized that the meaning of the data type of a query parameter is not universally understood, which is why I decided to write this article.
In SQL, we often encounter queries that only differ in constant values, for example
1 2 3 |
SELECT val FROM large WHERE id = 42; SELECT val FROM large WHERE id = 4711; SELECT val FROM large WHERE id = 1001; |
It makes sense to consider them as different instances of the same query:
1 |
SELECT val FROM large WHERE id = $1; |
Here, $1
is a placeholder for an arbitrary value. Such a placeholder is called a query parameter (also known as a bind variable). When you execute the query, you have to supply an actual parameter value as an argument for the parameter.
Using a placeholder has several advantages:
The PostgreSQL frontend/backend protocol provides two ways to send an SQL statement:
$1
, $1
etc.Note that you can use query parameters only for constants. It is not possible to use parameters for identifiers like table, column or schema names. Also, parameters are only supported in SELECT
, INSERT
, UPDATE
and DELETE
.
When you prepare a statement, you can assign it a name, so that the same statement can be executed with different parameter values. This is called a prepared statement, and every database client API provides support for that.
Using JDBC in Java, that would work as follows:
1 2 3 4 |
java.sql.PreparedStatement stmt = conn.prepareStatement('SELECT * FROM large WHERE id = ?'); stmt.setLong(1, 42L); java.sql.ResultSet rs = stmt.executeQuery(); |
The placeholder is always ?
, and the first argument of the set*
methods specifies the parameter number.
In PostgreSQL, you can also use prepared statements in SQL. PREPARE
creates a prepared statement, and EXECUTE
executes it:
1 2 3 4 |
PREPARE stmt(bigint) AS SELECT val FROM large WHERE id = $1; EXECUTE stmt(42); |
You may have noticed that in both of the above cases, there was a data type associated with the parameter: long
in Java and bigint
in SQL. However, you don't have to specify a data type. Then PostgreSQL will try to infer the data type itself. Such an “untyped value” is represented by the data type “unknown
” in PostgreSQL.
But PostgreSQL will also consider type conversions for data types other than unknown
. For example, there is no equality operator to compare bigint
and numeric
in PostgreSQL:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT o.oprname FROM pg_operator AS o JOIN pg_amop AS ao ON o.oid = ao.amopopr JOIN pg_am AS a ON a.oid = ao.amopmethod WHERE /* an equality operator as used in B-tree indexes */ ao.amopstrategy = 3 AND a.amname = 'btree' AND (o.oprleft, o.oprright) = ('bigint'::regtype, 'numeric'::regtype); oprname ═════════ (0 rows) |
Still, you can compare a bigint
with a numeric
, because PostgreSQL will implicitly convert the bigint
to a numeric
, as the latter is the preferred data type for numbers. You can find the exact rules that govern these conversions in the documentation. These rules are quite complicated, because PostgreSQL supports overloading for functions and operators.
To illustrate that, let's consider this simple example:
1 2 3 4 5 6 7 8 9 10 |
CREATE TABLE large ( id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, val double precision NOT NULL ); INSERT INTO large (val) SELECT random() FROM generate_series(1, 100000); VACUUM (ANALYZE) large; |
Then we write a Java program that queries the database with the code snippet from above, but we set the parameter in three different ways. Using the auto_explain extension, we capture the execution plan on the server in each case.
integer
as the parameter typeThe parameter is set with
1 |
stmt.setInt(1, 42); |
and the resulting execution plan is
1 2 3 4 5 |
Query Text: SELECT * FROM large WHERE id = $1 Index Scan using large_pkey on large (...) (actual time=0.013..0.013 rows=1 loops=1) Index Cond: (id = 42) Buffers: shared hit=3 |
This works fine, since setInt
marks the parameter as type integer
, and there is an equality operator to compare integer
and bigint
.
unknown
as the parameter typeThe parameter is set as type java.sql.Types.OTHER
with
1 |
stmt.setObject(1, 42); |
and the resulting execution plan is
1 2 3 4 5 |
Query Text: SELECT * FROM large WHERE id = $1 Index Scan using large_pkey on large (...) (actual time=0.005..0.006 rows=1 loops=1) Index Cond: (id = 42) Buffers: shared hit=3 |
This works just as well, because a parameter of type unknown
is inferred to have the same type as id
, which is bigint
.
numeric
as the parameter typeThe parameter is set with
1 |
stmt.setBigDecimal(1, java.math.BigDecimal.valueOf(42)); |
and the resulting execution plan is
1 2 3 4 5 6 |
Query Text: SELECT * FROM large WHERE id = $1 Seq Scan on large (...) (actual time=0.012..11.123 rows=1 loops=1) Filter: ((id)::numeric = '42'::numeric) Rows Removed by Filter: 99999 Buffers: shared hit=935 |
Surprisingly, PostgreSQL used a sequential scan, and the performance is much worse. The reason is that the JDBC driver maps java.math.BigDecimal
to numeric
, so the parameter has the data type numeric
. As we have seen above, there is no equality operator for bigint
and numeric
, so PostgreSQL converts both to numeric
. This is clearly visible in the execution plan above. Since the primary key index is defined on “id
” and not on “id::numeric
”, it cannot be used in this case.
From the above, we can learn the following lesson:
setLong
, since the JDBC driver maps long
to bigint
.unknown
) and let PostgreSQL infer the correct data type.Sometimes both of these strategies won't work, because there is no data type in your host language that matches the PostgreSQL data type - and PostgreSQL guesses wrong. An example would be this query:
1 2 |
SELECT id, name FROM person WHERE birthday <= current_timestamp - $1; |
We want to supply an interval like “1 year
” or “1-0
”. Java does not have a data type for intervals, so we would have to send an unknown
parameter. However, PostgreSQL infers that we mean the operator
1 |
timestamp with time zone - timestamp with time zone |
rather than
1 |
timestamp with time zone - interval |
so we will end up with error messages complaining that “1 year
” is not a valid timestamp. In this case, the solution is to add an explicit type cast to the statement:
1 2 |
SELECT id, name FROM person WHERE birthday <= current_timestamp - CAST ($1 AS interval); |
So we have to amend the above list with a third point:
It is easy to overlook the fact that the data type of a query parameter can matter for query performance. To avoid problems, either don't specify a type and let PostgreSQL do the right thing, or explicitly specify the correct type.
If you want to read more about PostgreSQL query protocol and performance, perhaps you would be interested in my article on pipeline mode.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
in some RTLs, e.g. VCL in Delphi and CBuilder, double colon ":" might mark the beginning of the parameter name. If you have problems with this syntax, you may use parallel form:
SELECT id, name FROM person WHERE birthday <= current_timestamp - CAST($1 AS interval)
True. I'll change the code.