Table of Contents
PostgreSQL commit 74dfe58a5927b22c744b29534e67bfdd203ac028 has added “support functions”. This exciting new functionality that allows the optimizer some insight into functions. This article will discuss how this will improve query planning for PostgreSQL v12. If you are willing to write C code, you can also use this functionality for your own functions.
Up to now, the PostgreSQL optimizer couldn't really do a lot about functions. No matter how much it knew about the arguments of a function, it didn't have the faintest clue about the function result. This also applied to built-in functions: no information about them was “wired into” the optimizer.
Let's look at a simple example: language="sql"
1 2 3 4 5 6 |
EXPLAIN SELECT * FROM unnest(ARRAY[1,2,3]); QUERY PLAN ------------------------------------------------------------- Function Scan on unnest (cost=0.00..1.00 rows=100 width=4) (1 row) |
PostgreSQL knows exactly that the array contains three elements. Still, it has no clue how many rows unnest
will return, so it estimates an arbitrary 100 result rows. If this function invocation is part of a bigger SQL statement, the wrong result count can lead to a bad plan. The most common problem is that PostgreSQL will select bad join strategies based on wrong cardinality estimates. If you have ever waited for a nested loop join to finish that got 10000 instead of 10 rows in the outer relation, you know what I'm talking about.
There is the option to specify COST
and ROWS
on a function to improve the estimates. But you can only specify a constant there, which often is not good enough.
There were many other ways in which optimizer support for functions was lacking. This situation has been improved with support functions.
The CREATE FUNCTION
statement has been extended like this:
1 2 3 |
CREATE FUNCTION name (...) RETURNS ... SUPPORT supportfunction AS ... |
This way a function gets a “support function” that knows about the function and can help the optimizer produce a better plan. Only a superuser can use the SUPPORT
option.
Such a support function must have the signature
1 |
supportfunction(internal) RETURNS internal |
“internal
” means that the function argument and return code are pointers to some C structure. That means that the function has to be written in C and is not callable from SQL.
When the optimizer considers some optimization for a function call, it invokes the support function. If the support function returns NULL
to indicate it cannot help with that request, the optimizer goes ahead and plans as usual.
The optimizer can pass different C structures to the support function, depending on the optimization it considers. See src/include/nodes/supportnodes.h
in the PostgreSQL source for details.
A support function can provide some or all of the following features:
This in only called for functions that return boolean
and are at the top level of a WHERE
or JOIN
condition, for example
1 |
SELECT ... FROM a JOIN b ON func(a.x, b.y) |
or
1 |
SELECT ... FROM a WHERE func(a.x, a.y, a.z) |
Sometimes it may be possible to replace the function call with an identical expression that can use an index scan. A trivial example would be int4eq(x, 42)
, which could be replaced by x = 42
. Usually, though, the indexable expressions will not be able to replace the function call, but it can be useful as a “lossy” filter that significantly reduces the number of function calls that have to be performed.
One well-known example of such a lossy filter are LIKE
expressions:
1 2 3 4 5 6 7 8 |
EXPLAIN SELECT * FROM person WHERE name LIKE 'alb%t'; QUERY PLAN ------------------------------------------------------------------------------- Index Scan using person_name_idx on person (cost=0.29..8.31 rows=9 width=11) Index Cond: ((name ~>=~ 'alb'::text) AND (name ~<~ 'alc'::text)) Filter: (name ~~ 'alb%t'::text) (3 rows) |
The two (byte-wise) comparisons can use an index scan, and they narrow down the search space. An additional filter removes the false positives.
Up to PostgreSQL v11, the optimizer had this knowledge wired in. From v12 on, the functions that implement the LIKE
operator have support functions that contain this knowledge.
However, the main use case for this kind of support function will be PostGIS, and support functions were introduced specifically to help PostGIS. Up to now, functions like ST_Intersects()
or ST_DWithin()
used a trick to get index support: they were defined as SQL functions with a (lossy, but indexable) boundary box operator and an exact function. PostGIS relied on “function inlining” to get PostgreSQL to use an index. This was an ugly hack that caused problems, particularly with parallel queries. With PostgreSQL v12, PostGIS can use support functions to do this correctly.
With a “set-returning function”, PostgreSQL calls the support function to get an estimate for the number of rows. This has been implemented for unnest
in v12, so the example from the beginning will get the correct estimate:
1 2 3 4 5 6 |
EXPLAIN SELECT * FROM unnest(ARRAY[1,2,3]); QUERY PLAN ----------------------------------------------------------- Function Scan on unnest (cost=0.00..0.03 rows=3 width=4) (1 row) |
This provides a smarter alternative to the ROWS
clause of CREATE FUNCTION
.
Similar to the above, a support function can also provide a smarter alternative to the COST
clause of CREATE FUNCTION
.
There is no example of such a function if the PostgreSQL v12 code base, except in the regression tests, but maybe there will be more in future releases.
WHERE
conditionAs we saw before, a function that returns a boolean
can appear at the top level of a WHERE
condition. An example would be
1 |
SELECT ... FROM a WHERE starts_with(a.x, 'alb') |
Up to now, PostgreSQL had no idea how selective this condition is, so it simply estimated that it would filter out two thirds of the rows.
With PostgreSQL v12, you can define a support function that provides a better estimate for the selectivity of such a condition. Again, so far the only example for such a function is in the PostgreSQL regression tests.
This kind of support function is called when the optimizer simplifies constant expressions, so it could be used to replace a function call with a simpler expression if one or more of its arguments are constants.
For example, an expression like x + 0
(which internally calls the function int4pl
) could be replaced with x
.
PostgreSQL already had such a feature (called transform functions), but that was not exposed at the SQL level. The place where this was used in the code (simplification of some type casts) has been changed to use a support function in v12.
Support functions open the field for much better optimizer support for functions. I imagine that they will prove useful for PostgreSQL's built-in functions, as well as for third-party extensions.
There is a lot of low hanging fruit which might be harvested by beginners who want to get involved with PostgreSQL hacking:
generate_series
functions, generate_subscripts
, jsonb_populate_recordset
and other JSON functionsstarts_with
could be supported by indexes, and the selectivity estimates could be improved, quite similar to LIKE
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Wow! PostgreSQL keeps impressing me! Congratulations to the PG team.
You say "With PostgreSQL v12, you can define a support function that provides a better estimate for the selectivity of such a condition. Again, so far the only example for such a function is in the PostgreSQL regression tests." Not being an expert in the current source code, can you give a pointer to the function in the regression test please?
There is the
test_support_func
defined insrc/test/regress/regress.c
and insrc/test/regress/sql/misc_functions.sql
. It is used to define the functionmy_gen_series
in the latter SQL file.OK. I'm trying to write a function that allows a SupportRequestSelectivity, but the support function seems to be passed a SupportRequestSimplify, is this something that I can control what I get passed, or if I'm not passed a SupportRequestSelectivity am I simply out of luck?
No, that's fine. The optimizer does not know what your function supports. If you don't support simplifying the condition, simply return
NULL
. The optimizer may call you a second time with aSupportRequestSelectivity
.