It is possible to write functions in PostgreSQL in almost any widespread language such as Perl, Python or C. In general this offers a great deal of flexibility and acceptable performance. However, in some cases customers might say: “We have the feeling that procedures are slow”. The thing is: PostgreSQL might not be to blame for this observation - in many cases it is simply “pilot error”. Let me try to elaborate on the key issue of function calls related to bad performance.
Table of Contents
As mentioned already, it is possible to write functions in basically any language. PostgreSQL simply passes the code of a function to the external language and takes back the result. In a way a function is a kind of black box - PostgreSQL rarely knows what is going on inside a procedure.
Here is an example of a function call:
1 2 3 4 5 6 7 8 9 |
test=# CREATE OR REPLACE FUNCTION mymax(int, int) RETURNS int AS $ BEGIN RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END; END; $ LANGUAGE 'plpgsql'; CREATE FUNCTION |
1 2 3 4 5 |
test=# SELECT mymax(20, 30); mymax ------- 30 (1 row) |
The important thing here is: The PL/pgSQL function is a complete black box. The planner has no idea what the “external” language does in this case. This has some important implication.
1 2 3 4 5 6 |
test=# CREATE TABLE demo AS SELECT * FROM generate_series(1, 1000000) AS id; SELECT 1000000 test=# CREATE INDEX idx_id ON demo(id); CREATE INDEX |
The table is already large enough to consider indexes:
1 2 3 4 5 6 7 |
test=# explain SELECT * FROM demo WHERE id = 20; QUERY PLAN --------------------------------------------------------------- Index Only Scan using idx_id on demo (cost=0.42..8.44 rows=1 width=4) Index Cond: (id = 20) (2 rows) |
The problem is: The situation changes completely if we start to use the function I have just shown:
1 2 3 4 5 6 7 8 |
test=# explain SELECT * FROM demo WHERE id = mymax(20, 20); QUERY PLAN --------------------------------------------------------- Seq Scan on demo (cost=0.00..266925.00 rows=1 width=4) Filter: (id = mymax(20, 20)) (2 rows) |
PostgreSQL has no idea that the function will return 20. It is clear to humans, but nobody told the machine that this is guaranteed to happen. To PostgreSQL, the result of a function is considered to be “volatile” - anything can happen. Therefore, it cannot simply ask the index for the correct row. The first function call might not return the same as the second call - even if the parameters are identical. The optimizer has to play it safe and will go for a sequential scan, which is definitely going to produce the correct result.
In PostgreSQL a function can be:
• VOLATILE
• STABLE
• IMMUTABLE
If a function is marked as VOLATILE, it can return anything if you call it multiple times using the very same input parameters. In case of STABLE the function is going to return the same result given the same parameters within the same transaction.
The most prominent STABLE function is now()
, which will always return the same result within the same transaction:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
test=# SELECT now(); now ------------------------------- 2018-01-09 11:48:46.385457+01 (1 row) test=# BEGIN; BEGIN test=# SELECT now(); now ------------------------------- 2018-01-09 11:48:51.073123+01 (1 row) test=# SELECT now(); now ------------------------------- 2018-01-09 11:48:51.073123+01 (1 row) test=# COMMIT; COMMIT test=# SELECT now(); now ------------------------------- 2018-01-09 11:48:59.640697+01 (1 row) |
Some functions are even IMMUTABLE: In this case the result given the same input parameters will be constant, regardless of the transaction. Cosine would be an example of one of these function calls:
1 2 3 4 5 6 7 8 9 10 11 |
test=# SELECT cos(10), cos(20); cos | cos --------------------+------------------- -0.839071529076452 | 0.408082061813392 (1 row) test=# SELECT cos(10), cos(20); cos | cos --------------------+------------------- -0.839071529076452 | 0.408082061813392 (1 row) |
Even in the second transaction the cosine of a number will be the same.
To fix our problem we have to drop the existing function …
1 2 |
test=# DROP FUNCTION mymax(int, int); DROP FUNCTION |
… and recreate it:
1 2 3 4 5 6 7 8 9 |
test=# CREATE OR REPLACE FUNCTION mymax(int, int) RETURNS int AS $ BEGIN RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END; END; $ LANGUAGE 'plpgsql' IMMUTABLE; CREATE FUNCTION |
The planner will detect that the function is supposed to return a static value given the same input values and go for an index scan:
1 2 3 4 5 6 7 8 9 |
test=# explain SELECT * FROM demo WHERE id = mymax(20, 20); QUERY PLAN ------------------------------------------------------ Index Only Scan using idx_id on demo (cost=0.42..8.44 rows=1 width=4) Index Cond: (id = 20) (2 rows) |
Of course the index scan orders magnitude faster and will return in basically no time.
Fortunately PostgreSQL has a system view, which might shed some light on functions, that could be a problem. The first thing you have to do is to set “track_functions = 'all' “ - it will tell PostgreSQL to collection function statistics:
1 2 3 4 5 |
test=# SELECT * FROM demo WHERE id = mymax(20, 20); id ---- 20 (1 row) |
Once this setting has been turned on, pg_stat_user_functions will contain valuable information:
1 2 3 4 5 |
test=# SELECT * FROM pg_stat_user_functions ; funcid | schemaname | funcname | calls | total_time | self_time --------+------------+----------+-------+------------+----------- 16429 | public | mymax | 1 | 0.025 | 0.025 (1 row) |
If you happen to see that a function is called insanely often, it can make sense to inspect it and check, if it happens to be VOLATILE for no good reason. Changing the function definition can significantly improve speed.
If you want to create indexes on a function, you have to make sure that the function itself is in fact IMMUTABLE. Otherwise PostgreSQL won't create the index for you. The reason is simple: PostgreSQL has to make sure that the content of the index is stable and does not have to be changed over time if the underlying data is unchanged.
Read the latest blogs about the PostgreSQL optimizer, or find out more about fixing slow queries.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Your posts are great, full of practical tips. Thanks so much!