We all know and value SQL functions as a handy shortcut. PostgreSQL v14 has introduced a new, better way to write SQL functions. This article will show the advantages of the new syntax.
Table of Contents
Let's create a simple example of an SQL function with the “classical” syntax so that we have some material for demonstrations:
1 2 3 4 5 |
CREATE EXTENSION unaccent; CREATE FUNCTION mangle(t text) RETURNS text LANGUAGE sql AS 'SELECT lower(unaccent(t))'; |
You can use the new function like other database functions:
1 2 3 4 5 6 |
SELECT mangle('Schön dumm'); mangle ════════════ schon dumm (1 row) |
You may ask what good an SQL function is. After all, the main purpose of a database function is to be able to run procedural code inside the database, something you cannot do with SQL. But SQL functions have their use:
CREATE AGGREGATE
or CREATE OPERATOR
Moreover, simple SQL functions can be inlined, that is, the optimizer can replace the function call with the function definition at query planning time. This can make SQL functions singularly efficient:
We can see function inlining if we use EXPLAIN (VERBOSE)
on our example function:
1 2 3 4 5 6 7 |
EXPLAIN (VERBOSE, COSTS OFF) SELECT mangle('Schön dumm'); QUERY PLAN ═══════════════════════════════════════════════ Result Output: lower(unaccent('Schön dumm'::text)) (2 rows) |
PostgreSQL functions are great. One of the nice aspects is that you are not restricted to a single programming language. Out of the box, PostgreSQL supports functions written in SQL, C, PL/pgSQL (a clone of Oracle's PL/SQL), Perl, Python and Tcl. But that is not all: in PostgreSQL, you can write a plugin that allows you to use any language of your choice inside the database. To allow that flexibility, the function body of a PostgreSQL function is simply a string constant that the call handler of the procedural language interprets when PostgreSQL executes the function. This has some undesirable side effects:
Usually, PostgreSQL tracks dependencies between database objects in the pg_depend
and pg_shdepend
catalog tables. That way, the database knows the relationships between objects: it will either prevent you from dropping objects on which other objects depend (like a table with a foreign key reference) or drop dependent objects automatically (like dropping a table drops all indexes on the table).
Since the body of a function is just a string constant that PostgreSQL cannot interpret, it won't track dependencies between a function and objects used in the function. A procedural language can provide a validator that checks the function body for syntactic correctness (if check_function_bodies = on
). The validator can also test if the objects referenced in the function exist, but it cannot keep you from later dropping an object used by the function.
Let's demonstrate that with our example:
1 2 3 4 5 6 7 8 9 |
DROP EXTENSION unaccent; SELECT mangle('boom'); ERROR: function unaccent(text) does not exist LINE 1: SELECT lower(unaccent(t)) ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. QUERY: SELECT lower(unaccent(t)) CONTEXT: SQL function 'mangle' during inlining |
We will fix the problem by creating the extension again. However, it would be better to get an error message when we run DROP EXTENSION
without using the CASCADE
option.
search_path
as a security problemSince PostgreSQL parses the function body at query execution time, it uses the current setting of search_path
to resolve all references to database objects that are not qualified with the schema name. That is not limited to tables and views, but also extends to functions and operators. We can use our example function to demonstrate the problem:
1 2 3 4 5 6 7 8 9 |
SET search_path = pg_catalog; SELECT public.mangle('boom'); ERROR: function unaccent(text) does not exist LINE 1: SELECT lower(unaccent(t)) ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. QUERY: SELECT lower(unaccent(t)) CONTEXT: SQL function 'mangle' during inlining |
In our example, it is a mere annoyance that we can avoid by using public.unaccent()
in the function call. But it can be worse than that, particularly with SECURITY DEFINER
functions. Since it is cumbersome to schema-qualify each function and operator, the recommended solution is to force a search_path
on the function:
1 |
ALTER FUNCTION mangle(text) SET search_path = public; |
Note that the schemas on the search_path
should allow CREATE
only to privileged users, so the above is not a good idea on versions older than v15!
An unpleasant downside of setting a search_path
is that it prevents the inlining of the SQL function.
From PostgreSQL v14 on, the body of SQL functions and procedures need no longer be a string constant. You can now use one of the following forms for the function body:
1 2 3 4 5 6 7 8 |
CREATE FUNCTION function_name(...) RETURNS ... RETURN expression; CREATE FUNCTION function_name(...) RETURNS ... BEGIN ATOMIC statement; ... END; |
The first form requires the function body to be an expression. So if you want to perform a query, you have to wrap it in parentheses (turning it into a subquery, which is a valid expression). For example:
1 2 |
CREATE FUNCTION get_data(v_id bigint) RETURNS text RETURN (SELECT value FROM data WHERE is = v_id); |
The second form allows you to write a function with more than one SQL statement. As it used to be with multi-statement SQL functions, the result of the function will be the result of the final SQL statement. You can also use the second form of the new syntax to create SQL procedures. The first form is obviously not suitable for a procedure, since procedures don't have a return value.
We can easily rewrite our example function to use the new syntax:
1 2 |
CREATE OR REPLACE FUNCTION mangle(t text) RETURNS text RETURN lower(unaccent(t)); |
Note that these new SQL functions can be inlined into SQL statements just like the old ones!
The main difference is that the new-style SQL functions and procedures are parsed at function definition time and stored in parsed form in the prosqlbody
column of the pg_proc
system catalog. As a consequence, the two shortcomings noted above are gone:
Because the function body is available in parsed form, PostgreSQL can track dependencies. Let's try that with our redefined example function:
1 2 3 4 |
DROP EXTENSION unaccent; ERROR: cannot drop extension unaccent because other objects depend on it DETAIL: function mangle(text) depends on function unaccent(text) HINT: Use DROP ... CASCADE to drop the dependent objects too. |
search_path
with new-style SQL functionssearch_path
is only relevant when SQL is parsed. Since this now happens when CREATE FUNCTION
runs, we don't have to worry about the current setting of that parameter at function execution time:
1 2 3 4 5 6 7 8 |
SET search_path = pg_catalog; SELECT public.mangle('Schön besser'); mangle ══════════════ schon besser (1 row) |
You may notice that the multi-statement form for defining SQL functions contains semicolons to terminate the SQL statements. That will not only confuse the usual suspects like HeidiSQL (which never learned dollar quoting), but it will be a problem for any client that recognizes semicolons as separator between SQL statements. Even older versions of psql
have a problem with that syntax:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
psql (13.7, server 15beta2) WARNING: psql major version 13, server major version 15. Some psql features might not work. Type 'help' for help. test=> CREATE FUNCTION tryme() RETURNS integer BEGIN ATOMIC SELECT 42; END; ERROR: syntax error at end of input LINE 3: SELECT 42; ^ WARNING: there is no transaction in progress COMMIT |
psql
thinks that the semicolon after “SELECT 42
” terminates the CREATE FUNCTION
statement. The truncated statement causes an error. The final END
is treated as its own statement, which is a synonym for COMMIT
and causes a warning.
In v14 and above, psql
handles such statements correctly. pgAdmin 4 has learned the new syntax with version 6.3. But I am sure that there are many clients out there that have not got the message yet.
The new syntax for SQL function introduced by PostgreSQL v14 has great advantages for usability and security. Get a client that supports the new syntax and start using it for your SQL functions. You should consider rewriting your existing functions to make use of these benefits.
Read another great post to increase your PostgreSQL syntax savvy: my post on Cross Join in PostgreSQL.
Find out more about how to get the most performance out of your PostgreSQL database with Hans' post on how to find and fix a missing index.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
I somehow missed it in the release notes, thanks for pointing out 🙂
With Datagrip's language injection feature it is easy to forget that a body of function is (was) just a string, and this comes with downsides.