CYBERTEC PostgreSQL Logo

Better SQL functions in PostgreSQL v14

07.2022 / Category: / Tags: | |

We all know and value SQL functions as a handy shortcut. PostgreSQL v14 has introduced a new, better way to write SQL functions. This article will show the advantages of the new syntax.

An example of an SQL function

Let's create a simple example of an SQL function with the “classical” syntax so that we have some material for demonstrations:

You can use the new function like other database functions:

Why SQL functions?

You may ask what good an SQL function is. After all, the main purpose of a database function is to be able to run procedural code inside the database, something you cannot do with SQL. But SQL functions have their use:

  • code reuse for expressions frequently used in different SQL statements
  • to make SQL statements more readable by factoring out part of the code into a function with a meaningful name
  • whenever you need a function for syntactical reasons, like in CREATE AGGREGATE or CREATE OPERATOR

Moreover, simple SQL functions can be inlined, that is, the optimizer can replace the function call with the function definition at query planning time. This can make SQL functions singularly efficient:

  • it removes the overhead of an actual function call
  • since functions are (mostly) black boxes to the optimizer, replacing the function with its definition usually gives you better estimates

We can see function inlining if we use EXPLAIN (VERBOSE) on our example function:

Shortcomings of PostgreSQL functions

PostgreSQL functions are great. One of the nice aspects is that you are not restricted to a single programming language. Out of the box, PostgreSQL supports functions written in SQL, C, PL/pgSQL (a clone of Oracle's PL/SQL), Perl, Python and Tcl. But that is not all: in PostgreSQL, you can write a plugin that allows you to use any language of your choice inside the database. To allow that flexibility, the function body of a PostgreSQL function is simply a string constant that the call handler of the procedural language interprets when PostgreSQL executes the function. This has some undesirable side effects:

Lack of dependency tracking

Usually, PostgreSQL tracks dependencies between database objects in the pg_depend and pg_shdepend catalog tables. That way, the database knows the relationships between objects: it will either prevent you from dropping objects on which other objects depend (like a table with a foreign key reference) or drop dependent objects automatically (like dropping a table drops all indexes on the table).

Since the body of a function is just a string constant that PostgreSQL cannot interpret, it won't track dependencies between a function and objects used in the function. A procedural language can provide a validator that checks the function body for syntactic correctness (if check_function_bodies = on). The validator can also test if the objects referenced in the function exist, but it cannot keep you from later dropping an object used by the function.

Let's demonstrate that with our example:

We will fix the problem by creating the extension again. However, it would be better to get an error message when we run DROP EXTENSION without using the CASCADE option.

search_path as a security problem

Since PostgreSQL parses the function body at query execution time, it uses the current setting of search_path to resolve all references to database objects that are not qualified with the schema name. That is not limited to tables and views, but also extends to functions and operators. We can use our example function to demonstrate the problem:

In our example, it is a mere annoyance that we can avoid by using public.unaccent() in the function call. But it can be worse than that, particularly with SECURITY DEFINER functions. Since it is cumbersome to schema-qualify each function and operator, the recommended solution is to force a search_path on the function:

Note that the schemas on the search_path should allow CREATE only to privileged users, so the above is not a good idea on versions older than v15!

An unpleasant downside of setting a search_path is that it prevents the inlining of the SQL function.

The new SQL function syntax in PostgreSQL v14

From PostgreSQL v14 on, the body of SQL functions and procedures need no longer be a string constant. You can now use one of the following forms for the function body:

The first form requires the function body to be an expression. So if you want to perform a query, you have to wrap it in parentheses (turning it into a subquery, which is a valid expression). For example:

The second form allows you to write a function with more than one SQL statement. As it used to be with multi-statement SQL functions, the result of the function will be the result of the final SQL statement. You can also use the second form of the new syntax to create SQL procedures. The first form is obviously not suitable for a procedure, since procedures don't have a return value.

We can easily rewrite our example function to use the new syntax:

Note that these new SQL functions can be inlined into SQL statements just like the old ones!

Advantages of the new SQL function syntax

The main difference is that the new-style SQL functions and procedures are parsed at function definition time and stored in parsed form in the prosqlbody column of the pg_proc system catalog. As a consequence, the two shortcomings noted above are gone:

Dependency tracking with new-style SQL functions

Because the function body is available in parsed form, PostgreSQL can track dependencies. Let's try that with our redefined example function:

Fixed search_path with new-style SQL functions

search_path is only relevant when SQL is parsed. Since this now happens when CREATE FUNCTION runs, we don't have to worry about the current setting of that parameter at function execution time:

Problems with interactive clients

You may notice that the multi-statement form for defining SQL functions contains semicolons to terminate the SQL statements. That will not only confuse the usual suspects like HeidiSQL (which never learned dollar quoting), but it will be a problem for any client that recognizes semicolons as separator between SQL statements. Even older versions of psql have a problem with that syntax:

psql thinks that the semicolon after “SELECT 42” terminates the CREATE FUNCTION statement. The truncated statement causes an error. The final END is treated as its own statement, which is a synonym for COMMIT and causes a warning.

In v14 and above, psql handles such statements correctly. pgAdmin 4 has learned the new syntax with version 6.3. But I am sure that there are many clients out there that have not got the message yet.

Conclusion

The new syntax for SQL function introduced by PostgreSQL v14 has great advantages for usability and security. Get a client that supports the new syntax and start using it for your SQL functions. You should consider rewriting your existing functions to make use of these benefits.

Read another great post to increase your PostgreSQL syntax savvy: my post on Cross Join in PostgreSQL.

Find out more about how to get the most performance out of your PostgreSQL database with Hans' post on how to find and fix a missing index.

One response to “Better SQL functions in PostgreSQL v14”

  1. I somehow missed it in the release notes, thanks for pointing out 🙂
    With Datagrip's language injection feature it is easy to forget that a body of function is (was) just a string, and this comes with downsides.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram