This blog is about table functions and performance. PostgreSQL is a really powerful database and offers many features to make SQL even more powerful. One of these impressive things is the concept of a composite data type. In PostgreSQL a column can be a fairly complex thing. This is especially important if you want to work with server side stored procedures or functions. However, there are some details people are usually not aware of when making use of stored procedures and composite types.
Table of Contents
Before we dive into the main topic of this post, I want to give you a mini introduction to composite types in general:
1 2 3 4 5 6 7 8 9 10 11 |
test=# CREATE TYPE person AS (id int, name text, income numeric); CREATE TYPE I have created a simple data type to store persons. The beauty is that the composite type can be seen as one column: test=# SELECT '(10, 'hans', 500)'::person; person ------------------ (10,' hans',500) (1 row) |
However, it is also possible to break it up again and represent it as a set of fields.
1 2 3 4 5 |
test=# SELECT ('(10, 'hans', 500)'::person).*; id | name | income ----+------+-------- 10 | hans | 500 (1 row) |
1 2 3 4 5 6 7 8 |
test=# CREATE TABLE data (p person, gender char(1)); CREATE TABLE test=# d data Table 'public.data' Column | Type | Collation | Nullable | Default --------+--------------+-----------+----------+--------- p | person | | | gender | character(1) | | | |
As you can see the column type is “person”.
Armed with this kind of information we can focus our attention on performance. In PostgreSQL a composite type is often used in conjunction with stored procedures to abstract values passed to a function or to handle return values.
Why is that important? Let me create a type containing 3 million entries:
1 2 3 4 5 6 7 |
test=# CREATE TABLE x (id int); CREATE TABLE test=# INSERT INTO x SELECT * FROM generate_series(1, 3000000); INSERT 0 3000000 test=# vacuum ANALYZE ; VACUUM |
pgstattuple is an extension which is especially useful if you want to detect bloat in a table. It makes use of a composite data type to return data. Installing the extension is easy:
1 2 |
test=# CREATE EXTENSION pgstattuple; CREATE EXTENSION |
What we want to do next is to inspect the content of “x” and see the data (all fields). Here is what you can do:
1 2 3 4 5 6 7 |
test=# explain analyze SELECT (pgstattuple('x')).*; QUERY PLAN ------------------------------------------------------------------------------------------- Result (cost=0.00..0.03 rows=1 width=72) (actual time=1909.217..1909.219 rows=1 loops=1) Planning Time: 0.016 ms Execution Time: 1909.279 ms (3 rows) |
Wow, it took close to 2 seconds to generate the result. Why is that the case? Let us take a look at a second example:
1 2 3 4 5 6 7 8 |
test=# explain analyze SELECT * FROM pgstattuple('x'); QUERY PLAN ---------------------------------------------------------------------------------------------- Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual time=212.056..212.057 rows=1 loops=1) Planning Time: 0.019 ms Execution Time: 212.093 ms (3 rows) |
Ooops? What happened? If we put the query in the FROM-clause the database is significantly faster. The same is true if we use a subselect:
1 2 3 4 5 6 7 |
test=# explain analyze SELECT (y).* FROM (SELECT pgstattuple('x') ) AS y; QUERY PLAN ----------------------------------------------------------------------------------------- Result (cost=0.00..0.01 rows=1 width=32) (actual time=209.666..209.666 rows=1 loops=1) Planning Time: 0.034 ms Execution Time: 209.698 ms (3 rows) |
Let us analyze the reasons for this behavior!
The problem is that PostgreSQL expands the FROM-clause. It actually turns (pgstattuple('x')) into …
1 2 3 4 5 6 |
… (pgstattuple('x')).table_len, (pgstattuple('x')).tuple_count, (pgstattuple('x')).tuple_len, (pgstattuple('x')).tuple_percent, … |
As you can see, the function is called more often in this case which of course explains the runtime difference. Therefore it makes a lot of sense to understand what is going on under the hood here. The performance improvement can be quite dramatic. We have seen a couple of cases in PostgreSQL support recently which could be related to this kind of behavior.
If you want to know more about performance consider checking out my blog post about CREATE INDEX and parallelism.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply