“How does the PostgreSQL optimizer handle views?” or “Are views good or bad?” I assume that every database consultant and every SQL performance expert has heard this kind of question already. Given the fact that views are a really essential feature of SQL, it makes sense to take a closer look at the topic in general, and hopefully help some people to write better and faster code.
Table of Contents
Let us create a simple table containing just 10 rows, which can be used throughout the blog to show how PostgreSQL works and how the optimizer treats things:
1 2 3 4 5 6 7 8 9 10 11 12 |
test=# CREATE TABLE data AS SELECT * FROM generate_series(1, 10) AS id; SELECT 10 Then I have created a very simple view: test=# CREATE VIEW v1 AS SELECT * FROM data WHERE id < 4; CREATE VIEW |
The idea here is to filter some data and return all the columns.
The key aspect is: The optimizer will process the view just like a “pre-processor” directive. It will try to inline the code and to flatten it. Here is an example:
1 2 3 4 5 6 |
test=# explain SELECT * FROM v1; QUERY PLAN ------------------------------------------------------- Seq Scan on data (cost=0.00..41.88 rows=850 width=4) Filter: (id < 4) (2 rows) |
When we try to read from the view it is just like running the SQL statement directly. The optimizer will perform the following steps:
1 2 3 4 5 |
SELECT * FROM (SELECT * FROM data WHERE id < 4 ) AS v1; |
In the next step the subselect will be flattened out completely, which leaves us with:
1 |
SELECT * FROM data WHERE id < 4; |
We could have done this transformation by ourselves, but it is easier to make the optimizer do it for us and enjoy the convenience offered by the view.
Basically the same mechanism applies, when you are working with joins. PostgreSQL will again inline, flatten and then create a plan to join those tables together. The output is therefore not surprising:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
test=# explain SELECT * FROM v1 AS a, v1 AS b WHERE a.id = 1 AND a.id = b.id; QUERY PLAN ------------------------------------------------------------------------ Nested Loop (cost=0.00..96.71 rows=16 width=8) -> Seq Scan on data (cost=0.00..48.25 rows=4 width=4) Filter: ((id < 4) AND (id = 1)) -> Materialize (cost=0.00..48.27 rows=4 width=4) -> Seq Scan on data data_1 (cost=0.00..48.25 rows=4 width=4) Filter: ((id < 4) AND (id = 1)) (6 rows) |
Again PostgreSQL provides us with a simple plain.
However, views are not only good. Sometimes they can be a little destructive too. Consider the following example:
1 2 3 4 5 6 |
test=# CREATE VIEW v2 AS SELECT * FROM data WHERE id < 4 ORDER BY 1; CREATE VIEW |
In this case the view provides us with some ordered data. Let us see what happens in the following query:
1 2 3 4 5 6 7 8 9 10 |
test=# explain SELECT * FROM v2 ORDER BY 1 DESC; QUERY PLAN ------------------------------------------------------------------- Sort (cost=135.22..137.34 rows=850 width=4) Sort Key: data.id DESC -> Sort (cost=83.23..85.36 rows=850 width=4) Sort Key: data.id -> Seq Scan on data (cost=0.00..41.88 rows=850 width=4) Filter: (id < 4) (6 rows) |
As you can see PostgreSQL will sort the data twice, which is definitely not good for performance. PostgreSQL won't optimize this for you, so you have to take action yourself and be very aware of the issue.
In general it is a good idea to make a mental distinction between two types of views:
• Final results
• Intermediate results
If a view already has the “final” results, you can do sorting and all kinds of stuff. However, if you are using a view to perform some large calculations on top of it, it might be a good idea to avoid formatting, sorting and alike.
However, there is more to views. Consider the following example:
1 2 3 4 5 6 |
test=# CREATE VIEW v3 AS SELECT * FROM data WHERE id < 4 LIMIT 100000000000; CREATE VIEW |
In this case the view contains a LIMIT clause, which basically says that “all” the data will be returned. However, during those formal transformation the optimizer cannot “rely” on the fact that all data will be included in the final result. Therefore the optimizer cannot silently skip the LIMIT clause, which of course leads to the following plan:
1 2 3 4 5 6 7 8 9 |
test=# explain SELECT * FROM v3 ORDER BY 1 DESC; QUERY PLAN ------------------------------------------------------------------- Sort (cost=91.73..93.86 rows=850 width=4) Sort Key: data.id DESC -> Limit (cost=0.00..41.88 rows=850 width=4) -> Seq Scan on data (cost=0.00..41.88 rows=850 width=4) Filter: (id < 4) (5 rows) |
As always the world is neither only black nor white. Views are a valuable feature and when used nicely there is nothing wrong with them. However, keep in mind what is going on behind the scenes and avoid building views on top of view on top of views and so on. Here is the documentation on query planning and the optimizer, for reference.
Learn more about query tuning, see our latest blogs on the topic.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply