A “materialized view” is a database object which stores the result of a precalculated database query and makes it easy to refresh this result as needed. Materialized views are an integral feature of pretty much all advanced database systems. Naturally, PostgreSQL also provides support for materialized views, and offers the end-user a powerful tool to handle more time-consuming requests.
Table of Contents
The main questions are now: What are the pitfalls, and how can you make use of materialized views in the first place? Let’s dive in and find out.
Before we can actually take a look at materialized views and figure out how they work, we have to import some sample data which we can use as the basis for our calculations:
1 2 3 4 5 6 7 8 |
demo=# CREATE TABLE t_demo (grp int, data numeric); CREATE TABLE demo=# INSERT INTO t_demo SELECT 1, random() FROM generate_series(1, 5000000); INSERT 0 5000000 demo=# INSERT INTO t_demo SELECT 2, random() FROM generate_series(1, 5000000); INSERT 0 5000000 |
We have created 10 million rows organized in 2 groups.
To create a materialized view in PostgreSQL, we can make use of the following syntax specification:
1 2 3 4 5 6 7 8 9 10 11 |
demo=# h CREATE MATERIALIZED VIEW Command: CREATE MATERIALIZED VIEW Description: define a new materialized view Syntax: CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] table_name [ (column_name [, ...] ) ] [ USING method ] [ WITH ( storage_parameter [= value] [, ... ] ) ] [ TABLESPACE tablespace_name ] AS query [ WITH [ NO ] DATA ] |
Basically, a materialized view has a name, some parameters, and is based on a query.
Here is an example of a materialized view:
1 2 3 4 5 |
demo=# CREATE MATERIALIZED VIEW mat_view AS SELECT grp, avg(data), count(*) FROM t_demo GROUP BY 1; SELECT 2 |
What is really important to note here is the size of the materialized view compared to the underlying table:
1 2 3 4 5 6 7 |
demo=# d+ List of relations Schema | Name | Type | Owner | Persistence | Size | Description -------+----------+-------------------+-------+-------------+--------+------------- public | mat_view | materialized view | hs | permanent | 16 kB | public | t_demo | table | hs | permanent | 423 MB | (2 rows) |
423 MB vs. 16 KB is a real difference. If you want to query the materialized view, you can do it just like you would query a normal PostgreSQL table:
1 2 3 4 5 6 7 8 9 10 11 12 |
demo=# SELECT * FROM mat_view; grp | avg | count -----+--------------------------+--------- 1 | 0.500091604165533141153 | 5000000 2 | 0.499893365960709086730 | 5000000 (2 rows) demo=# explain SELECT * FROM mat_view; QUERY PLAN ------------------------------------------------------------- Seq Scan on mat_view (cost=0.00..21.30 rows=1130 width=44) (1 row) |
Looking at the execution plan (explain
) shows that there is no difference between a normal table and a materialized view. We can simply query the data.
At this point, the only suboptimal thing is the query optimizer’s estimate (= 1130 rows). However, that can easily be fixed.
The process is the same as for any other table:
1 2 3 4 5 6 7 |
demo=# ANALYZE; ANALYZE demo=# explain SELECT * FROM mat_view; QUERY PLAN --------------------------------------------------------- Seq Scan on mat_view (cost=0.00..1.02 rows=2 width=26) (1 row) |
ANALYZE
has recreated the new optimizer statistics and fixed the estimate nicely. If you want to know more about ANALYZE
in general, we recommend our posts about PostgreSQL optimizer statistics and autovacuum. Autovacuum is also able to recreate optimizer statistics.
However, there is more. PostgreSQL offers some additional important features which are relevant in this field. One of those features is the USING
clause. What is the purpose of this one? At the moment, PostgreSQL supports only one storage format (heap). However, in the future, we hope to see more storage formats such as zheap or zedstore.
The idea of USING
is to have the ability to select the best storage format for the given requirement. At the moment, heap is the default format, and that’s perfectly fine.
The WITH
-clause is the same as in CREATE TABLE. It allows you to define storage parameters such as autovacuum behavior, FILLFACTOR
and so on.
Here’s an example:
1 2 3 4 5 6 |
demo=# CREATE MATERIALIZED VIEW mat_view_2 WITH (autovacuum_enabled = false) AS SELECT grp, avg(data), count(*) FROM t_demo GROUP BY 1; SELECT 2 |
In general, a materialized view is stored just like a table, so simply follow the same recommendations as you would follow when creating a standard table.
The same is true for the TABLESPACE
keyword. In the default setup, the materialized view will be stored in the default tablespace which is $PGDATA
:
1 2 3 4 5 |
demo=# SHOW data_directory; data_directory ------------------ /Users/hs//db13 (1 row) |
SHOW
is an easy way to figure out where that is. If you want to find out more about tablespaces, consider checking out our blog post about tablespaces in PostgreSQL.
By default, a materialized view is created in a way that it contains the result. However, if the object is created using the NO DATA
option, it is empty. Only the definition is created.
One has to be aware of the fact that in this case, the view cannot be queried until it has been refreshed to contain data. We have not seen the NO DATA
option being used too frequently in database support. It’s still worth mentioning that this option does exist.
Sometimes a materialized view has to be modified later on. A classical example would be to rename a column of the view without recreating it.
Let’s take a look at the definition of the view:
1 2 3 4 5 6 7 |
demo=# d mat_view Materialized view 'public.mat_view' Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- grp | integer | | | avg | numeric | | | count | bigint | | | |
Modifying a materialized view is done using the ALTER MATERIALIZED VIEW
command which is a powerful tool to facilitate all kinds of changes.
The syntax is pretty similar to what ALTER TABLE
can do for you:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
demo=# h ALTER MATERIALIZED VIEW Command: ALTER MATERIALIZED VIEW Description: change the definition of a materialized view Syntax: ALTER MATERIALIZED VIEW [ IF EXISTS ] name action [, ... ] ALTER MATERIALIZED VIEW name DEPENDS ON EXTENSION extension_name ALTER MATERIALIZED VIEW [ IF EXISTS ] name RENAME [ COLUMN ] column_name TO new_column_name ALTER MATERIALIZED VIEW [ IF EXISTS ] name RENAME TO new_name ALTER MATERIALIZED VIEW [ IF EXISTS ] name SET SCHEMA new_schema ALTER MATERIALIZED VIEW ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ] SET TABLESPACE new_tablespace [ NOWAIT ] where action is one of: ALTER [ COLUMN ] column_name SET STATISTICS integer ALTER [ COLUMN ] column_name SET ( attribute_option = value [, ... ] ) ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] ) ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } CLUSTER ON index_name SET WITHOUT CLUSTER SET ( storage_parameter [= value] [, ... ] ) RESET ( storage_parameter [, ... ] ) OWNER TO { new_owner | CURRENT_USER | SESSION_USER } |
What is noteworthy here is that you can also do things such as CLUSTER
(= index organize), SET STATISTICS
(= adjust size of histogram for the PostgreSQL optimizer) and so on. The process is the same as for a normal table.
Now, let’s try it out and rename a column:
1 2 3 4 5 6 7 8 9 10 |
demo=# ALTER MATERIALIZED VIEW mat_view RENAME COLUMN avg TO average; ALTER MATERIALIZED VIEW demo=# d mat_view Materialized view 'public.mat_view' Column | Type | Collation | Nullable | Default ---------+---------+-----------+----------+--------- grp | integer | | | average | numeric | | | count | bigint | | | |
Before we move on to refreshing, I want to focus your attention on indexing:
1 2 |
demo=# CREATE INDEX idx_average ON mat_view (average); CREATE INDEX |
It is a good idea to index a materialized view to ensure fast querying. Again, the same rules as for a table exist. Simply index entries which are diverse and offer good selectivity when the materialized view is read.
A materialized view contains a snapshot of the query result. It is not updated periodically, unless the user forces PostgreSQL to do so. In Oracle, materialized view support is a bit more sophisticated. Hopefully, PostgreSQL will soon catch up in this area a bit.
However, at the moment,a materialized view in PostgreSQL has to be refreshed.
Refresh manually using the REFRESH MATERIALIZED VIEW
command:
1 2 3 4 5 6 |
demo=# h REFRESH MATERIALIZED VIEW Command: REFRESH MATERIALIZED VIEW Description: replace the contents of a materialized view Syntax: REFRESH MATERIALIZED VIEW [ CONCURRENTLY ] name [ WITH [ NO ] DATA ] |
Running the refresh is simple:
1 2 |
demo=# REFRESH MATERIALIZED VIEW mat_view; REFRESH MATERIALIZED VIEW |
PostgreSQL knows the query which has to be re-executed to refresh the content of the materialized view. There is just one downside: PostgreSQL has to lock the object for the time of the refresh, which means that nobody can access it while it is refreshed.
Sometimes it is simply not acceptable to tolerate a locked materialized view. In that case, concurrent updates are needed. However, there is a problem:
1 2 3 4 |
demo=# REFRESH MATERIALIZED VIEW CONCURRENTLY mat_view; ERROR: cannot refresh materialized view 'public.mat_view' concurrently HINT: Create a unique index with no WHERE clause on one or more columns of the materialized view. |
The materialized view needs a unique column to support concurrent refreshing.
That means you have to determine a “primary key” and create a unique index:
1 2 3 4 |
demo=# CREATE UNIQUE INDEX idx_grp ON mat_view (grp); CREATE INDEX demo=# REFRESH MATERIALIZED VIEW CONCURRENTLY mat_view; REFRESH MATERIALIZED VIEW |
You need to be a bit careful and make sure that table bloat does not take over. Technically, it is a DELETE / INSERT
, which is prone to cause table bloat. Make sure a proper VACUUM
policy is in place.
Materialized views are an important feature in most databases, including PostgreSQL. They can help to speed up large calculations - or at least to cache them.
If you want to make sure that your materialized views are up to date and if you want to read more about PostgreSQL right now, check out our blog about pg_timetable which shows you how to schedule jobs in PostgreSQL. Why is pg_timetable so useful? Our scheduler makes sure that identical jobs cannot overlap, and that they simply don’t execute again, in case the same job is already running. In case of long jobs, this is super important - especially if you want to use materialized views.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Hi,
Excellent post, congratulations on your work.
Is there any change to be implemented in the future for a command like
REFRESH MATERIALIZED VIEW MyView as (SELECT * FROM MyView WHERE condition)
I would like to update only a few records.
Thank you so much
Claudio Oliveira.
MSI Soluções.
Hi,
what is the mechanism used for MV refresh concurrently, how postgres knows which rows should be deleted or updated in the materialized view?
thank you in advance for your details
Fabrice
That's exactly what the unique index is for. Without it, rows wouldn't have an identity, so you couldn't tell which row needs to be updated or deleted.