Before we get started, I want to introduce my favorite set-returning functions which can help you to generate sample data:
Table of Contents
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
test=# SELECT * FROM generate_series(1, 10) AS x; x ---- 1 2 3 4 5 6 7 8 9 10 (10 rows) |
All we do here is to generate a list from 1 to 10 and print it on the screen. Let us play around with window functions a bit now: There are two cases we need to keep in mind. If the OVER-clause is empty it means that the entire data set is used. If we use ORDER BY, it is only the data set up to the current row in the sorted list. The following listing contains an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
test=# SELECT *, array_agg(x) OVER (), array_agg(x) OVER (ORDER BY x) FROM generate_series(1, 10) AS x; x | array_agg | array_agg ----+------------------------+------------------------ 1 | {1,2,3,4,5,6,7,8,9,10} | {1} 2 | {1,2,3,4,5,6,7,8,9,10} | {1,2} 3 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3} 4 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4} 5 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5} 6 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6} 7 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7} 8 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8} 9 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9} 10 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9,10} (10 rows) |
As you can see, the last column keeps accumulating more values.
Often it is necessary to limit the set of data (the window) used by the window function. ROWS BETWEEN … PRECEDING … AND … FOLLOWING allows you to do exactly that. The following example shows how this works:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
test=# SELECT *, array_agg(x) OVER (ORDER BY x ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM generate_series(1, 10) AS x; x | array_agg ----+----------- 1 | {1,2} 2 | {1,2,3} 3 | {2,3,4} 4 | {3,4,5} 5 | {4,5,6} 6 | {5,6,7} 7 | {6,7,8} 8 | {7,8,9} 9 | {8,9,10} 10 | {9,10} (10 rows) |
What you see is that the data fed to array_agg is seriously restricted. But the restriction we are using here is a static one. The constants are hardwired. In some cases, you might need more flexibility.
More often than not, configuration has to be determined on the fly. The beauty is that in PostgreSQL you can use a subselect as part of the OVER-clause, which gives you a lot of flexibility.
Before we move on to a demo, we need to create a configuration table:
1 2 3 4 |
test=# CREATE TABLE t_config (key text, val int); CREATE TABLE test=# INSERT INTO t_config VALUES ('before', 1), ('after', 2); INSERT 0 2 |
To make it simple, I've simply created two entries. The following SELECT statement uses those configuration parameters to do its magic. Here is how it works:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
test=# SELECT *, array_agg(x) OVER (ORDER BY x ROWS BETWEEN (SELECT val FROM t_config WHERE key = 'before') PRECEDING AND (SELECT val FROM t_config WHERE key = 'after') FOLLOWING) FROM generate_series(1, 10) AS x; x | array_agg ----+------------ 1 | {1,2,3} 2 | {1,2,3,4} 3 | {2,3,4,5} 4 | {3,4,5,6} 5 | {4,5,6,7} 6 | {5,6,7,8} 7 | {6,7,8,9} 8 | {7,8,9,10} 9 | {8,9,10} 10 | {9,10} (10 rows) |
As you can see, the query performs as expected and can be configured dynamically.
Another important note: PARTITION BY can take not only a column, but also an expression, to split the data set. Many people are not aware of this feature, which is actually quite useful. Here is an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
test=# SELECT *, array_agg(x) OVER (PARTITION BY x % 2) FROM generate_series(1, 10) AS x; x | array_agg ----+-------------- 10 | {10,2,4,6,8} 2 | {10,2,4,6,8} 4 | {10,2,4,6,8} 6 | {10,2,4,6,8} 8 | {10,2,4,6,8} 9 | {9,7,3,1,5} 7 | {9,7,3,1,5} 3 | {9,7,3,1,5} 1 | {9,7,3,1,5} 5 | {9,7,3,1,5} (10 rows) |
In this case, we had no problem splitting the data into odd and even numbers. What I want to point out here is that PostgreSQL offers a lot of flexibility. We encourage you to test it out for yourself.
Window functions are super important if you need to relate the rows in a result set to each other. You order them, you partition them, and then you define a window from which you can compute additional result columns.
Sometimes, you want to find out more about a timeseries. One thing we have seen quite often recently is to count how often somebody was active for a certain amount of time. “Detecting continuous periods of activity” will show you how to calculate these things in PostgreSQL easily.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply