“If we had this data what would it mean?” - these kinds of questions can be answered using plain SQL. The technique you will need in PostgreSQL is a “hypothetical aggregate” which is of course part of the ANSI SQL standard. This post will show what an hypothetical aggregate is good for and how it works.
Table of Contents
Just like min, max, sum, count, and so on a hypothetical aggregate is used in conjunction with a GROUP BY clause. The GROUP BY will split the data into small portions and the aggregation will be applied on each of those groups. To show how this works I have created 10 rows and split them into “even” and “off” numbers. The way to do it is “modulo 2”. Here is an example:
1 2 3 4 5 6 7 |
test=# SELECT x % 2 AS grp, array_agg(x) FROM generate_series(1, 10) AS x GROUP BY x % 2; grp | array_agg -----+-------------- 1 | {1,3,5,7,9} 0 | {2,4,6,8,10} (2 rows) |
As you can see PostgreSQL has stuffed all the data into two arrays. There is nothing fancy about it. We can use any other aggregate such as sum, avg, min, max, etc. However, I used array_agg to show you which values are taken into consideration to calculate the aggregate here.
The main question now is: What is a hypothetical aggregate? Suppose we take the list of data in each group sort the values. Then we assume a hypothetical value (“as if it was there”) and see at which position it would end up. Here is an example:
1 2 3 4 5 6 7 8 9 |
test=# SELECT x % 2 AS grp, array_agg(x), rank(3.5) WITHIN GROUP (ORDER BY x) FROM generate_series(1, 10) AS x GROUP BY x % 2; grp | array_agg | rank -----+--------------+------ 0 | {10,2,4,6,8} | 2 1 | {9,7,3,1,5} | 3 (2 rows) |
If we take even numbers (2, 4, 6, 8, 10) in sorted 3.5 would be the 2nd entry. If we take odd numbers and sort them 3.5 would be the third value in the list. This is exactly what an hypothetical aggregate does.
Why would anybody want to use hypothetical aggregates in PostgreSQL? Here is an example: Suppose there is a sports event, and somebody is on the racing track. You want to know: If this racer reaches the finish line in 54 minutes, is he going to win? Finish as 10th? Or maybe last? A hypothetical aggregate will tell you that before the final score is inserted.
There are many more uses cases, but ranking is by far the most common and most widespread one.
If you want to know more about advanced SQL check out what we have to say about other fancy SQL stuff. Here is a post about speeding up aggregations by changing column orders.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
And we assume the order of array_agg elements has changed because of the quicksort/binary search?
GroupAggregate (cost=62.33..75.83 rows=200 width=44) (actual time=0.060..0.070 rows=2 loops=1)
Group Key: ((x % 2))
-> Sort (cost=62.33..64.83 rows=1000 width=8) (actual time=0.022..0.024 rows=10 loops=1)
Sort Key: ((x % 2))
Sort Method: quicksort Memory: 25kB
-> Function Scan on generate_series x (cost=0.00..12.50 rows=1000 width=8) (actual time=0.010..0.012 rows=10 loops=1)
Planning Time: 0.081 ms
Execution Time: 0.122 ms