Partial aggregation: Aggregations are a fundamental feature of SQL. Functions such as min, max, count, sum, avg and so on are used in virtually every application - it is basically impossible to write a reasonably complex applications without them.
Here is a trivial aggregate: All it does is summing up numbers from 1 to 10:
1 2 3 4 5 |
test=# SELECT sum(x) FROM generate_series(1, 10) AS x; sum ----- 55 (1 row) |
So far so good. But what if we only want to sum up even numbers? We can rely on a simple trick: NULL values are ignored by aggregation functions. This makes a lot of sense because NULL really means “undefined” and you cannot average or sum up undefined values. A simple CASE WHEN can do the job:
1 2 3 4 5 6 |
test=# SELECT sum(CASE WHEN x % 2 = 0 THEN x ELSE NULL END) FROM generate_series(1, 10) AS x; sum ----- 30 (1 row) |
This works perfectly - however, it is a bit old fashioned and pretty nasty to write (especially in case of more complex operations).
There is a more fancy way to do that:
1 2 3 4 5 6 7 |
test=# SELECT sum(x) FILTER (WHERE x % 2 = 0) AS even, sum(x) FILTER (WHERE x % 2 = 1) AS odd FROM generate_series(1, 10) AS x; even | odd ------+----- 30 | 25 (1 row) |
The FILTER keywords allow users to tell the systems which rows should make it into the aggregate function. The new syntax is far more readable, as well as a lot shorter.
Find out more about partial aggregation in PostgreSQL 16 in this blog post about Parallel Aggregates by Pavlo Golub.
In case you need any assistance, please feel free to contact us.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Is there any difference performance wise between the FILTER version and the CASE WHEN version?
Clarity and speed. The planner knows about FILTER, so it has what it needs to make it faster.
Thanks! This is really great. David, can you elaborate a bit on planner optimizations for FILTER.
Would it use data stats on columns or can it pull data using index-only scan?
FILTER is about eliminating, not adding. See the patch for places where that happens. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b560ec1b0d7b910ce13edc51ffaafaca72136e3b
Basically, FILTER is a way of removing rows efficiently compared to evaluating a CASE at each row. The technical details of this are here: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b560ec1b0d7b910ce13edc51ffaafaca72136e3b
Check the git commit logs for FILTER. The initial mechanics of the optimization are there.
Thanks for promoting the feature I wrote 🙂