Partial aggregation: The beautiful way

02.2015 | Category: How To | Tags: development, sql help

02.2015

Category: How To

Partial aggregation: Aggregations are a fundamental feature of SQL. Functions such as min, max, count, sum, avg and so on are used in virtually every application - it is basically impossible to write a reasonably complex applications without them.

Here is a trivial aggregate: All it does is summing up numbers from 1 to 10:

test=# SELECT sum(x) FROM generate_series(1, 10) AS x;
 sum
-----
  55
(1 row)

1

2

3

4

5

test=# SELECT sum(x) FROM generate_series(1, 10) AS x;

sum

-----

55

(1 row)

So far so good. But what if we only want to sum up even numbers? We can rely on a simple trick: NULL values are ignored by aggregation functions. This makes a lot of sense because NULL really means “undefined” and you cannot average or sum up undefined values. A simple CASE WHEN can do the job:

test=# SELECT sum(CASE WHEN x % 2 = 0 THEN x ELSE NULL END)
            FROM generate_series(1, 10) AS x;
 sum
-----
  30
(1 row)

1

2

3

4

5

6

test=# SELECT sum(CASE WHEN x % 2 = 0 THEN x ELSE NULL END)

FROM generate_series(1, 10) AS x;

sum

-----

30

(1 row)

This works perfectly - however, it is a bit old fashioned and pretty nasty to write (especially in case of more complex operations).

The fancy way to partial aggregation

There is a more fancy way to do that:

test=# SELECT     sum(x) FILTER (WHERE x % 2 = 0) AS even,
                  sum(x) FILTER (WHERE x % 2 = 1) AS odd
       FROM  generate_series(1, 10) AS x;
 even | odd
------+-----
   30 |  25
(1 row)

1

2

3

4

5

6

7

test=# SELECT sum(x) FILTER (WHERE x % 2 = 0) AS even,

sum(x) FILTER (WHERE x % 2 = 1) AS odd

FROM generate_series(1, 10) AS x;

even | odd

------+-----

30 | 25

(1 row)

The FILTER keywords allow users to tell the systems which rows should make it into the aggregate function. The new syntax is far more readable, as well as a lot shorter.

Find out more about partial aggregation in PostgreSQL 16 in this blog post about Parallel Aggregates by Pavlo Golub.

In case you need any assistance, please feel free to contact us.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

7 responses to “Partial aggregation: The beautiful way”

Efi says:

February 17, 2015 at 4:46 pm

Is there any difference performance wise between the FILTER version and the CASE WHEN version?

Reply
- David Fetter says:
  
  February 17, 2015 at 7:05 pm
  
  Clarity and speed. The planner knows about FILTER, so it has what it needs to make it faster.
  
  Reply
  - Slava Moudry says:
    
    February 18, 2015 at 8:52 pm
    
    Thanks! This is really great. David, can you elaborate a bit on planner optimizations for FILTER.
    Would it use data stats on columns or can it pull data using index-only scan?
    
    Reply
    - David Fetter says:
      
      February 18, 2015 at 9:14 pm
      
      FILTER is about eliminating, not adding. See the patch for places where that happens. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b560ec1b0d7b910ce13edc51ffaafaca72136e3b
      
      Reply
    - David Fetter says:
      
      February 19, 2015 at 6:23 pm
      
      Basically, FILTER is a way of removing rows efficiently compared to evaluating a CASE at each row. The technical details of this are here: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b560ec1b0d7b910ce13edc51ffaafaca72136e3b
      
      Reply
    - David Fetter says:
      
      February 24, 2015 at 3:10 pm
      
      Check the git commit logs for FILTER. The initial mechanics of the optimization are there.
      
      Reply
David Fetter says:

February 17, 2015 at 7:05 pm

Thanks for promoting the feature I wrote 🙂

Reply

Partial aggregation: The beautiful way

The fancy way to partial aggregation

7 responses to “Partial aggregation: The beautiful way”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts