CYBERTEC PostgreSQL Logo

Parallel aggregate - PostgreSQL 16 - better performance

04.2023 / Category: , / Tags: |

What is a parallel aggregate?

In PostgreSQL, a parallel aggregate refers to a way of processing aggregate functions (such as SUM, AVG, MAX, MIN, etc.) on large amounts of data in a parallel and distributed manner, thereby making the query execution faster.

When executing an aggregate query, the database system automatically breaks up the result set into smaller pieces and distributes the work among the available resources, and then combines the results to produce the final output. This approach can significantly improve the performance of aggregate queries on large datasets, but it requires sufficient resources and may not always be faster than serial execution.

Not all aggregate functions support so-called "Partial Mode", which indicates the aggregate is eligible to participate in various optimizations, such as parallel aggregation. And that was true for array_agg() and string_agg().

The first one collects all the input values, including nulls, into an array, while the second concatenates the non-null input values into a string.

What's new in PostgreSQL 16?

David Rowley implemented a new functionality for parallel aggregates on string_agg() and array_agg() functions. The patch was reviewed by Andres Freund, Tomas Vondra, Stephen Frost and Tom Lane. Committed by David Rowley. The commit message is:

Show time for new parallel aggregate functionality

And to be absolutely sure, let's try the same test on a PostreSQL 13 cluster. The EXPLAIN output will be the same for PostgreSQL 14 and 15.

We cannot compare timing but we can compare costs. As you can see, the final cost in the upcoming PostgreSQL 16 with parallel aggregate implemented is 118.02! The final execution cost for previous versions was 185.70.

Let me remind you what the cost means in the PostgreSQL's EXPLAIN output. Cost is an estimated measure of the query execution time in arbitrary units, representing the processing power required to execute a particular step in the query plan. It generally represents a combination of CPU, I/O, and memory usage and helps the query planner to choose the fastest execution plan.

Finally

See "Handling Bonus Programs in SQL" blog post by Hans-Jürgen Schönig to check how to efficiently code bonus programs in SQL. There is a real life example of using array_agg aggregate function. Or check out my "ER diagrams with SQL and Mermaid" post to see an example on how to use string_agg() function.

Check out out the “what’s new” post series to know more about upcoming PostgreSQL versions.


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram