CYBERTEC PostgreSQL Logo

Sort Performance in PostgreSQL

08.2018 / Category: / Tags:

Sorting is a very important aspect of PostgreSQL performance tuning. However, improving sort performance is often misunderstood or simply overlooked by many people. So, I decided to come up with a PostgreSQL blog showing how sorts can be tuned in PostgreSQL.

Creating sample data

To show how sorting works, I created a couple of million rows first

What the code does is to create a table and load 5 million random values. As you will notice, data can be loaded within seconds.

Sort performance - Sorting data in PostgreSQL

Let us try to sort the data. To keep things simple, I am using the most uncomplicated statements possible. What you can see is that PostgreSQL has to sort on disk because the data we want to sort does not fit into memory. In this case a bit more than 100 MB of data is moved to disk:

Why does PostgreSQL not simply sort stuff in memory? The reason is the work_mem parameter, which is by default set to 4 MB:

work_mem tells the server that up to 4 MB can be used per operation (per sort, grouping operation, etc.). If you sort too much data, PostgreSQL has to move the excessive amount of data to disk, which is of course slow.

Fortunately changing work_mem is simple and can even be done at the session level.

Speeding up sorts in PostgreSQL – using more work_mem

Let us change work_mem for our current session and see what happens to our example shown before.

The easiest way to change work_mem on the fly is to use SET. In this case I have set the parameter to 1 GB. Now PostgreSQL has enough RAM to do stuff in memory:

The performance impact is incredible. The speed has improved from 6.6 seconds to around 2.7 seconds, which is around 60% less. As you can see, PostgreSQL uses “quicksort” instead of “external merge Disk”. If you want to speed up and tune sorting in PostgreSQL, there is no way of doing that without changing work_mem. The work_mem parameter is THE most important knob you have. The cool thing is that work_mem is not only used to speed up sorts – it will also have a positive impact on aggregations and so on.

Taking care of partial sorts

As of PostgreSQL 10 there are 3 types of sort algorithms in PostgreSQL:

  • external sort Disk
  • quicksort
  • top-N heapsort

“top-N heapsort” is used if you only want a couple of sorted rows. For example: The highest 10 values, the lowest 10 values and so on. “top-N heapsort” is pretty efficient and returns the desired data in almost no time:

Wow, the query returns in less than one second.

Improving sorting: Consider indexing …

work_mem is ideal to speed up sorts. However, in many cases it can make sense to avoid sorting in the first place. Indexes are a good way to provide the database engine with “sorted input”. In fact: A btree is somewhat similar to a sorted list.

Building indexes (btrees) will also require some sorting. Many years ago PostgreSQL used work_mem to tell the CREATE INDEX command, how much memory to use for index creation. This is not the case anymore: In modern versions of PostgreSQL the maintenance_work_mem parameter will tell DDLs how much memory to use.

Here is an example:

The default setting for maintenance_work_mem is 64 MB, but this can of course be changed:

The index creation will be considerably faster with more memory:

In this case CREATE INDEX can use up to 1 GB of RAM to sort the data, which is of course a lot faster than going to disk. This is especially useful if you want to create large indexes.

The query will be a lot faster if you have proper indexes in place. Here is an example:

In my example, the query needs far less than a millisecond. If your database happens to sort a lot of data all the time, consider using better indexes to speed things up, rather than pumping work_mem up higher and higher.

Sort performance in PostgreSQL and tablespaces

Many people out there are using tablespaces to scale I/O. By default PostgreSQL only uses a single tablespace, which can easily turn into a bottleneck. Tablespaces are a good way to provide PostgreSQL with more hardware.

Let us assume you have to sort a lot of data repeatedly: The temp_tablespaces is a parameter, which allows administrators to control the location of temporary files sent to disk. Using a separate tablespace for temporary files can also help to speed up sorting.

If you are not sure how to configure work_mem, consider checking out http://pgconfigurator.cybertec.at - it is an easy tool helping people to configure PostgreSQL.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

2 responses to “Sort Performance in PostgreSQL”

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram