CYBERTEC PostgreSQL Logo

Keyset pagination with descending order

07.2024 / Category: / Tags: |
no-offset-banner
© Markus Winand 2014

Keyset pagination is the most performant way to retrieve a large result set page by page. However, the neat trick with composite type comparison doesn't always work. This article explains why and how you can work around that shortcoming.

An example table for paginated queries

We create a table with a million rows:

Normal keyset pagination

If we want to ORDER BY val1, val2, the first query would look like

The primary key id was added to the result list and the ORDER BY clause to provide uniqueness and thereby guarantee a stable order. Assuming that the last row returned from that query was (1, '2024-01-01 01:00:00+01', 'e3', 920198), the query for the next page would be

With a multi-column index on (val1, val2, id), both queries would be lightning fast, as would be the queries for all following pages.

The problem with keyset pagination in descending and mixed order

Matters become more complicated as soon as we need descending sort order. A fairly simple case is if we need to sort the result by a single colum (for example, val3). Again, we need the unique column id as a tie-breaker. The solution is to use descending sort order for both columns:

This would scan an index on (val3, id) in reverse order and be just as efficient as before.

However, if we want to ORDER BY val3, val1 DESC, the above trick will no longer work. Finding the first 50 results is no problem, but we hit an impasse when we want to fetch the second page. We'd have to compare the triple (val3, val1, id) with the last row from the first page, but neither comparison operator will do: for the first element, we'd need to compare with >, but for the second element we need <.

It seems that we cannot use keyset pagination in a case like this. But, as the following sections will show, there are ways to get keyset pagination to work with mixed sorting order.

Keyset pagination using the negative value

The simplest trick to get the above example working is to use -val1 to turn the descending sort order into an ascending search order:

The index you need to support this query is

The above trick is simple, but will only work for data types for which you can turn descending order into ascending order. Numbers are easy. To make the trick work with val2, which is a timestamp with time zone, we have to work harder.

“Negative value” for timestamps

There is no “unary minus” operator for timestamp with time zone, but we can extract the number of seconds since the Unix epoch:

This looks ugly, but is fairly straightforward. However, we'll get a surprise if we try to create the necessary index:

The reason why EXTRACT for timestamp with time zone isn't immutable is that for some arguments, the result depends on the current setting of timezone. However, extracting the Unix epoch is immutable, so we can safely work around the problem by defining an IMMUTABLE function for that purpose:

To make use of the index, we have to use the function in our queries as well.

Getting the negative value to turn the comparison operator from < to > is a good trick, but it won't work for all data types. For example, you cannot get a “negative value” for a string, so our trick won't work for the column val3.

Keyset pagination by splitting up the query

A second approach to solving the problem of descending sort order is to avoid composite value comparison at all. For example, if you want to ORDER BY val2, val3 DESC, you could write the query for the next page as

That query is technically correct, but the WHERE condition contains OR, which is notoriously bad for performance. The query won't use an index defined on the ORDER BY clause effectively. But we can split the query into three simpler subqueries:

This query looks ugly, but each of the three branches fetches 50 rows using an index on (val2, val3 DESC, id), so the query will be quite fast. Here is the execution plan:

Keyset pagination by defining a custom data type

Finally, you can solve the problem of the descending sorting order by defining a data type that is just like a normal data type, but the comparison operators are switched around. For that, you need to define a new base type. You don't need to write any C code, since we are just reusing the functions from the standard data types. That requires fairly deep knowledge of PostgreSQL and superuser access, so it is not something you can do in a hosted database. I doubt that many users will want to go that way, but it is a nice exercise of PostgreSQL's extensibility features.

For our experiment, we want to use the same ORDER BY clause as in the previous section, so we will define a data type invtext that sorts inversely to text.

Defining the data type

Since the type input and output functions reference the data type and vice versa, we have to define a shell type first.

So far, our type is just like text. The type casts in the end allow us to convert other string data types to invchar. I won't bother to define a cast from character, since that is a data type you shouldn't use anyway. The cast from invtext to text is only needed for the b-tree comparison function further down.

Defining comparison operators for the data type

This is where invtext differs from text. We are reusing the PostgreSQL core functions from the data type text, but we switch around the order of comparison:

Defining a b-tree operator class for the data type

Finally, we need a comparison function and an operator class so that we can use b-tree indexes on the new data type. Again, we reverse the direction of comparison.

Using the custom data type for keyset pagination

Using our new data type, keyset pagination with the order val2, val3 DESC becomes simple:

The index to support these queries is

Conclusion

We have seen three tricks to get keyset pagination to work efficiently if some columns are sorted in ascending order and others in descending order:

  • order by the negative of the descending column to make them ascending
  • use a union of subqueries with scalar comparisons that can use the index
  • define a custom data type with opposite sorting order and use it for the descending columns to make them ascending

The first trick is the simpleast, but works only with some data types. The second trick is a bit more complicated, but works everywhere. The third trick is cute, but perhaps too much black magic for practical use.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram