CYBERTEC PostgreSQL Logo

cursor_tuple_fraction and the PostgreSQL JDBC driver

05.2024 / Category: / Tags: |
cursor_tuple_fraction
© Laurenz Albe 2024

This article is a kind of failure story. I originally set out to write a piece about the PostgreSQL parameter cursor_tuple_fraction, only to discover that my boss has already covered that topic to some extent. Then I thought that I could investigate the parameter’s effect on the JDBC driver. That led me to an embarrassingly long wrestle with auto_explain and an eventual failure to find anything. But the world is littered with success stories, so why not share some failure? And there is the one or other interesting insight in store that might make the article worth your while.

About cursor_tuple_fraction

In a database, cursors are the canonical way of iterating through result sets. Cursors avoid the need to read the complete result set at once, which reduces the memory consumption. In PostgreSQL you can use the non-standard statement DECLARE to explicitly create a cursor, but usually you use client API functions or PL/pgSQL FOR loops that create PostgreSQL cursors “behind the scene”.

Transferring result rows to the client takes some time, and processing on the client side takes even more time. If an application processes a query result in chunks using a cursor, it can take a while until it reaches the end of the result set. Therefore, it is most important to get the first result rows as quickly as possible when you read the result set using a cursor. The total execution time is less important. This is where the PostgreSQL parameter cursor_tuple_fraction comes into play: for queries executed using a cursor, it tells the optimizer to prefer execution plans that produce the first result rows quickly, at the expense of the total query execution time. Hans’ article shows you a good example of what cursor_tuple_fraction does.

Another use case for cursor_tuple_fraction is if you use a cursor, but want to optimize the total execution time of the query. This is usually the case if you declare a cursor WITH HOLD, so that it can outlive a database transaction. The result set of a WITH HOLD cursor is materialized when the transaction commits, so that COMMIT has to wait until the complete result set has been calculated. To keep that time as short as possible, it is a good idea to change cursor_tuple_fraction to 1.0, so that PostgreSQL plans the query for the shortest total execution time.

An example to demonstrate cursor_tuple_fraction

For what follows, I need a little example that exhibits cursor_tuple_fraction. The following table will suffice:

I am using an UNLOGGED table to save myself the overhead of writing WAL. After all, we are only interested in query performance.

During normal execution, PostgreSQL uses a bitmap index scan and a sort, which is the fastest way to compute the whole result:

When the query is used in a cursor, PostgreSQL chooses an index scan, which can quickly find the first result rows in the correct order:

Testing cursor execution with the PostgreSQL JDBC driver

By default, the PostgreSQL JDBC driver reads the complete result set at once. This can cause problems with big result sets. To mitigate the problem, you can use java.sql.PreparedStatement.setFetchSize(int) to make the driver fetch the result set in chunks. In addition, the driver uses server-side prepared statements if a java.sql.PreparedStatement is executed repeatedly. By default that happens from the fifth execution on, but you can change this prepare threshold.

To examine the JDBC driver’s behavior, I wrote this little Java program:

Trying to get auto_explain to work

To examine the execution plan that PostgreSQL is using when running the query from the JDBC driver, the go-to tool is auto_explain. It allows you to capture the execution plan of all statements whose execution times exceeds a certain threshold. I enabled auto_explain with the following parameters in postgresql.conf:

I restarted PostgreSQL to activate the changed shared_preload_libraries and ran my sample program. Then I looked into the PostgreSQL log for the execution plan and found — nothing. Other statements (for example, statements that I executed by hand via psql) had their execution plans logged. Only my Java program failed to log an execution plan. It cannot be the fault of the client, because auto_explain operates on the database server, and PostgreSQL definitely executed the statement.

It took me a while to figure out what was going on. If you know PostgreSQL well, you might want to check if you can figure out what the problem was before reading on to the next section.

Fixing the Java program so that it logs execution plans

A PostgreSQL cursor is implemented by a data structure called a portal, which holds the current execution state. The trigger for auto_explain to log an execution plan is the duration of the statement. Consequently, it has to wait until the statement is done — before that, PostgreSQL doesn’t know the execution time. In the case of a portal, execution is done when the portal is closed. I had assumed that the portal gets closed when I close the result set (“rs.close()” in my sample program), but that doesn’t send a close message to the server. Instead, the portal gets closed when the program commits the transaction. Now I had disabled autocommit so that I could use a cursor, but I had forgotten to explicitly commit the transaction, because I only read from the database and my job was done anyway.

In order to get auto_explain to work as it should, I had to add one line:

After that, I was ready for my next disappointment.

cursor_tuple_fraction has no effect when using the JDBC driver

To my surprise, the plan used by the JDBC driver was:

Why does PostgreSQL not use the “fast start-up” plan? If you read my article about parallel query and the JDBC driver, you may suspect why. The JDBC driver does not use the DECLARE statement to create a cursor, but uses another technique provided by the PostgreSQL Frontend/Backend protocol:

The overall execution cycle consists of a parse step, which creates a prepared statement from a textual query string; a bind step, which creates a portal given a prepared statement and values for any needed parameters; and an execute step that runs a portal’s query. In the case of a query that returns rows (SELECT, SHOW, etc.), the execute step can be told to fetch only a limited number of rows, so that multiple execute steps might be needed to complete the operation.

The JDBC driver implements cursors by repeatedly sending the “execute” message for the same portal with a row count limit. That is a perfectly valid technique, but the PostgreSQL server doesn’t know at query planning time that the result will be fetched in chunks. Consequently, it cannot know that it should plan the query using cursor_tuple_fraction. In other words, the JDBC driver can never benefit from fast start-up plans, unless you use an explicit LIMIT clause.

Conclusion

I ended up spending too much time with no big discovery to share. Still, I learned a couple of things:

  • auto_explain emity a log message only when the portal is closed
  • the PostgreSQL JDBC driver closes the portal only when the transaction commits
  • the PostgreSQL JDBC driver can never benefit from cursor_tuple_fraction

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram