
Subtransactions and performance in PostgreSQL

03.2020 / © Laurenz Albe 2020

(Last updated on 2023-01-24) Recently, while troubleshooting PostgreSQL performance problems, I ran into problems with subtransactions twice. So I thought this was a nice topic for another blog post.

What are subtransactions?

Everybody knows database transactions. Since PostgreSQL operates in autocommit mode, you have to start a transaction that spans multiple statements explicitly with BEGIN or START TRANSACTION and close it with END or COMMIT. If you abort the transaction with ROLLBACK (or end the database session without committing), all work done inside the transaction is undone.

Now subtransactions allow you to roll back part of the work done in a transaction. You start a subtransaction inside a transaction with the standard SQL statement:
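    SAVEPOINT name;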

“name” is an identifier (no single quotes!) for the subtransaction. You cannot commit a subtransaction in SQL (it is automatically committed with the transaction that contains it), but you can roll it back with:
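    ROLLBACK TO SAVEPOINT name;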

The use of subtransactions

Subtransactions are useful in longer transactions. In PostgreSQL, any error inside a transaction will abort the transaction:
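(A minimal sketch; the table tab and the values are just placeholders.)

    BEGIN;
    INSERT INTO tab (id) VALUES (1);   -- some work
    SELECT 1 / 0;                      -- a statement that fails
    ERROR:  division by zero
    SELECT 42;                         -- every further statement fails as well
    ERROR:  current transaction is aborted, commands ignored until end of transaction block
    ROLLBACK;                          -- all work done in the transaction is lost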

With a transaction that does a lot of work this is quite annoying, because it means that all work done so far is lost. Subtransactions can help you recover from such a situation:
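(The same sketch, this time protected by a savepoint called “a”.)

    BEGIN;
    INSERT INTO tab (id) VALUES (1);   -- some work
    SAVEPOINT a;
    SELECT 1 / 0;                      -- the statement fails again
    ERROR:  division by zero
    ROLLBACK TO SAVEPOINT a;           -- undo only the part since the savepoint
    SELECT 42;                         -- the transaction can continue
    COMMIT;                            -- the work done before the savepoint is kept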

Note that ROLLBACK TO SAVEPOINT starts another subtransaction, also called “a”, when it rolls back the old one.

Subtransactions in PL/pgSQL

Even if you have never used the SAVEPOINT statement, you may already have encountered subtransactions. Written in PL/pgSQL, the code in the previous section looks like this:
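(A sketch only, using the same placeholder table as above.)

    DO $$
    BEGIN
       INSERT INTO tab (id) VALUES (1);   -- some work
       BEGIN                              -- entering this block starts a subtransaction
          PERFORM 1 / 0;                  -- the statement that fails
       EXCEPTION
          WHEN division_by_zero THEN
             NULL;                        -- the subtransaction is rolled back, processing continues
       END;
       INSERT INTO tab (id) VALUES (2);   -- more work in the surrounding transaction
    END;
    $$;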

Every time you enter a block with an EXCEPTION clause, a new subtransaction is started. The subtransaction is committed when you leave the block, and rolled back when you enter the exception handler.

Compatibility with other databases

Many other databases handle errors inside a transaction differently. Rather than aborting the complete transaction, they roll back only the statement that caused the error, leaving the transaction itself active.

When migrating or porting from such a database to PostgreSQL, you might be tempted to wrap every single statement in a subtransaction to emulate the above behavior.

The PostgreSQL JDBC driver even has a connection parameter “autosave” that you can set to “always” to automatically set a savepoint before each statement and roll back to it in case of failure.
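With that driver, a connection URL along these lines (host, port and database name are placeholders) enables the behavior:

    jdbc:postgresql://localhost:5432/mydb?autosave=always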

As the following will show, this alluring technique will lead to serious performance problems.

A performance test case

To demonstrate the problems caused by overuse of subtransactions, I created a test table:
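(A sketch only; the table name contend, the columns and the ten rows are assumptions chosen to match the description below.)

    CREATE UNLOGGED TABLE contend (
       id  integer PRIMARY KEY,
       val integer NOT NULL
    ) WITH (fillfactor = 50);

    -- one row per pgbench client
    INSERT INTO contend (id, val)
       SELECT i, 0 FROM generate_series(1, 10) AS i;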

The table is small, unlogged and has a low fillfactor to reduce the required I/O as much as possible. This way, I can observe the effects of subtransactions better.

I'll use pgbench, the benchmarking tool shipped with PostgreSQL, to run the following custom SQL script:
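(A sketch of such a script; the real scripts set 60 or 90 savepoints, of which only the first few are shown, and the table is the placeholder from above.)

    BEGIN;
    UPDATE contend SET val = val + 1 WHERE id = :client_id;
    SAVEPOINT s1;
    UPDATE contend SET val = val + 1 WHERE id = :client_id;
    SAVEPOINT s2;
    UPDATE contend SET val = val + 1 WHERE id = :client_id;
    SAVEPOINT s3;
    -- ... and so on, until 60 (or 90) savepoints have been set ...
    SELECT sum(val) FROM contend;
    COMMIT;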

The script will set 60 savepoints for test number 1, and 90 for test number 2. It uses prepared statements to minimize the overhead of query parsing.

pgbench will replace :client_id with a number unique to the database session. So as long as there are no more than 10 clients, each client's UPDATEs won't collide with those of other clients, but they will SELECT each other's rows.

Running the performance test

Since my machine has 8 cores, I'll run the tests with 6 concurrent clients for ten minutes.
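An invocation along these lines would do that (the script file name and database name are placeholders; -M prepared makes pgbench use the prepared statements mentioned above):

    pgbench -n -M prepared -c 6 -j 6 -T 600 -f subtrans_60.sql test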

To see meaningful information with “perf top”, you need the PostgreSQL debugging symbols installed. This is particularly recommended on production systems.

Test 1 (60 subtransactions)

This is what “perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres” shows while the test is running:

Test 2 (90 subtransactions)

This is what “perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres” has to say:

Even if we take into account that the transactions in test 2 are 50% longer (90 rather than 60 savepoints), this is still a performance regression of 60% compared with test 1.

Implementation of subtransactions

To understand what is going on, we have to understand how transactions and subtransactions are implemented.

Whenever a transaction or subtransaction modifies data, it is assigned a transaction ID. PostgreSQL keeps track of these transaction IDs in the commit log, which is persisted in the pg_xact subdirectory of the data directory.
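A quick way to observe this, using the placeholder table from above (pg_current_xact_id() exists from v13 on; older versions have txid_current()):

    BEGIN;
    SELECT pg_current_xact_id();        -- assigns and shows the transaction's ID
    SAVEPOINT a;
    INSERT INTO tab (id) VALUES (1);    -- the subtransaction is assigned its own, higher ID
    SELECT xmin FROM tab WHERE id = 1;  -- shows the subtransaction's transaction ID
    COMMIT;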

However, there are some differences between transactions and subtransactions:

  • Each subtransaction has a containing transaction or subtransaction (the “parent”)
  • Committing a subtransaction does not require a WAL flush
  • There can be only one transaction per database session, but many subtransactions

The information about which (sub)transaction is the parent of a given subtransaction is persisted in the pg_subtrans subdirectory of the data directory. Since this information becomes obsolete as soon as the containing transaction has ended, the data does not have to be preserved across a shutdown or crash.

Subtransactions and visibility

The visibility of a row version (“tuple”) in PostgreSQL is determined by the xmin and xmax system columns, which contain the transaction ID of the creating and destroying transactions. If the transaction ID stored is that of a subtransaction, PostgreSQL also has to consult the state of the containing (sub)transaction to determine if the transaction ID is valid or not.

To determine which tuples a statement can see, PostgreSQL takes a snapshot of the database at the beginning of the statement (or the transaction). Such a snapshot consists of:

  • a maximum transaction ID: everything from that transaction on is invisible
  • the list of transactions and subtransactions that were active when the snapshot was taken
  • the command number of the earliest visible command in the current (sub)transaction
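You can inspect your own snapshot with pg_current_snapshot() (available from v13 on; older versions have txid_current_snapshot()):

    SELECT pg_current_snapshot();
    -- the result might look like 1234:1250:1236,1240
    -- (xmin, xmax and the list of transaction IDs in progress at snapshot time)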

A snapshot is initialized by looking at the process array, which is stored in shared memory and contains information about all currently running backends. This, of course, contains the current transaction ID of the backend and has room for at most 64 non-aborted subtransactions per session. If there are more than 64 such subtransactions, the snapshot is marked as suboverflowed.

Explaining the test results

A suboverflowed snapshot does not contain all data required to determine visibility, so PostgreSQL will occasionally have to resort to pg_subtrans. These pages are cached in shared buffers, but you can see the overhead of looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf output. Other transactions have to update pg_subtrans to register subtransactions, and you can see in the perf output how they vie for lightweight locks with the readers.

Diagnosing the problem of too many subtransactions

Apart from looking at “perf top”, there are other symptoms that point in the direction of this problem:

  • Your workload performs well when run single-threaded, but badly when run in several concurrent database sessions
  • You often see the wait event “SubtransSLRU” (“SubtransControlLock” in older releases) in pg_stat_activity
  • From PostgreSQL v13 on, you can look at the monitoring view pg_stat_slru and check if blks_read in the row with name = 'Subtrans' keeps growing. That indicates that PostgreSQL has to read disk pages because it needs to access subtransactions that are no longer cached (see the sample queries after this list).
  • If you export a snapshot using the “pg_export_snapshot()” function, the resulting file in the pg_snapshots subdirectory of the data directory will contain the line “sof:1” to indicate that the subtransaction array overflowed
  • From PostgreSQL v16 on, you can call the function pg_stat_get_backend_subxact(integer) to return the number of subtransactions for a backend process and whether the subtransaction cache overflowed or not
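For example, the wait events and the SLRU statistics mentioned above can be checked with queries like these (from v13 on):

    -- sessions currently waiting for the subtransaction SLRU cache
    SELECT pid, wait_event_type, wait_event, state, query
    FROM pg_stat_activity
    WHERE wait_event = 'SubtransSLRU';

    -- disk reads for pg_subtrans since the statistics were last reset
    SELECT blks_read, blks_hit
    FROM pg_stat_slru
    WHERE name = 'Subtrans';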

Conclusion

Subtransactions are a great tool, but you should use them wisely. If you need concurrency, don't start more than 64 subtransactions per transaction.

The diagnostics shown in this article should help you determine whether you are suffering from the problem, or not.

Finding the cause of that problem can be tricky. For example, it may not be obvious that a function with an exception handler that you call for each result row of an SQL statement (perhaps in a trigger) starts a new subtransaction.

 

