Why do I have a slow COMMIT in PostgreSQL?

05.2024 | Category: How To | Tags: development, performance tuning, under the hood

05.2024

Category: How To

Tags: development, performance tuning, under the hood

slow commit woes: one burglar tries to crack a safe, while the other tells him to hurry up (since they are in a critical section) — © Laurenz Albe 2024

Table of Contents

Sometimes one of our customers looks at the most time consuming statements in a database (either with pg_stat_statements or with pgBadger) and finds COMMIT in the high ranks. Normally, COMMIT is a very fast statement in PostgreSQL, so that is worth investigating. In this article, I will explore the possible reasons for a slow COMMIT and discuss what you can do about it.

The basic `COMMIT` activity in PostgreSQL

A slow COMMIT is a surprising observation, because committing a transaction is a very simple activity in PostgreSQL. In most cases, all a COMMIT has to do is

set the two bits for the transaction in the commit log to TRANSACTION_STATUS_COMMITTED (0b01) (persisted in pg_xact)
if track_commit_timestamp is set to on, record the commit timestamp (persisted in pg_commit_ts)
flush the write-ahead log (WAL) (persisted in pg_wal) to disk, unless synchronous_commit is set to off

Note that because of the multi-versioning architecture of PostgreSQL, both COMMIT and ROLLBACK are normally very fast operations: they both don't have to touch the tables, they only register the status of the transaction in the commit log.

The most frequent reason for slow `COMMIT`: disk problems

From the above it is clear that a potential cause of slowness is disk I/O. After all, flushing the WAL to disk causes I/O requests. So the first thing you should check is if the disk has a problem or is under load:

On Linux, you can use commands like “vmstat 1” or “sar -p 1” to measure the percentage of CPU time spent waiting for I/O (“wa” in vmstat and “%iowait” in sar). If that value is consistently above 10, you can be pretty certain that the I/O system is under stress.
With NAS, you should check if the TCP network is overloaded.
If the storage is a shared SAN or NAS, the disks may be shared wth other machines, and you should check if there is contention on the storage system.
Failing disks, other hardware problems or operating system problems can lead to intermittent performance problems. Check the kernel log for messages.

If we can rule out disk problems as the cause of slow commits, we have to investigate some more.

Reading the source to find out more

If we want to find out what happens during the commit of a transaction, the easiest approach is to read the source. Many people are not aware of the true power of open source: rather than having to guess or buy support from the software vendor, you can just see for yourself. The PostgreSQL source is well written and amply documented, and many parts don't need expert C skills to understand. The relevant code is in the function CommitTransaction() in src/backend/access/transam/xact.c.

There are several causes for a slow COMMIT in exceptional cases, like contention from many concurrent NOTIFY statements (a case you could diagnose from locks on database 0 in pg_locks). However, usually the culprit is one of the three cases described in the following sections.

Slow `COMMIT` caused by deferred constraints and triggers

Normally, PostgreSQL checks a constraint as part of the statement that modifies the constrained table. With deferred constraints, PostgreSQL waits with the check until the end of the transaction. One use case for that is if you insert data into tables with circular foreign key constraints:

CREATE TABLE department (
   department_id bigint PRIMARY KEY,
   name text NOT NULL,
   manager bigint NOT NULL
);

CREATE TABLE employee (
   employee_id bigint PRIMARY KEY,
   name text NOT NULL,
   department_id bigint
      REFERENCES department NOT NULL
);

-- deferred foreign key
ALTER TABLE department
   ADD FOREIGN KEY (manager) REFERENCES employee
      DEFERRABLE INITIALLY DEFERRED;

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

CREATE TABLE department (

department_id bigint PRIMARY KEY,

name text NOT NULL,

manager bigint NOT NULL

);

CREATE TABLE employee (

employee_id bigint PRIMARY KEY,

name text NOT NULL,

department_id bigint

REFERENCES department NOT NULL

);

-- deferred foreign key

ALTER TABLE department

ADD FOREIGN KEY (manager) REFERENCES employee

DEFERRABLE INITIALLY DEFERRED;

The deferred foreign key makes it easy to create a new department:

START TRANSACTION;

-- won't raise a foreign key violation yet
INSERT INTO department
   (department_id, name, manager)
VALUES (12, 'Flower Picking', 123);

INSERT INTO employee
   (employee_id, name, department_id)
VALUES (123, 'John Wurzelrupfer', 12);

-- deferred constraint is valid now
COMMIT;

1

2

3

4

5

6

7

8

9

10

11

12

13

START TRANSACTION;

-- won't raise a foreign key violation yet

INSERT INTO department

(department_id, name, manager)

VALUES (12, 'Flower Picking', 123);

INSERT INTO employee

(employee_id, name, department_id)

VALUES (123, 'John Wurzelrupfer', 12);

-- deferred constraint is valid now

COMMIT;

Since PostgreSQL checks deferred constraints at commit time, they can slow down COMMIT processing. Ordinarily, checking a constraint is very fast — it is an index lookup. However, many such checks can accumulate in a larger transaction, and the collective execution time can slow down COMMIT considerably.

PostgreSQL also has constraint triggers that are deferrable and can slow down COMMIT just like deferred constraints. See this article for a further discussion of use cases for constraint triggers.

If you can identify deferred constraints or triggers as the cause of slow COMMIT, that may be just fine and no reason to worry. After all, you have to check those constraints at some point.

Slow `COMMIT` caused by cursors `WITH HOLD`

A cursor allows the client to fetch a query result set in chunks, which can simplify processing and avoids going out of memory on the client side. However, a regular cursor can only exist in the context of a database transaction. Consequently, you cannot use cursors when a user interaction is involved: the snapshot held by the cursor would stall the progress of VACUUM and lead to table bloat and worse problems. Moreover, ACCESS SHARE locks held for the duration of the transaction will cause problems with concurrent ALTER TABLE or TRUNCATE.

To avoid the limitation of being tied to a transaction, you can use a cursor WITH HOLD. Such a cursor can outlive the transaction that created it and is for example useful for implementing pagination. PostgreSQL implements WITH HOLD cursors by materializing the result set at commit time. If the query behind the cursor is expensive, that can make COMMIT painfully slow. Also, you must not forget to close such cursors when you are done, otherwise the materialized result set will hog server resources until the end of the database session.

If you can establish cursors WITH HOLD as the cause of your slow commits, you can improve the situation by tuning the queries from the cursor definition to run faster. Since PostgreSQL doesn't optimize queries in a cursor for speed of calculating the complete result set, it can sometimes help to set cursor_tuple_fraction to 1.0 to speed up commit processing.

Slow `COMMIT` caused by synchronous replication

Both streaming replication and logical replication are asynchronous by default. If you use replication for high availability, and you don't want to risk losing an already committed transaction, you can use synchronous replication. With synchronous replication, the order of operations at commit time is:

write the COMMIT record to WAL and flush it (this is the actual persistent commit)
wait until the synchronous standby reports that it has got all the WAL information
make the transaction visible on the primary server
report success to the client

If the network latency between primary and synchronous standby is high, COMMIT will take a long time. You can diagnose that by checking pg_stat_activity for frequent or long lasting “SyncRep” wait events. As a rule, you only want to use synchronous replication between machines with low network latency, that is, machines that are physically close together.

Slow `COMMIT` caused by third-party extensions

One of PostgreSQL's greatest features is its extensibility. It allows you to write extensions that interact with the guts of PostgreSQL without having to modify the server code. Third-party code can “hook” into PostgreSQL to modify its behavior. Among many other options, you can use the C function RegisterXactCallback() to register a callback that PostgreSQL executes at commit time. So if you look for the cause of a slow COMMIT, you should also have a look at the extensions installed in the database. For example, a foreign data wrapper that implements transaction processing at the remote data source may want to commit the remote transaction when PostgreSQL commits the local transaction. Then high network latency or slow transaction processing on the remote side will slow down the PostgreSQL COMMIT.

Conclusion

Looking at the source code, we can easily find the likely causes for slow COMMIT: apart from the obvious cause of disk problems, you should consider deferred constraints, cursors WITH HOLD and synchronous replication as likely causes.

Why do I have a slow COMMIT in PostgreSQL?

The basic `COMMIT` activity in PostgreSQL

The most frequent reason for slow `COMMIT`: disk problems

Reading the source to find out more

Slow `COMMIT` caused by deferred constraints and triggers

Slow `COMMIT` caused by cursors `WITH HOLD`

Slow `COMMIT` caused by synchronous replication

Slow `COMMIT` caused by third-party extensions

Conclusion

Leave a Reply Cancel reply

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Why do I have a slow COMMIT in PostgreSQL?

The basic COMMIT activity in PostgreSQL

The most frequent reason for slow COMMIT: disk problems

Reading the source to find out more

Slow COMMIT caused by deferred constraints and triggers

Slow COMMIT caused by cursors WITH HOLD

Slow COMMIT caused by synchronous replication

Slow COMMIT caused by third-party extensions

Conclusion

Leave a Reply Cancel reply

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

The basic `COMMIT` activity in PostgreSQL

The most frequent reason for slow `COMMIT`: disk problems

Slow `COMMIT` caused by deferred constraints and triggers

Slow `COMMIT` caused by cursors `WITH HOLD`

Slow `COMMIT` caused by synchronous replication

Slow `COMMIT` caused by third-party extensions