During training sessions, I tell my students how they can tune transactional workloads by disabling the parameter synchronous_commit. It is easy to demonstrate the striking effects of that measure, but the possibility of losing committed transactions during an operating system crash makes it a non-starter for many applications. So I tell the students about commit_delay and commit_siblings, but when they ask me for a demonstration, I have to tell them that it would be too complicated. So I decided to try it out and write about it.
To make sure that committed transactions cannot get lost, PostgreSQL has to make sure that the WAL for the transaction is flushed to disk before it can report success to the client. If the database workload is dominated by small data modifications, the IOPS generated by these transactions can saturate the disk, even though the amount of data written is moderate.
The parameter pair commit_delay and commit_siblings can relax that bottleneck by reducing the number of IOPS necessary for those WAL flushes.
How do commit_delay and commit_siblings work?

You activate the feature by setting commit_delay to a value greater than zero. Whenever a transaction reaches the point where it would flush the WAL to disk during a commit, it first examines how many other transactions are currently active. If there are at least commit_siblings other transactions open and not waiting for a lock, PostgreSQL doesn't flush the WAL immediately, but waits for commit_delay microseconds. After that delay, other transactions may have reached the point where they are ready to flush the WAL. All these backends can then perform their WAL flush in a single I/O operation.
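If you want to try it out, here is a minimal sketch of enabling the feature (run as a superuser; the values are for illustration only, not a recommendation):

# set commit_delay and commit_siblings cluster-wide
psql -c "ALTER SYSTEM SET commit_delay = 1000"     # wait up to 1000 microseconds before flushing WAL
psql -c "ALTER SYSTEM SET commit_siblings = 5"     # only if at least 5 other transactions are active (5 is the default)
psql -c "SELECT pg_reload_conf()"                  # both parameters take effect without a restart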
commit_delay is not easy to tune, because the delay will make the transaction take longer. On the other hand, if you choose a value that is too low, no other transaction might be ready by the time the delay has passed, and you cannot reduce the number of IOPS performed.
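One way to see whether the delay actually reduces the number of WAL flushes is to compare the wal_sync counter in pg_stat_wal before and after a test run. This is only a sketch; the columns used here exist in PostgreSQL 14 to 17:

# reset the shared WAL statistics, run a short workload, then look at the flush counters
psql -c "SELECT pg_stat_reset_shared('wal')"
pgbench -b simple-update -c 10 -T 60
psql -c "SELECT wal_write, wal_sync FROM pg_stat_wal"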
The commit_delay benchmark

The benchmark was run on my ASUS ZenBook UX433F notebook with a local NVMe disk, 8 CPU cores and 16GB RAM. I set shared_buffers = 3GB, max_wal_size = 100GB and checkpoint_timeout = 15min. Then I initialized the standard pgbench database with a scale factor of 100. I used pg_prewarm to load all the pgbench tables and indexes into shared buffers. That way, there should be no read I/O at all, and, apart from checkpoints, the only I/O would be WAL writes.
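Spelled out as commands, the setup looks roughly like the following sketch; the pg_prewarm statements are my reconstruction of that step:

# create and populate the standard pgbench database with scale factor 100
pgbench -i -s 100

# load the pgbench tables and their primary key indexes into shared buffers
psql <<'EOF'
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('pgbench_accounts');
SELECT pg_prewarm('pgbench_accounts_pkey');
SELECT pg_prewarm('pgbench_branches');
SELECT pg_prewarm('pgbench_branches_pkey');
SELECT pg_prewarm('pgbench_tellers');
SELECT pg_prewarm('pgbench_tellers_pkey');
SELECT pg_prewarm('pgbench_history');
EOF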
The pgbench command I used was

pgbench -b simple-update -c 10 -T 1200
The built-in NVMe in my laptop is so powerful that I couldn't saturate it with pgbench. Therefore, I decided to use Linux control groups to throttle the device to 1000 IOPS. On my Fedora 40 system, I had to enable I/O control for the systemd slices:

echo '+memory +pids +io' > /sys/fs/cgroup/system.slice/cgroup.subtree_control
Then I could set a write IOPS limit on the NVMe device for the PostgreSQL v17 service:

echo '259:0 wiops=1000' > /sys/fs/cgroup/system.slice/postgresql-17.service/io.max
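To check that the limit is effective, you can watch the write IOPS while the benchmark runs, for example with iostat from the sysstat package (the device name nvme0n1 is my assumption for the 259:0 device above):

# print extended statistics every 5 seconds; the w/s column shows write IOPS
iostat -x 5 nvme0n1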
You could argue that this makes my test artificial. However, people who host their databases in a public cloud are constrained by limits just like this one. Besides, you can never directly apply the results of a benchmark to a different system and workload anyway.
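The table below shows one run per commit_delay setting. A sketch of how such a sweep could be automated, assuming the pgbench command shown above (the loop itself is mine, not taken from the original test runs):

# run the benchmark once per commit_delay setting (values in microseconds)
for delay in 0 10 30 50 100 200 500 750 1000 1250 1500 2000
do
    psql -c "ALTER SYSTEM SET commit_delay = $delay"
    psql -c "SELECT pg_reload_conf()"
    pgbench -b simple-update -c 10 -T 1200
done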
Results of the commit_delay benchmark

commit_delay | transactions per second | IOPS
---|---|---
0 μs | 1576 | 1000
10 μs | 1703 | 1000
30 μs | 1715 | 1000
50 μs | 1778 | 1000
100 μs | 1837 | 1000
200 μs | 1933 | 1000
500 μs | 2183 | 1000
750 μs | 2583 | 900
1000 μs | 2738 | 600
1250 μs | 2508 | 510
1500 μs | 2397 | 480
2000 μs | 2051 | 430
We achieved the best performance with a commit_delay of 1000 μs. With that setting, pgbench performed somewhat less than twice as many transactions per second as without commit_delay. It is interesting to note that at the optimum, the disk is far from saturated, so it might be possible to achieve even better results.
While commit_delay doesn't boost the performance of a transactional workload in the same way that synchronous_commit = off can, we were still able to achieve a substantial performance improvement. If you cannot afford to lose transactions after an operating system crash, tuning commit_delay is the best you can do to speed up a workload consisting of short transactions.
Hi, thanks for this very interesting blog. You talk about the value of commit_delay, but what about commit_siblings? Is the default value of 5 good? What's your point of view?
Regards,
I had enough to do testing all kinds of settings for commit_delay. Varying commit_siblings as well would have been too much. So I don't know. My guess would be that the longer your transactions are, the higher the value would have to be. But then the feature really only makes sense with short transactions, so I find it hard to make a guess.