By Kaarel Moppel - The "backend_flush_after" PostgreSQL server configuration parameter was introduced some time ago, in version 9.6. It has been flying under the radar and had not caught my attention before. Recently, however, someone pasted me (I'm not on Twitter) a tweet from Andres Freund, one of the Postgres core developers. The tweet basically said: if your workload is bigger than shared_buffers, you should enable the backend_flush_after parameter for improved throughput and reduced jitter. Hmm, who wouldn't like an extra performance boost for free? FOMO kicked in... but before adding this parameter to my "standard setup toolbox", I hurried to test things out. My own eye is king! So here's a quick test and my conclusion on the effects of enabling backend_flush_after (it is not enabled by default!).
Trying to interpret the documentation in my own words: backend_flush_after is basically designed to send "hints" to the OS. Once a backend (a user session) has written more than X bytes (configurable from 0, meaning disabled, up to a maximum of 2MB), the kernel is asked to start flushing the recently changed data files in the background. That way, when the checkpointer comes along or the kernel's "dirty" limit is reached, there is less bulk "fsyncing" left to do. That means less IO contention (spikes) for our user sessions, and thus smoother response times.
Be warned though: unlike most Postgres settings, this one is not guaranteed to have any effect. It currently only works on Linux systems which have sync_file_range() functionality available, which in turn depends on the kernel version and the file system used. In short, this explains why the parameter has not gotten much attention. The story is similar for the "sister" parameters bgwriter_flush_after, checkpoint_flush_after and wal_writer_flush_after, with the difference that they are already enabled by default!
NB! Also note that this parameter, being controlled and initiated by Postgres, might be the only way to influence the kernel IO subsystem when using some managed / cloud PostgreSQL service!
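To see where a given instance stands, something along these lines could be used. It is only a minimal sketch: the 2MB value is simply the maximum discussed above, and ALTER SYSTEM requires superuser rights, which may not be available on a managed service. There, the session-level SET is the part most likely to work.

```sql
-- List all four *_flush_after settings; "setting" is expressed in "unit"
-- (typically 8kB pages), and boot_val shows the compiled-in default.
SELECT name, setting, unit, boot_val, context
FROM pg_settings
WHERE name LIKE '%flush_after'
ORDER BY name;

-- Try the parameter for the current session only
SET backend_flush_after = '2MB';
SHOW backend_flush_after;

-- Persist it cluster-wide (superuser only; written to postgresql.auto.conf)
ALTER SYSTEM SET backend_flush_after = '2MB';
SELECT pg_reload_conf();
```

A reload is enough here, no restart is needed.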
As you might have noticed, although the tweet mentioned workloads bigger than shared_buffers, in the spirit of good old "doubt everything" I still decided to test both cases 🙂
With test #1, where the workload fit into shared_buffers, there's actually nothing worthwhile to mention. My radar picked up no remotely significant difference, so Andres was right! Test #2 basically also confirmed what was claimed, but see the table below for the numbers. NB! The numbers were measured on the server side from pg_stat_statements. During the tests, system CPU utilization was on average around 55% and IO-wait (the vmstat "wa" column) was about 25%, which is much more than a typical system would exhibit. However, that makes the backend_flush_after IO optimizations easier to see. Also note that the results table only includes numbers for the main UPDATE statement on pgbench_accounts. Differences for the other mini-tables (which get fully cached) were at the noise level.
UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2
Test | Mean time (ms) | Mean change (%) | Stddev time (ms) | Stddev change (%) |
---|---|---|---|---|
Workload=4x mem., backend_flush_after=0 (default) | 0.637 | - | 0.758 | - |
Workload=4x mem., backend_flush_after=512kB | 0.632 | -0.8 | 0.606 | -20.0 |
Workload=4x mem., backend_flush_after=2MB | 0.609 | -4.4 | 0.552 | -27.2 |
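For anyone wanting to repeat the measurement, figures like the ones above can be read from pg_stat_statements with a query of roughly this shape. This is a sketch, not necessarily the exact query used for the table. It assumes the pg_stat_statements extension is installed and uses the column names of PostgreSQL 12 and older.

```sql
-- Start every test run from a clean slate
SELECT pg_stat_statements_reset();

-- ... run the pgbench workload here ...

-- Mean and standard deviation of execution time, in milliseconds.
-- On PostgreSQL 13 and newer the columns are called
-- mean_exec_time and stddev_exec_time instead.
SELECT query,
       calls,
       round(mean_time::numeric, 3)   AS mean_ms,
       round(stddev_time::numeric, 3) AS stddev_ms
FROM pg_stat_statements
WHERE query LIKE 'UPDATE pgbench_accounts%'
ORDER BY calls DESC;
```

The cast to numeric is needed because round() with a precision argument is not defined for double precision values.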
First off: as it was a very simple test, I wouldn't assign too much importance to the numbers themselves. But it showed that the backend_flush_after setting indeed makes things a bit better when using the biggest "chunk size" (2MB). This is especially visible in the transaction time standard deviations, and more importantly, it doesn't make things worse! For heavily loaded setups I'll use it without fear in the future, especially with spinning disks (if anyone still uses them), where the difference should be even more pronounced. Bear in mind though that when it comes to the Linux kernel disk subsystem, there's a bunch of other relevant parameters, like dirty_ratio, dirty_background_ratio, "swappiness" and the type of IO scheduler. The effects of tuning those could be even bigger!
If you have any questions, feel free to contact us.
So why is this parameter not just set to e.g. 2MB by default?
A good question indeed, as the other *_flush_after parameters are already enabled by default. I can only guess that it has the most impact of those, and that the developers don't want to take any chances, as there are a gazillion different kernel / disk system combinations out there.