The mysterious “backend_flush_after” configuration setting

07.2019

By Kaarel Moppel - The "backend_flush_after" PostgreSQL server configuration parameter was introduced some time ago, in version 9.6, but it has been flying under the radar and had not caught my attention before. Recently, however, someone forwarded me (I'm not on Twitter) a tweet from Andres Freund, one of the Postgres core developers. The tweet basically said: if your workload is bigger than shared_buffers, you should enable the backend_flush_after parameter for improved throughput and reduced jitter. Hmm, who wouldn’t like an extra performance boost for free? FOMO kicked in... but before adding this parameter to my “standard setup toolbox”, I hurried to test things out – my own eye is king! So here's a quick test and my conclusion on the effects of enabling backend_flush_after (which is not enabled by default!).

What does this parameter actually do?

Interpreting the documentation in my own words: backend_flush_after is designed to send “hints” to the OS. Once a backend (i.e. a user session) has written more than X bytes (configurable from 0, which disables the feature, up to 2MB), it asks the kernel to start flushing the recently changed data files in the background. That way, when the “checkpointer” comes along or the kernel’s “dirty” limit is reached, there is less bulk “fsyncing” left to do, which means less IO contention (spikes) for our user sessions and thus smoother response times.
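To make the mechanism a bit more tangible, here is a toy model in Python of a "flush after X bytes" counter. It is only a simplified sketch, not how PostgreSQL actually implements it: the names FlushHinter and hints_for are made up for illustration, the 8kB page size is assumed (it is the usual default block size), and the "hint" is merely counted instead of being sent to the kernel via sync_file_range().

```python
PAGE_SIZE = 8 * 1024  # ASSUMPTION: default PostgreSQL block size of 8kB


class FlushHinter:
    """Toy model (hypothetical): count page writes, emit one hint per threshold."""

    def __init__(self, flush_after_bytes: int):
        # A threshold of 0 disables hinting, like backend_flush_after = 0.
        self.threshold_pages = flush_after_bytes // PAGE_SIZE
        self.pending_pages = 0
        self.hints_issued = 0

    def write_page(self) -> None:
        """Record one page handed over to the OS page cache."""
        if self.threshold_pages == 0:
            return  # hinting disabled: the kernel decides when to write back
        self.pending_pages += 1
        if self.pending_pages >= self.threshold_pages:
            # A real backend would ask the kernel to start writeback here.
            self.hints_issued += 1
            self.pending_pages = 0


def hints_for(flush_after_bytes: int, pages_written: int) -> int:
    """Return how many writeback hints the toy model issues."""
    hinter = FlushHinter(flush_after_bytes)
    for _ in range(pages_written):
        hinter.write_page()
    return hinter.hints_issued


if __name__ == "__main__":
    # Compare the three settings used in the test for 1024 page writes (8MB).
    for setting in (0, 512 * 1024, 2 * 1024 * 1024):
        print(setting, hints_for(setting, pages_written=1024))
```

In this model a smaller threshold simply means more frequent, smaller writeback requests, while the maximum of 2MB corresponds to one request per 256 pages.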

Be warned though - unlike most Postgres settings, this one is not guaranteed to have any effect. It currently only works on Linux systems which have sync_file_range() functionality available, which in turn depends on the kernel version and file system used. In short, this explains why the parameter has not received much attention. The story is similar for the “sister” parameters bgwriter_flush_after, checkpoint_flush_after and wal_writer_flush_after... with the difference that they are already enabled by default!

NB! Also note that this parameter, being controlled and initiated by Postgres, might be the only way to influence the kernel IO subsystem when using some managed / cloud PostgreSQL service!

Test setup for backend_flush_after

Hardware: 4vCPU, 8GB, 160 GB local SSD, Ubuntu 18.04 (dirty_ratio=20, dirty_background_ratio=10, no Swap) droplet on DigitalOcean (Frankfurt)
Software: PostgreSQL 11.4 at defaults, except - checkpoint_completion_target=0.9 (which is quite a typical setting to “smooth” IO), shared_buffers='2GB'
Test case: standard “pgbench” OLTP runs with 2 clients per CPU, 2h runs i.e.: “pgbench -T 7200 -c 8 -M prepared --random-seed=5432”
Test 1 settings: Workload fitting into Shared Buffers (--scale=100)
Test 2 settings: Workload 4x bigger than RAM (--scale=2200). FYI – to calculate the needed “scale factor” I use this calculator
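For readers without a calculator at hand, the sizing logic behind the two scale factors can be roughly sketched in Python. Note that this is not the calculator mentioned above: the function names are hypothetical, and the figure of about 15 MB of on-disk data per scale unit is only a rule-of-thumb assumption (real sizes vary with version, fillfactor and bloat). The only firm number is that pgbench creates 100,000 pgbench_accounts rows per scale unit.

```python
import math

ROWS_PER_SCALE = 100_000  # pgbench_accounts rows created per scale unit
MB_PER_SCALE = 15         # ASSUMPTION: rough rule-of-thumb size per scale unit


def estimated_size_gb(scale: int) -> float:
    """Approximate database size in GB for a given pgbench scale factor."""
    return scale * MB_PER_SCALE / 1024


def scale_for_size_gb(target_gb: float) -> int:
    """Approximate scale factor needed to reach a target size in GB."""
    return math.ceil(target_gb * 1024 / MB_PER_SCALE)


if __name__ == "__main__":
    for scale in (100, 2200):
        rows = scale * ROWS_PER_SCALE
        print(f"scale={scale}: {rows:,} rows, ~{estimated_size_gb(scale):.1f} GB")
    # Target: 4x the 8GB of RAM used in this test setup.
    print("scale for ~32 GB:", scale_for_size_gb(32))
```

Under that assumption, scale=100 stays below the 2GB of shared_buffers and scale=2200 lands at roughly four times the 8GB of RAM, which matches the two test cases.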

As you might have noticed - although the tweet mentioned workloads bigger than shared_buffers, in the spirit of good old “doubt everything”, I still decided to test both cases 🙂

Test results for backend_flush_after

With Test 1, where the workload fit into shared_buffers, there’s nothing worthwhile to mention: my radars picked up no significant difference. Andres was right! Test 2 basically also confirmed what was claimed; see the table below for the numbers. NB! Numbers were measured on the server side using pg_stat_statements. During the tests, system CPU utilization averaged around 55% and IO-wait (vmstat “wa” column) was about 25%, which is much more than a typical system would exhibit, but it highlights the backend_flush_after IO optimizations better. Also note that the results table only includes numbers for the main UPDATE pgbench_accounts SQL statement. Differences for the other mini-tables (which get fully cached) were at noise level.

UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2

Test	Mean time (ms)	Mean change (%)	Stddev time (ms)	Stddev change (%)
Workload=4x mem., backend_flush_after=0 (default)	0.637	-	0.758	-
Workload=4x mem., backend_flush_after=512kB	0.632	-0.8	0.606	-20.0
Workload=4x mem., backend_flush_after=2MB	0.609	-4.4	0.552	-27.2
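The change columns are plain relative differences against the backend_flush_after=0 baseline. A minimal Python sketch to recompute them from the table (the helper name pct_change is made up for illustration) could look like this. Because the published times are rounded to three decimals, a recomputed value can differ from the table in the last digit; the -20.0 entry, for example, comes out at about -20.05.

```python
def pct_change(baseline: float, value: float) -> float:
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100


# (mean time ms, stddev time ms) per backend_flush_after setting, from the table
results = {
    "0 (default)": (0.637, 0.758),
    "512kB": (0.632, 0.606),
    "2MB": (0.609, 0.552),
}

if __name__ == "__main__":
    base_mean, base_std = results["0 (default)"]
    for setting, (mean_ms, std_ms) in results.items():
        print(
            f"{setting:12} mean {pct_change(base_mean, mean_ms):+.1f}%  "
            f"stddev {pct_change(base_std, std_ms):+.1f}%"
        )
```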

Conclusion

First off: as it was a very simple test, I wouldn’t assign too much importance to the numbers themselves. But it did show that the backend_flush_after setting makes things a bit better, at least with the biggest "chunk size" of 2MB, where the mean statement time dropped by 4.4% and its standard deviation by 27.2%. More importantly, it doesn’t make things worse! For heavily loaded setups, I’ll use it without fear in the future, especially with spinning disks (if anyone still uses them), where the difference should be even more pronounced. Bear in mind though that when it comes to the Linux kernel disk subsystem, there’s a bunch of other relevant parameters, like dirty_ratio, dirty_background_ratio, “swappiness” and the type of IO scheduler; the effects of tuning those could be even bigger!

If you have any questions, feel free to contact us.

2 responses to “The mysterious “backend_flush_after” configuration setting”

Colin 't Hart says:

August 13, 2019 at 1:12 pm

So why is this parameter not just set to eg 2Mb by default?

- Kaarel says:
  
  August 13, 2019 at 1:40 pm
  
  A good question indeed...as other *_flush_after params are already enabled by default. I can only guess that it has the most impact of those and they don't want to take any chances as there are a gazillion of different kernel / disk system combinations out there.
  
