By Kaarel Moppel - If you're not yet familiar with the Postgres "synchronous_commit" parameter, you should definitely keep reading. It's one of the important parameters, and it has an above-average number of options, which could be a bit daunting for the casual DBA. In the soon-to-be-released PostgreSQL 9.6, a new option called "remote_apply" was added, so I thought I'd take it for a spin out of curiosity, while also trying to explain the other options in simple terms. I also wanted to perform some testing in a streaming replication scenario, with and without synchronous replication.
Allowed values up to Postgres version 9.5 were "on", "remote_write", "local" and "off", and now with 9.6 "remote_apply" joins the party. Short plain-text descriptions of all of these could read something like the list below (a small usage sketch follows the list). NB! The latter three values - "local", "remote_write" and "remote_apply" - are only effective in synchronous streaming replication mode and fall back to plain "on" behavior when no replicas have been listed in the "synchronous_standby_names" parameter.
* on - Transaction commit always waits until the data is really flushed to the transaction log (aka WAL or XLOG), making sure the transaction is really persisted. In synchronous streaming replication mode, the replica needs to do the same.
* off - The commit can be acknowledged to the calling client before the transaction is actually flushed to the transaction log, making it possible to lose some recent (<1s by default) allegedly-committed transactions in case of a server crash. Synchronous replication is ignored. This is one of the best-known Postgres performance tweaks.
* local - Forces the "on" behavior (guaranteed data flush) only on the primary node. Usually set as a user session parameter, for example to speed up non-critical data imports on the primary.
* remote_write - Provides a weaker guarantee than "on": a transaction counts as committed once the primary does a guaranteed flush and the replica merely gets a write confirmation from its operating system (so the data is at risk on the replica if that machine crashes before the write hits storage).
* remote_apply - Provides the strongest replica consistency: the commit doesn't return before the replica flushes and also applies the data. Clients could query either the primary or the standby and would see exactly the same data.
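To make this tangible, here is a minimal usage sketch, assuming a standby has registered itself under the made-up application name "replica1":

-- on the primary: turn on synchronous replication for the named standby
ALTER SYSTEM SET synchronous_standby_names = 'replica1';
SELECT pg_reload_conf();

-- per session: trade durability for speed during a non-critical import
SET synchronous_commit TO off;
-- ... bulk loading statements would go here ...
RESET synchronous_commit;   -- back to the server-wide default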
Now, what is the performance impact of these different values in a streaming replication scenario? Logically, we could assume that we can order them by the given guarantees (which should translate to the amount of work done), like this:
off (async) > on (async) ≈ local (sync) > remote_write (sync) > on (sync) > remote_apply (sync)
But what would the penalty in numbers be when using higher consistency levels? How much slower would the transactions get? Let's do a set of quick tests again with our good old buddy pgbench!
For testing, I created a primary-replica setup on AWS and a script that ran through the different parameter values, restarting the server and re-initializing the test schema for every parameter. The script can be found here, so I'll skip the details for brevity and just present the TPS (transactions per second) results; in essence, each round boiled down to something like the sketch below.
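A simplified sketch of one such round - the database name "pgbench" is an assumption for the example, and the real script of course did more housekeeping:

pgbench -i -s 100 pgbench                                                 # initialize test schema at scale 100
psql -c "ALTER SYSTEM SET synchronous_commit TO 'remote_write'" pgbench   # value under test
psql -c "SELECT pg_reload_conf()" pgbench
pgbench -c 8 -T 600 pgbench                                               # 8 clients, 10 minutes, report TPS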
On a pair of AWS EC2 i2.xlarge instances (a good starting point for a busy application as it's I/O-optimized: 4 vCPUs, 30.5 GB RAM, 800 GB SSD) I got the following numbers for a shortish 10-minute test period on a "pgbench" scale 100 dataset (fits in RAM) with 8 concurrent clients:
* "on" async - 4256 TPS (FYI - here one transaction means 3 updates, 1 insert, 1 select)
* "off" async - 6211 TPS (+45% compared to the default "async on")
* "on" sync - 3329 TPS (-22% compared to the default "async on")
* "remote_write" sync - 3720 TPS (+12% compared to "sync on")
* "remote_apply" sync - 3055 TPS (-8% compared to "sync on")
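By the way, whether a round really ran in sync or async mode can be verified on the primary via the pg_stat_replication view - the sync_state column should show "sync" rather than "async":

SELECT application_name, state, sync_state FROM pg_stat_replication;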
First, as always - performance numbers depend on a lot of things, with synchronous replication especially on network performance/distance, so your mileage will definitely vary. But I think we can still note a few things here: just switching synchronous_commit off buys almost half as much again in TPS, moving from asynchronous to synchronous replication costs roughly a fifth of throughput, and the strictest "remote_apply" mode adds only a further 8% penalty on top of synchronous "on".
Hello Kaarel,
I have set up pg_dump on the standby server. I have streaming replication configured between the primary and the standby. Sometimes the backup fails with the error below.
ERROR: canceling statement due to conflict with recovery
Detail: User was holding a relation lock for too long.
Statement: COPY public.tablename (id, queuetime, reexecuteafter, queuedata, jsonbody, iteration, isexecuted, lastexecuted) TO stdout;
Is synchronous_commit=on the root cause of this? I checked - no other process was running on the primary server at the same time. At 1:00 am, vacuum analyze and reindex jobs run, which take around 30 minutes; the backup using pg_dump starts at 2:00 am.
Any hint ?