On the operational side, one thing that commonly comes up when dealing with customers running Postgres, especially in smaller setups, is Streaming Replication and its failures. Failure here not in the sense of a bug or design flaw, but rather a misunderstood "feature": the problems people run into are mostly things that work as intended and for good reasons, but they don't know the reasoning behind them and at some point find themselves stressed when seeing errors. The good side is that it shows how easy it has become to set up Postgres Streaming Replication (SR) even without any deeper background knowledge. So here's a quick recap of the most important things people should consider when running SR, so as not to get caught off guard.
The error below must be the most common problem for all SR users. It shows up on the replicas in cases where 1) the network connection to the master went away or got too slow (quite common for a disaster recovery instance on the other side of the world), or 2) the replica had downtime (Postgres shutdown or hardware/server maintenance), so that the master managed to write more data than the configured maximum WAL (Write-Ahead Log) size allows to be kept around. Until version 9.5 that was only 48 MB by default, and from 9.5 onwards it is, in the worst case, still only 80 MB by default.
ERROR: requested WAL segment 00000001000000000000000A has already been removed
The solution? In the short term there's no way around it on the replica side other than rebuilding it from the master (usually meaning pg_basebackup). For bigger DBs this can take hours and can also affect master performance, so not good...
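For reference, such a rebuild could look roughly like the sketch below (host name, replication user and data directory are placeholder assumptions; -R writes out a ready-made recovery.conf):

# on the replica: stop Postgres and empty the old data directory first
pg_basebackup -h master.example.com -U replication_user -D /var/lib/postgresql/data -X stream -P -R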
The first long-term solution would be to set wal_keep_segments on the master:
This guarantees that the given number of extra WAL segments (16 MB each) is kept around on the master. The default is 0, meaning no extra disk space is reserved. The simplest approach for not-so-busy databases without burst writes is to set it to a value corresponding to a couple of days of data volume. This should give enough time to fix the network/server, provided of course that appropriate failure detection systems are in place.
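As a rough illustration (the value here is an assumption, not a recommendation; 256 segments of 16 MB reserve about 4 GB of pg_xlog):

# postgresql.conf on the master; takes effect on reload
wal_keep_segments = 256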
Determining the daily data volume can be problematic though without some continuous monitoring tool or script (typically using pg_current_xlog_location + pg_xlog_location_diff), but with a fairly constant workload one can estimate it reasonably well based on the "change" timestamps of files in the DATADIR/pg_xlog folder. NB! Not to be confused with the standard "modified" timestamps that you see with 'ls -l'. When you see, for example, that 'find pg_xlog/ -cmin -60' (file attributes changed within the last hour) yields 3 files, you know that you're writing ca 1.2 GB (3 * 16 MB * 24) per day and can set wal_keep_segments accordingly.
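A minimal sketch of that estimation, run from inside the master's data directory:

# count WAL segments whose attributes changed within the last hour
SEGMENTS_PER_HOUR=$(find pg_xlog/ -cmin -60 -type f | wc -l)
# each segment is 16 MB, so the estimated daily volume in MB is:
echo $(( SEGMENTS_PER_HOUR * 16 * 24 ))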
Replication slots (9.4+) are a feature designed specifically for this problem scenario: they guarantee that WAL files are kept on the master when replicas disconnect, per subscriber, meaning Postgres will delete WALs only when all replicas have received them. This complicates matters a bit, of course. Steps to take (a short sketch follows the list):
* set max_replication_slots to the projected count of replicas (plus safety margin) on the master and restart
* create a named slot (speaking of only physical replication here) using pg_create_physical_replication_slot on the master
* modify the "recovery.conf" file on the replica to include the line primary_slot_name = 'slotX' and restart
* drop the slot on the master with pg_drop_replication_slot when decommissioning the replica
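A minimal sketch of those steps, with 'slotX' as a placeholder slot name:

# postgresql.conf on the master (restart required)
max_replication_slots = 2

-- on the master, create the physical slot
SELECT pg_create_physical_replication_slot('slotX');

# recovery.conf on the replica (restart the replica afterwards)
primary_slot_name = 'slotX'

-- on the master, when decommissioning the replica
SELECT pg_drop_replication_slot('slotX');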
NB! When using replication slots it is absolutely essential to have some kind of monitoring in place, because when a replica goes away and nobody notices, the master will eventually run out of disk space on the XLOG partition. When in danger of running out of disk space, find the slot that's lagging behind the most with the query below and drop it. Doing so usually means rebuilding that replica, though.
select slot_name, pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) as lag_b
from pg_replication_slots
order by 2 desc;
If you have archive_mode and archive_command configured, the replica can use restore_command to fetch the missing WAL, then reconnect to the master and resume streaming. You can avoid rebuilding replicas entirely by using the archived WAL. And you're probably already archiving WAL for PITR purposes as part of your backup, right? 🙂
Yes, a valid note. The described "basic" setup is of course not really recommended normally. But that doesn't mean you don't see such setups in the wild 🙂
Why is this not recommended? This is what is described in the documentation and all of the howtos that I read. So this is what we implemented. It works well for us because we need to keep WAL around anyway for normal backup and recovery.
Yes, it works and is correct, no problems there; but additionally, PITR would still be a very nice thing to have around 🙂
you meant "pg_create_physical_replication_slot", and not "pg_create_logical_replication_slot" as 2nd step, right?
Best Regards, Andreas 'akretschmer' Kretschmer
Arggh, sure, corrected 🙂 Thanks!
Thank you Kaarel Moppel. I am new to this topic.
Is there any performance impact on the database with a high number of replication slots configured?