By Kaarel Moppel - Some weeks ago, in the light of PostgreSQL v12 release, I wrote a general overview on various major version upgrade methods and benefits of upgrading in general - so if upgrading is a new thing for you I’d recommend to read that posting first. But this time I’m concentrating on the newest (available since v10) and the most complex upgrade method - called “Logical Replication” or LR shortly. For demonstration purposes I’ll be migrating from v10 to freshly released v12 as this is probably the most likely scenario. But it should work the same also with v11 to v12. But do read on for details.
Table of Contents
First a bit of recap from the previous post on why would you use LR for upgrading at all. Well, in short - because it’s the safest option with shortest possible downtime! With that last point I’m already sold...but here again the list of “pros” / “cons”:
After the initial setup burden one just needs to wait (and verify) that the new instance hast all the data from the old one...and then just shut down the old instance and point applications to the new instance. Couldn’t be easier!
Also before the switchover one can make sure that statistics are up to date, to minimize the typical “degraded performance” period seen after "pg_upgrade" for more complex queries (on bigger databases). For high load application one could even be more careful here and pull the most popular relations into shared buffers by using the (relatively unknown) “pg_prewarm” Contrib extension or by just running common SELECT-s in a loop, to counter the “cold cache” effect.
One can for example already make some changes on the target DB – add columns / indexes, change datatypes, leave out some old archive tables, etc. The general idea is that LR does not work on the binary, 1-to-1 level as”pg_upgrade” does, but rather JSON-like data objects are sent over to another master / primary instance, providing quite some freedom on the details.
Before the final switchover you can anytime abort the process and re-try if something seems fishy. The old instances data is not changed in any way even after the final switchover! Meaning you can easily roll back (with cost of some data loss typically though) to the old version if some unforeseen issues arise. One should only watch out for the replication slot on the source / publisher DB if the target server just taken down suddenly.
As LR has some prerequisites on the configuration and schema, you’d first need to see if it’s possible to start with the migration process at all or some changes are needed on the old master node, also called the “publisher” in LR context.
Action points:
1) Enable LR on the old master aka subscriber aka source DB if not done already. This means setting “wal_level” to “logical” in postgresql.conf and making sure that “replication” connections are allowed in “pg_hba.conf” from the new host (also called the “subscriber” in LR context). FYI - changing “wal_level” needs server restart! To enable any kind of streaming replication some other params are needed but they are actually already set accordingly out of the box as of v10 so it shouldn’t be a problem.
2) Check that all tables have a Primary Key (which is good database design anyways) or alternatively have REPLICA IDENTITY set. Primary Keys don’t need much explaining probably but what is this REPLICA IDENTITY thing? A bit simplified - basically it allows to say which columns formulate uniqueness within a table and PK-s are automatically counted as such.
3) If there’s no PK for a particular table, you should create one, if possible. If you can’t do that, set unique constraints / indexes to serve as REPLICA IDENTITY, if at all possible. If even that isn’t possible, you can set the whole row as REPLICA IDENTITY, a.k.a. REPLICA IDENTITY FULL, meaning all columns serve as PK’s in an LR context - with the price of very slow updates / deletes on the subscriber (new DB) side, meaning the whole process could take days or not even catch up, ever! It’s OK not to define a PK for a table, as long as it’s a write-only logging table that only gets inserts.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
psql -c “ALTER SYSTEM SET wal_level TO logical;” sudo systemctl postgresql@10-main restart # find problematic tables (assuming we want to migrate everything 'as is') SELECT format('%I.%I', nspname, relname) AS tbl FROM pg_class AS c JOIN pg_namespace AS n ON c.relnamespace = n.oid WHERE c.relkind = 'r' AND NOT n.nspname LIKE ANY (ARRAY[E'pg\_%', 'information_schema']) AND NOT EXISTS (SELECT FROM pg_index AS i WHERE i.indrelid = c.oid AND i.indisunique AND i.indisvalid AND i.indisready AND i.indislive) ORDER BY 1; # set replica identities on tables highlighted by the previous query ALTER TABLE some_bigger_table REPLICA IDENTITY USING INDEX unique_idx ; ALTER TABLE some_table_with_no_updates_deletes REPLICA IDENTITY FULL ; |
Second most important step is to set up a new totally independent instance with a newer Postgres version - or at least create a new database on an existing instance with the latest major version. And as a side note - same version LR migrations are also possible, but you'd be solving some other problem in that case.
This step is actually very simple - just a standard install of PostgreSQL, no special steps needed! With the important addition that to make sure everything works exactly the same way as before for applications - same encoding and collation should be used!
1 2 3 4 5 |
-- on old SELECT pg_catalog.pg_encoding_to_char(d.encoding) AS 'Encoding', d.datcollate as 'Collate' FROM pg_database d WHERE datname = current_database(); -- on new CREATE DATABASE appdb TEMPLATE template0 ENCODING UTF8 LC_COLLATE 'en_US.UTF-8'; |
NB! Before the final switchover it’s important that no normal users have access to the new DB - as they might alter table data or structures and thereby inadvertently produce replication conflicts that mostly mean starting from scratch (or a costly investigation / fix) as “replay” is a sequential process.
Next we need to synchronize the old schema onto the new DB as Postgres does not take care of that automatically as of yet. The simplest way is to use the official PostgreSQL backup tool called “pg_dump”, but if you have your schema initialization scripts in Git or such and they're up to date then this is fine also. For syncing roles “pg_dumpall” can be used.
NB! After this point it’s not recommended to introduce any changes to the schema or be at least very careful when doing it, e.g. creating new tables / columns first on the subscriber and refreshing the subscriptions when introducing new tables - otherwise data synchronization will break! Tip - a good way to disable unwanted schema changes is to use DDL triggers! An approximate example on that is here. Adding new tables only on the new DB is no issue though but during an upgrade not a good idea anyways - my recommendation is to first upgrade and then to evolve the schema.
1 2 |
pg_dumpall -h $old_instance --globals-only | psql -h $new_instance pg_dump -h $old_instance --schema-only appdb | psql -h $new_instance appdb |
If preparations on the old DB has been finished (all tables having PK-s or replication identities) then this is a oneliner:
1 |
CREATE PUBLICATION upgrade FOR ALL TABLES; |
Here we added all (current and those added in future) tables to a publication (a replication set) named “upgrade” but technically we could also leave out some or choose to only replicate some operations like UPDATE-s, but for a pure version upgrade you want typically all.
NB! As of this moment the replication identities become important - and you might run into trouble on the old master if the identities are not in place on all tables that get changes! In such case you might see errors like that:
1 2 3 4 |
UPDATE pgbench_history SET delta = delta WHERE aid = 1; ERROR: cannot update table 'pgbench_history' because it does not have a replica identity and publishes updates HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE. |
Next step - create a “subscription” on the new DB. This is also a oneliner, that creates a logical replication slot on the old instance, pulls initial table snapshots and then starts to stream and apply all table changes as they happen on the source, resulting eventually in a mirrored dataset! Note that currently superuser rights are needed for creating the subscription and actually hit also makes life a lot easier on the publisher side.
1 2 3 |
CREATE SUBSCRIPTION upgrade_sub CONNECTION 'port=5432 user=postgres' PUBLICATION upgrade; NOTICE: created replication slot 'upgrade_sub' on publisher CREATE SUBSCRIPTION |
WARNING! As of this step the 2 DB-s are “coupled” via a replication slot, carrying some dangers if the process is aborted abruptly and the old DB is not “notified” of that. If this sounds new please see the details from documentation.
Depending on the amount of data it will take X minutes / days until everything is moved over and “live” synchronization is working.
Things to inspect for making sure there are no issues:
Although not a mandatory step, when it comes to data consistency / correctness, it always makes sense to go the extra mile and run some queries that validate that things (source - target) have the same data. For a running DB it’s of course a bit difficult as there’s always some replication lag but for “office hours” applications it should make a lot of sense. My sample script for comparing rowcounts (in a non-threaded way) is for example here but using some slightly more "costly" aggregation / hashing functions that really look at all the data would be even better there.
Also important to note if you’re using sequences (which you most probably are) - sequence state is not synchronized by LR and needs some manual work / scripting! The easiest option I think is that you leave the old DB ticking in read-only mode during switchover so that you can quickly access the last sequence values without touching the indexes for maximum ID-s on the subscriber side.
We’re almost there with our little undertaking...with the sweaty part remaining - the actual switchover to start using the new DB! Needed steps are simple though and somewhat similar to switching over to a standard, “streaming replication” replica.
Make sure it’s a nice shutdown. The last logline should state “database system is shut down”, meaning all recent changes were delivered to connected replication clients, including our new DB. Start of downtime! PS Another alternative to make sure absolutely all data is received is to actually configure the new instance in “synchronous replication” mode! This has the usual synchronous replication implications of course so I’d avoid it for bigger / busier DBs.
From v12 this is achieved by declaring a "standby.signal" file
if time constraints allow it - verify table sizes, row counts, your last transactions, etc. For “live” comparisons it makes sense to restart the old DB under a new, random port so that no-one else connects to it.
Given we'll leave the old DB in read-only mode the easiest way is something like that:
1 2 3 |
psql -h $old_instance -XAtqc 'SELECT $select setval('$ || quote_ident(schemaname)||$.$|| quote_ident(sequencename) || $', $ || last_value || $); $ AS sql FROM pg_sequences' appdb | psql -h $new_instance appdb |
your pg_hba.conf to allow access for all "mortal" users, then reconfigure your application, connection pooler, DNS or proxy to start using the new DB! If the two DB-s were on the same machine then it’s even easier - just change the ports and restart. End of downtime!
Basically we’re done here, but would be nice of course to clean up and remove the (no-more needed) subscription not to accumulate errors in server log.
1 |
DROP SUBSCRIPTION upgrade_sub; |
Note that if you won't keep the old “publisher” accessible in read-only or normal primary mode (dangerous!) though, some extra steps are needed here before dropping:
1 2 3 |
ALTER SUBSCRIPTION upgrade_sub DISABLE ; ALTER SUBSCRIPTION upgrade_sub SET (slot_name = NONE); DROP SUBSCRIPTION upgrade_sub; |
Although there are quite some steps and nuances involved, LR is worth adding to the standard upgrade toolbox for time-critical applications as it’s basically the best way to do major version upgrades nowadays - minimal dangers, minimal downtime!
FYI - if you're planning to migrate dozens of DB-s the LR upgrade process can be fully automated! Even starting from version 9.4 actually, with the help of the “pglogical” extension. So feel free to contact us if you might need something like that and don't particularly enjoy the details. Thanks for reading!
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Hello Kaarel. Thanks for this article.
Regarding sequence adjustments, it seems there is a slight problem in the statement. Did you mean :
psql -XAtqc "SELECT 'SELECT setval(' || quote_ident(schemaname) ||'.'|| quote_ident(sequencename) ||', '|| last_value || ');' AS sql FROM pg_sequences"
Marc
Thanks for the feedback, but it worked for me. Yours seems the same just without the dollar notation. Maybe some copy-paste problem?
Hello, i have a situation about logical replication slots that i do not understand.
On my cluster "cluster 1", when i try to create a logical replication slot:
SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'pgoutput');
ERROR: cannot create logical replication slot in transaction that has performed writes
so i stop everything running on it.
I create on the same server a 2nd cluster "cluster 2", with the same parameters.
postgres@stock(test)=# SELECT 'init' FROM pg_create_logical_replication_slot('slot3', 'pgoutput');
?column?
----------
init
(1 row)
a select * from pg_stat_activity shows the same on both (i am the only one connected)
i stop/start both clusters, i have always the same result. it is impossible to create a logical replciation slot (but is it possible using pg_recvlogical)
Do you have any ideas ?
Thanks
my conf is pgsql 12.2
Assuming you are using
psql
, perhaps there is something in your.psqlrc
that modifies data? You could setlog_statement = 'all'
for the user, try again and see in the log what statements are executed.nothing in .psqlrc, log_statement shows nothing....
i'm still searching.
executed with psql, on the same server, 2 identical clusters (except port number)
something i do not understand : my session starts a transaction:
# select * from pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
--------------- ---------- ---------- ------ ------- ------------ --------------- --------- ------- ---------- -------------------- ------- ----------------- --------- ----------
relation | 41044 | 12143 | | | | | | | | 47/910 | 24363 | AccessShareLock | t | t
virtualxid | | | | | 47/910 | | | | | 47/910 | 24363 | ExclusiveLock | t | t
transactionid | | | | | | 6979701 | | | | 47/910 | 24363 | ExclusiveLock | t | f
(3 rows)
24363 is my session....
i do not understand why i start a trx :
i connet to psql using
psql mydatabase
and then
select * from pg_locks;
shows a trx created by my pid.
ok, i have found the pb
it comes from extension pg_show_plans
Below query is not working for me
postgres=# SELECT
postgres-# quote_ident(nspname) || '.' || quote_ident(relname) AS tbl
postgres-# FROM
postgres-# pg_class c
postgres-# JOIN pg_namespace n ON c.relnamespace = n.oid
postgres-# WHERE
postgres-# relkind = 'r'
postgres-# AND NOT nspname LIKE ANY (ARRAY[E'pg\_%', 'information_schema'])
postgres-# AND NOT relhaspkey
postgres-# AND NOT EXISTS (SELECT * FROM pg_index WHERE indrelid = c.oid
postgres(# AND indisunique AND indisvalid AND indisready AND indislive)
postgres-# ORDER BY
postgres-# 1;
ERROR: column "relhaspkey" does not exist
LINE 9: AND NOT relhaspkey
^
postgres=# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 16.2 (Ubuntu 16.2-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
(1 row)
Tested on 11,15 and 16 versions
Thanks for the report. That column was removed in v11.
I have updated the query.
If I have application triggers in the source and these will get copied to the target using pg_dump and by default they will be in enabled in target.
So, before enabling the replication in target, do you recommend disabling the triggers in target to avoid the data corruption and then enable them once the cut-over is done?
Application triggers normally don't fire during logical replication, because logical replication sets
session_replication_role
toreplica
. So there is nothing you have to do. Only triggers for which you usedALTER TABLE ... ENABLE REPLICA|ALWAYS TRIGGER trigger_name
will fire during logical replication, but you only define such triggers if you want them to fire during logical replication.Thanks Laurenz for clarifying