CYBERTEC PostgreSQL Logo

Upgrading Postgres major versions using Logical Replication

10.2019 / Category: / Tags: |

By Kaarel Moppel - Some weeks ago, in the light of PostgreSQL v12 release, I wrote a general overview on various major version upgrade methods and benefits of upgrading in general - so if upgrading is a new thing for you I’d recommend to read that posting first. But this time I’m concentrating on the newest (available since v10) and the most complex upgrade method - called “Logical Replication” or LR shortly. For demonstration purposes I’ll be migrating from v10 to freshly released v12 as this is probably the most likely scenario. But it should work the same also with v11 to v12. But do read on for details.

Benefits of LR upgrades

First a bit of recap from the previous post on why would you use LR for upgrading at all. Well, in short - because it’s the safest option with shortest possible downtime! With that last point I’m already sold...but here again the list of “pros” / “cons”:

PROS

  • Minimal downtime required

After the initial setup burden one just needs to wait (and verify) that the new instance hast all the data from the old one...and then just shut down the old instance and point applications to the new instance. Couldn’t be easier!

Also before the switchover one can make sure that statistics are up to date, to minimize the typical “degraded performance” period seen after "pg_upgrade" for more complex queries (on bigger databases). For high load application one could even be more careful here and pull the most popular relations into shared buffers by using the (relatively unknown) “pg_prewarm” Contrib extension or by just running common SELECT-s in a loop, to counter the “cold cache” effect.

  • Flexible

One can for example already make some changes on the target DB – add columns / indexes, change datatypes, leave out some old archive tables, etc. The general idea is that LR does not work on the binary, 1-to-1 level as”pg_upgrade” does, but rather JSON-like data objects are sent over to another master / primary instance, providing quite some freedom on the details.

  • Safe

Before the final switchover you can anytime abort the process and re-try if something seems fishy. The old instances data is not changed in any way even after the final switchover! Meaning you can easily roll back (with cost of some data loss typically though) to the old version if some unforeseen issues arise. One should only watch out for the replication slot on the source / publisher DB if the target server just taken down suddenly.

CONS

  • Quite a few steps to take and possibly one needs to modify the schema a bit.
  • Always per DB.
  • Could take a long time for big databases.
  • Large objects, if in use (should be a thing of the past really), need to be exported / imported manually.

Preparing for LR

As LR has some prerequisites on the configuration and schema, you’d first need to see if it’s possible to start with the migration process at all or some changes are needed on the old master node, also called the “publisher” in LR context.

Action points:

1) Enable LR on the old master aka subscriber aka source DB if not done already. This means setting “wal_level” to “logical” in postgresql.conf and making sure that “replication” connections are allowed in “pg_hba.conf” from the new host (also called the “subscriber” in LR context). FYI - changing “wal_level” needs server restart! To enable any kind of streaming replication some other params are needed but they are actually already set accordingly out of the box as of v10 so it shouldn’t be a problem.

2) Check that all tables have a Primary Key (which is good database design anyways) or alternatively have REPLICA IDENTITY set. Primary Keys don’t need much explaining probably but what is this REPLICA IDENTITY thing? A bit simplified - basically it allows to say which columns formulate uniqueness within a table and PK-s are automatically counted as such.

3) If there’s no PK for a particular table, you should create one, if possible. If you can’t do that, set unique constraints / indexes to serve as REPLICA IDENTITY, if at all possible. If even that isn’t possible, you can set the whole row as REPLICA IDENTITY, a.k.a. REPLICA IDENTITY FULL, meaning all columns serve as PK’s in an LR context - with the price of very slow updates / deletes on the subscriber (new DB) side, meaning the whole process could take days or not even catch up, ever! It’s OK not to define a PK for a table, as long as it’s a write-only logging table that only gets inserts.

Sample code:

Fresh setup of the new “subscriber” DB

Second most important step is to set up a new totally independent instance with a newer Postgres version - or at least create a new database on an existing instance with the latest major version. And as a side note - same version LR migrations are also possible, but you'd be solving some other problem in that case.

This step is actually very simple - just a standard install of PostgreSQL, no special steps needed! With the important addition that to make sure everything works exactly the same way as before for applications - same encoding and collation should be used!

NB! Before the final switchover it’s important that no normal users have access to the new DB - as they might alter table data or structures and thereby inadvertently produce replication conflicts that mostly mean starting from scratch (or a costly investigation / fix) as “replay” is a sequential process.

Schema / roles synchronization

Next we need to synchronize the old schema onto the new DB as Postgres does not take care of that automatically as of yet. The simplest way is to use the official PostgreSQL backup tool called “pg_dump”, but if you have your schema initialization scripts in Git or such and they're up to date then this is fine also. For syncing roles “pg_dumpall” can be used.

NB! After this point it’s not recommended to introduce any changes to the schema or be at least very careful when doing it, e.g. creating new tables / columns first on the subscriber and refreshing the subscriptions when introducing new tables - otherwise data synchronization will break! Tip - a good way to disable unwanted schema changes is to use DDL triggers! An approximate example on that is here. Adding new tables only on the new DB is no issue though but during an upgrade not a good idea anyways - my recommendation is to first upgrade and then to evolve the schema.

Create a “publication” on the old DB

If preparations on the old DB has been finished (all tables having PK-s or replication identities) then this is a oneliner:

Here we added all (current and those added in future) tables to a publication (a replication set) named “upgrade” but technically we could also leave out some or choose to only replicate some operations like UPDATE-s, but for a pure version upgrade you want typically all.

NB! As of this moment the replication identities become important - and you might run into trouble on the old master if the identities are not in place on all tables that get changes! In such case you might see errors like that:

Create a “subscription” on the target DB

Next step - create a “subscription” on the new DB. This is also a oneliner, that creates a logical replication slot on the old instance, pulls initial table snapshots and then starts to stream and apply all table changes as they happen on the source, resulting eventually in a mirrored dataset! Note that currently superuser rights are needed for creating the subscription and actually hit also makes life a lot easier on the publisher side.

WARNING! As of this step the 2 DB-s are “coupled” via a replication slot, carrying some dangers if the process is aborted abruptly and the old DB is not “notified” of that. If this sounds new please see the details from documentation.

Check replication progress

Depending on the amount of data it will take X minutes / days until everything is moved over and “live” synchronization is working.

Things to inspect for making sure there are no issues:

  • No errors in server logs on both sides
  • There’s an active “pg_replication_slots” entry on the master with the name that we used to create the “subscription” on the new DB
  • All tables are actively replicating on the subscriber side, i.e. “pg_subscription_rel.srsubstate” should be ‘r’ for all tables (ready - normal replication)

Basic data verification / switchover preparation

Although not a mandatory step, when it comes to data consistency / correctness, it always makes sense to go the extra mile and run some queries that validate that things (source - target) have the same data. For a running DB it’s of course a bit difficult as there’s always some replication lag but for “office hours” applications it should make a lot of sense. My sample script for comparing rowcounts (in a non-threaded way) is for example here but using some slightly more "costly" aggregation / hashing functions that really look at all the data would be even better there.

Also important to note if you’re using sequences (which you most probably are) - sequence state is not synchronized by LR and needs some manual work / scripting! The easiest option I think is that you leave the old DB ticking in read-only mode during switchover so that you can quickly access the last sequence values without touching the indexes for maximum ID-s on the subscriber side.

Switchover time!

We’re almost there with our little undertaking...with the sweaty part remaining - the actual switchover to start using the new DB! Needed steps are simple though and somewhat similar to switching over to a standard, “streaming replication” replica.

1) Re-check the system catalog views on replication status.

2) Stop the old instance.

Make sure it’s a nice shutdown. The last logline should state “database system is shut down”, meaning all recent changes were delivered to connected replication clients, including our new DB. Start of downtime! PS Another alternative to make sure absolutely all data is received is to actually configure the new instance in “synchronous replication” mode! This has the usual synchronous replication implications of course so I’d avoid it for bigger / busier DBs.

3) Start the old DB in read-only mode by creating a recovery.conf file

From v12 this is achieved by declaring a "standby.signal" file

4) Optionally make some more quick “health checks”

if time constraints allow it - verify table sizes, row counts, your last transactions, etc. For “live” comparisons it makes sense to restart the old DB under a new, random port so that no-one else connects to it.

5) Synchronize the sequences.

Given we'll leave the old DB in read-only mode the easiest way is something like that:

6) Reconfigure

your pg_hba.conf to allow access for all "mortal" users, then reconfigure your application, connection pooler, DNS or proxy to start using the new DB! If the two DB-s were on the same machine then it’s even easier - just change the ports and restart. End of downtime!

7) Finish

Basically we’re done here, but would be nice of course to clean up and remove the (no-more needed) subscription not to accumulate errors in server log.

Note that if you won't keep the old “publisher” accessible in read-only or normal primary mode (dangerous!) though, some extra steps are needed here before dropping:

8) Time for some bubbly drinks 🙂 

Summary

Although there are quite some steps and nuances involved, LR is worth adding to the standard upgrade toolbox for time-critical applications as it’s basically the best way to do major version upgrades nowadays - minimal dangers, minimal downtime!

FYI - if you're planning to migrate dozens of DB-s the LR upgrade process can be fully automated! Even starting from version 9.4 actually, with the help of the “pglogical” extension. So feel free to contact us if you might need something like that and don't particularly enjoy the details. Thanks for reading!

11 responses to “Upgrading Postgres major versions using Logical Replication”

  1. Hello Kaarel. Thanks for this article.

    Regarding sequence adjustments, it seems there is a slight problem in the statement. Did you mean :

    psql -XAtqc "SELECT 'SELECT setval(' || quote_ident(schemaname) ||'.'|| quote_ident(sequencename) ||', '|| last_value || ');' AS sql FROM pg_sequences"
    Marc

    • Thanks for the feedback, but it worked for me. Yours seems the same just without the dollar notation. Maybe some copy-paste problem?

  2. Hello, i have a situation about logical replication slots that i do not understand.
    On my cluster "cluster 1", when i try to create a logical replication slot:

    SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'pgoutput');
    ERROR: cannot create logical replication slot in transaction that has performed writes

    so i stop everything running on it.
    I create on the same server a 2nd cluster "cluster 2", with the same parameters.

    postgres@stock(test)=# SELECT 'init' FROM pg_create_logical_replication_slot('slot3', 'pgoutput');
    ?column?
    ----------
    init
    (1 row)

    a select * from pg_stat_activity shows the same on both (i am the only one connected)

    i stop/start both clusters, i have always the same result. it is impossible to create a logical replciation slot (but is it possible using pg_recvlogical)

    Do you have any ideas ?
    Thanks
    my conf is pgsql 12.2

    • Assuming you are using psql, perhaps there is something in your .psqlrc that modifies data? You could set log_statement = 'all' for the user, try again and see in the log what statements are executed.

      • nothing in .psqlrc, log_statement shows nothing....
        i'm still searching.
        executed with psql, on the same server, 2 identical clusters (except port number)

        something i do not understand : my session starts a transaction:
        # select * from pg_locks;
        locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
        --------------- ---------- ---------- ------ ------- ------------ --------------- --------- ------- ---------- -------------------- ------- ----------------- --------- ----------
        relation | 41044 | 12143 | | | | | | | | 47/910 | 24363 | AccessShareLock | t | t
        virtualxid | | | | | 47/910 | | | | | 47/910 | 24363 | ExclusiveLock | t | t
        transactionid | | | | | | 6979701 | | | | 47/910 | 24363 | ExclusiveLock | t | f
        (3 rows)

        24363 is my session....

        i do not understand why i start a trx :
        i connet to psql using
        psql mydatabase
        and then
        select * from pg_locks;
        shows a trx created by my pid.

  3. Below query is not working for me

    postgres=# SELECT
    postgres-#   quote_ident(nspname) || '.' || quote_ident(relname) AS tbl
    postgres-# FROM
    postgres-#   pg_class c
    postgres-#   JOIN pg_namespace n ON c.relnamespace = n.oid
    postgres-# WHERE
    postgres-#   relkind = 'r'
    postgres-#   AND NOT nspname LIKE ANY (ARRAY[E'pg\_%', 'information_schema'])
    postgres-#   AND NOT relhaspkey
    postgres-#   AND NOT EXISTS (SELECT * FROM pg_index WHERE indrelid = c.oid
    postgres(#       AND indisunique AND indisvalid AND indisready AND indislive)
    postgres-# ORDER BY
    postgres-#   1;
    ERROR: column "relhaspkey" does not exist
    LINE 9:   AND NOT relhaspkey
              ^

    postgres=# select version();
                                   version
    -----------------------------------------------------------------------------------------------------------------------------------
     PostgreSQL 16.2 (Ubuntu 16.2-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
    (1 row)

    Tested on 11,15 and 16 versions

  4. If I have application triggers in the source and these will get copied to the target using pg_dump and by default they will be in enabled in target.
    So, before enabling the replication in target, do you recommend disabling the triggers in target to avoid the data corruption and then enable them once the cut-over is done?

    • Application triggers normally don't fire during logical replication, because logical replication sets session_replication_role to replica. So there is nothing you have to do. Only triggers for which you used ALTER TABLE ... ENABLE REPLICA|ALWAYS TRIGGER trigger_name will fire during logical replication, but you only define such triggers if you want them to fire during logical replication.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram