by Kaarel Moppel
Version 13 has been out there for over a month by now and the most important stuff has already been well digested... but luckily (or not) there's so much non-major stuff in every release! This becomes quite evident when looking at the release notes with their 150 items of change, peeking at the great "Why Upgrade" website, or doing some direct Git archeology - 2233 changes in total between 12.4 and v13!
So in this post I'm highlighting some changes that stood out to me on a second reading of the release notes. Note that I'm intentionally not touching the most prominent features like the new B-tree index format/deduplication, trusted extensions, and parallel VACUUM that have gotten a lot of attention already.
So now when you pull a new basebackup (to build a standby, or just to store for a while for PITR purposes) using the pg_basebackup tool, by default - unless explicitly overridden with the --no-manifest parameter - a listing file with all fetched items together with their checksums will be stored in the target directory. And a new accompanying tool named pg_verifybackup, when pointed at a base backup, can then use this listing file to verify file contents on the receiver side, removing the need, for example, to start the instance and dump out the contents to "/dev/null".
From the details side, one can also tune the behavior a bit (various checksum algorithms, etc.), but in practice the most useful flags seem to be --no-parse-wal and --wal-directory. The first skips checking the WAL altogether, which is useful if we pull snapshots for PITR purposes and don't include WAL in our base backups at all (a standard thing to do in such a case) and the WALs are actually pushed onto some S3/cloud sink rather than just some NFS mount. For the NFS mount case, there's the second parameter, --wal-directory, to point the tool at the archived WAL.
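To illustrate, here is a minimal sketch of the whole flow - the host name and target directory below are of course just placeholders:

# pull a base backup - the backup_manifest file is now written by default
pg_basebackup -h primary.example.com -D /var/backups/node1 -X stream

# verify the files against the manifest on the receiving side
pg_verifybackup /var/backups/node1

# or skip WAL verification if the WAL lives elsewhere, e.g. in an S3 archive
pg_verifybackup --no-parse-wal /var/backups/node1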
All in all, this is my favorite non-performance feature of the v13 release, helping to avoid some potential disasters over the years to come.
Allow pg_rewind to use the target cluster's restore_command to retrieve needed WAL

The feature is enabled using the -c/--restore-target-wal option, and it basically executes the target instance's (the $DATADIR being synced) restore_command to fetch the recent WALs needed to detect and wipe any "lone changes" on the instance. A great feature that makes pg_rewind a lot more usable - previously you typically had to be pretty fast to arrive at the party, as checkpoints recycle/throw away WALs quickly with default settings, meaning it was something more for HA automation tools like Patroni, for example. Well, one could copy stuff manually, but...
And by the way, don't forget that you need to prepare for rewinding before you get into trouble in the first place - by enabling data checksums at init time (or, nowadays, also later with pg_checksums, albeit in offline mode) or by setting the wal_log_hints parameter.
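A rough example of how such a rewind could look - the data directory path and connection string here are just placeholders:

# rewind the old, diverged primary so that it can rejoin as a standby;
# missing WAL is fetched via the target instance's configured restore_command
pg_rewind --target-pgdata=/var/lib/postgresql/13/main \
          --source-server='host=new-primary port=5432 user=postgres dbname=postgres' \
          --restore-target-wal --progress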
Basically the exact same -R/--write-recovery-conf parameter that has been available for pg_basebackup for years is now also there for pg_rewind, allowing you to switch the rewound node back into the HA cluster even more conveniently/faster. Makes perfect sense actually, as this is the main purpose of using pg_rewind anyway. From the technical side, this is achieved by writing the specified --source-server connect string into postgresql.auto.conf as the primary_conninfo parameter.
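So in practice one could tack the new flag onto the previous sketch and have the node ready to start as a standby right away (paths and connection string again made up):

pg_rewind --target-pgdata=/var/lib/postgresql/13/main \
          --source-server='host=new-primary port=5432 user=postgres dbname=postgres' \
          --restore-target-wal --write-recovery-conf

pg_ctl -D /var/lib/postgresql/13/main start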
This is one of the "parallelity" changes - reindexdb can now operate in parallel - and a pretty nice addition for those instances where you have a dedicated downtime/maintenance slot and want to speed up bloat elimination without more complex scripts or techniques, like 3rd party extensions such as pg_repack or pg_squeeze. Parallel mode is enabled with the new --jobs option. By the way, I normally recommend running this tool not in default mode anyway, but limited to the biggest/most active tables or whole schemas, using the --table and --schema flags.
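A quick sketch of what that could look like - the database and table names below are of course made up:

# rebuild the indexes of a couple of big tables using 4 parallel jobs
reindexdb --jobs=4 --verbose --table=orders --table=order_items mydb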
No more mandatory --new-bindir setting when running pg_upgrade

This is a very welcome update for me personally, as we quite often need to connect to customer systems over some horribly slow and small web console emulators, and the pg_upgrade command was pretty much the only one that did not want to fit on a single line. I know, I know - there's also the option to declare environment variables like PGDATANEW, but explicit is still better than implicit and I prefer having the important stuff directly under my nose. So now something like this will be enough to do the final step of the upgrade:
/usr/lib/postgresql/13/bin/pg_upgrade --link -d pg12 -D pg13 -b /usr/lib/postgresql/12/bin/
Allow dropdb to disconnect sessions using the target database, allowing the drop to succeed

This is enabled with the -f option. The feature has gotten some attention already as it's basically quite nice and convenient... but since it's also a footgun, I'll probably avoid it for production operations. Also, for production DBs I usually recommend not dropping the DB directly but just renaming it at first...
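For completeness, a small example - "mydb" is obviously just a placeholder name here:

# force-drop, terminating any sessions still connected to the database
dropdb --force mydb

# the more cautious route mentioned above - rename first (this also requires
# no active sessions), drop only some days or weeks later
psql -c "ALTER DATABASE mydb RENAME TO mydb_to_be_dropped"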
Allow pgbench to partition its "pgbench_accounts" table

This time I'm really glad that my favorite satellite program shipped with Postgres, the quick benchmarking tool pgbench, has received numerous updates. The most useful one to me is the fact that you can now partition the main data table just by providing a couple of flags - previously you had to do it in many steps, altering the schema manually and then specifying some --init-steps again. Now 2 out of the 3 built-in partitioning schemes are supported, and testing out the performance benefits of partitions (most importantly gaining parallel autovacuum) is as easy as:
pgbench --initialize --scale=X --partition-method=hash --partitions=4
And another benefit when you’ve again forgotten about the exact syntax of declaring partitions and you quickly want to look it up without switching to a browser:
PGOPTIONS='-c log_statement=all -c client_min_messages=LOG' pgbench -i --partition-method=hash --partitions=2 -I dt
Show built-in scripts with pgbench --show-script

Another minor but nice time-saver - previously it was pretty tedious to check what kind of SQL the built-in test modes actually generate. One had to enable the query log, go to the documentation, or just run a single transaction with the -r flag, something like pgbench -t1 -r -N. But now it's a bit more slick and explicit:
pgbench --show-script=simple-update
-- simple-update:
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
The next pgbench improvement - server-side data generation with the new "G" initialization step - exploits the very useful generate_series() function (which everyone using Postgres should know and master!) to generate the data on the fly, so basically only a single INSERT statement is used instead of the old COPY plus data streamed over the network. At first glance "server-side" sounded faster to me, but when doing a small test against a server co-located in the same subnet, I actually saw that it was about 2x slower! I assume it's down to INSERT having a larger overhead than COPY, so in short it seems to make sense to use this new flag only when you really want to reduce network traffic or the target server is really far away from you - then you would probably see some speedup.
$# pgbench -i -s1000 -q -I dtG -h testkast -p5432
dropping old tables...
creating tables...
generating data (server-side)...
done in 363.64 s (drop tables 0.02 s, create tables 0.02 s, server-side generate 363.60 s).

$# pgbench -i -s1000 -q -I dtg -h testkast -p5432
dropping old tables...
creating tables...
generating data (client-side)...
100000000 of 100000000 tuples (100%) done (elapsed 194.81 s, remaining 0.00 s)
done in 197.27 s (drop tables 0.21 s, create tables 0.01 s, client-side generate 197.05 s).
Limit the WAL retained by replication slots with max_slot_wal_keep_size

Replication slots exceeding this value are marked invalid, and the extra WAL above the threshold will still be recycled/deleted after a CHECKPOINT once it is no longer needed for data consistency reasons. A great little addition that makes running flaky replicas safer - a golden mean of sorts between the rigid wal_keep_segments and the unforgiving replication slots.
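Putting it to use could look something like the following - the 100GB cap is an arbitrary illustrative value to be adjusted to your disk size and your tolerance for rebuilding replicas:

-- cap the WAL that any replication slot can hold back (the default of -1 means unlimited)
ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
SELECT pg_reload_conf();

-- v13 also adds the wal_status and safe_wal_size columns for monitoring slot health
SELECT slot_name, active, wal_status, safe_wal_size FROM pg_replication_slots;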
To finish off - also a couple of automatic/transparent changes where users don’t need to lift a finger to start benefiting. Well, except for upgrading in the first place of course 🙂
This might sound very obscure at first to newer Postgres developers... as I bet they're pretty sure they're not using any TOAST features 🙂 And indeed, it's automatic behind-the-scenes "magic" for optimizing the storage of large column values that you mostly don't need to care about. But with this change, some very typical workloads - like substring matching/filtering on large string values, which Postgres typically decides to move into TOAST and additionally compress - are now easily 10x (and even more) faster! That's way above the typical incremental speedups that we usually get for most features when upgrading.
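A rough way to see the effect for yourself - the table below and the repeat() trick are just an assumed setup to produce a big, well-compressible text value that should end up compressed in the TOAST table:

CREATE TABLE toast_test (id int, doc text);
-- ~2 MB of highly compressible text
INSERT INTO toast_test SELECT 1, repeat('postgres 13 is out! ', 100000);
-- prefix access like this no longer needs to decompress the whole value
SELECT substr(doc, 1, 100) FROM toast_test WHERE id = 1;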
Another huge win that probably needs an example. This addition basically takes the partial ordering provided by the leading index column(s) into account, so that you can still benefit from an index even if the 2nd or 3rd ORDER BY column of your query is not included in the index - in short, an "index scan" vs "sequential scan" situation, which depending on the table size can yield wins of orders of magnitude as the table grows. My testing with a small 1-million-row table that fit fully into shared buffers showed a 120x speedup, for example!
CREATE UNLOGGED TABLE test(a int, b int);
INSERT INTO test SELECT i, i FROM generate_series(1, 1e6) i;
CREATE INDEX ON test (a);
EXPLAIN ANALYZE SELECT * FROM test WHERE a < 1000 ORDER BY a, b;
This is controlled by the new logical_decoding_work_mem parameter, with a default value of 64MB and a minimum of 64kB. It is a very nice addition for the (relatively few) users of Logical Replication who are having performance issues... which sadly happens quite often when not running a pure OLTP-style workload, as the feature is not yet really optimized for bulk data operations. Previously, for example, the reorder buffer was hard-coded to spill to disk after only 4096 row changes. By the way, the next major version is set to improve bulk operation performance further with streaming of large in-progress transactions - so I'm definitely waiting for that.
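If you do hit the disk-spilling problem, bumping the limit is as simple as the following (256MB being just a random illustrative value):

ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();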
I left this one to the end so that you'd hopefully be more likely to remember it... as it is potentially a breaking change (in addition to the list of things at the top of the release notes) if you use the csvlog log destination and employ some log analyzing technique more sophisticated than "grep" to make sense of the information.
The new column, appended to the previous list, is called backend_type and is of data type text. It seems this could have been highlighted more prominently in the release notes - if it caught the most popular Postgres log parsing tool, PgBadger, off guard, it will probably hit some other users as well. Lots of people actually load their database logs into tables to power dashboards or alerting, because it's really easy to do! Check the documentation on "file_fdw" for example if you want to see how.
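For example, assuming the CSV logs are already mapped to a foreign table via "file_fdw" as in the documentation example (which names the table "pglog" - adjust to your setup), the new column can be put to use like this:

-- which kinds of backends are the noisiest in the logs?
SELECT backend_type, count(*)
FROM pglog
GROUP BY backend_type
ORDER BY count(*) DESC;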
Thanks for reading and see you hopefully in a year for similar reasons 🙂
Thanks again, I always read your posts with great attention.
In the "Implement incremental sorting" example, I think there should be an ANALYZE test before EXPLAIN. This changes the plan.