By Kaarel Moppel
Version 12 of PostgreSQL is not exactly fresh out of the oven, as the first minor release has already been announced. However, I think it’s fair to say that this version can still be considered fresh for most users, and surely only a small percentage of users have upgraded. So I think it makes sense to go over some new features. This will be my last article beating on the v12 drum though, I promise 🙂
As usual, there have already been quite a few articles on the planet.postgresql.org feed on that topic, so I’ll try to cover things from another angle and not concentrate only on the main features of PostgreSQL version 12. In general, this release was more of an “infrastructure” release, and maybe not as exciting as v11 when it comes to cool new features (with some exceptions); however, do take a look, as there are hundreds of small changes in every release, so I’m sure you’ll discover something new for yourself. The full release notes can be found here - https://www.postgresql.org/docs/12/release-12.html
Although on average this release might not be the fastest one (due to some infrastructure changes that also bring other benefits), there are some areas where you’ll see huge performance boosts out of the box, without having to do anything:
This should provide a performance boost for 99% of use cases – and for those rare exceptions where you don’t want to benefit from filter pushdowns and index scans (for example, when extreme bloat or planner row count misjudgements are a problem), you can override it with the “MATERIALIZED” keyword, e.g.:
WITH w AS MATERIALIZED (
    SELECT * FROM pgbench_accounts
)
SELECT * FROM w WHERE aid = 1;
Serializable isolation mode is quite rare in the wild due to the resulting performance penalties, but technically it is the simplest way to fix crappy applications that don’t properly take concurrent activity into account - and since it now also works with parallel queries, it has been much improved for larger amounts of data! Note that for low-CPU machines, I usually recommend increasing the default parallelization thresholds (min_parallel_index_scan_size / min_parallel_table_scan_size) a bit, since both creating the OS processes and syncing them incur costs, so for smallish datasets it could actually end up being slower.
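For reference, raising those thresholds could look like the following - the values are just an illustration, not a recommendation (the defaults are 8MB for tables and 512kB for indexes):

-- consider parallel plans only for bigger relations (example values only)
ALTER SYSTEM SET min_parallel_table_scan_size = '64MB';
ALTER SYSTEM SET min_parallel_index_scan_size = '4MB';
SELECT pg_reload_conf();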
Introduced in v11, this was the optional “killer feature” for those implementing Data Warehouses on Postgres, with up to 50% boosts. Since there were no big problems found with its quite complex functionality, it’s now enabled by default and also kicks in a bit more often. A disclaimer seen at this year’s pgConf.EU though – in some rare cases, for very complex queries, the JIT code generation + optimized execution can take more time than the plain non-optimal execution! Luckily, it’s a “user”-level feature, and can be disabled per session / transaction. Alternatively, you could also tune the “jitting” threshold family of parameters (jit_*_cost), so that small / medium datasets won’t use them.
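If you do hit such a case, a quick per-session workaround could look like this (the threshold value below is an arbitrary example):

-- disable JIT for the current session only
SET jit = off;
-- or raise the bar so that only really expensive plans get JIT-compiled
-- (the default jit_above_cost is 100000)
SET jit_above_cost = 1000000;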
Like the section title says, this feature is an implementation of the SQL standard and allows easy selection and filtering of specific elements of JSON objects, for example, in order to create joins in a more standard fashion. Huge amounts of new structures / functions are now available, by the way, so only the most basic sample is shown below. For more info, check the docs - https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH
krl@postgres=# select jsonb_path_query(
    '{"results": [{"id": 10, "name": "Frank"}, {"id": 11, "name": "Annie"}]}'::jsonb,
    '$.results[*].id'
);
 jsonb_path_query
──────────────────
 10
 11
(2 rows)
With this feature, as of v12, the partitioning functionality of PostgreSQL could be called complete. There’s a small gotcha with this FK functionality though, limiting its usefulness in some cases: the referenced columns must include the partitioning key! Sadly, this does not play too nicely with typical date-based partitioning scenarios, so you’ll need to resort to the usual hacks, like custom validation triggers or regularly running some FK checking scripts.
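A minimal sketch of the restriction, with made-up table names - the referenced primary key has to include the partition key, so the referencing table must carry that column as well:

-- hypothetical example, just to illustrate the restriction
CREATE TABLE orders (
    order_id   bigint,
    created_on date NOT NULL,
    PRIMARY KEY (order_id, created_on)   -- the PK must include the partition key
) PARTITION BY RANGE (created_on);

CREATE TABLE orders_2020 PARTITION OF orders
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

CREATE TABLE order_items (
    item_id    bigserial PRIMARY KEY,
    order_id   bigint NOT NULL,
    created_on date NOT NULL,             -- partition key has to be duplicated here
    FOREIGN KEY (order_id, created_on) REFERENCES orders (order_id, created_on)
);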
The new pg_partition_root(), pg_partition_ancestors() and pg_partition_tree() functions do what they sound like (well, pg_partition_tree() doesn’t visualize a real tree, it just returns a table) and are a real boon, especially for those doing multi-level partitioning.
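For example, re-using the hypothetical “orders” table from above:

-- list the whole partition hierarchy of a partitioned table
SELECT relid, parentrelid, isleaf, level FROM pg_partition_tree('orders');

-- find the topmost ancestor of a given partition
SELECT pg_partition_root('orders_2020');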
Not something for everyone, but it’s a useful new way to signal from the client side that you’re on a bad network, or that there’s some overly aggressive firewall somewhere in between, and that you want session resources to be de-allocated as soon as possible if something goes sour. There are also some server-side “keepalive” parameters dealing with the same topic, but they’re not exactly the same thing - those apply to idle connections, and are measured in seconds rather than milliseconds.
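Assuming this refers to the new tcp_user_timeout setting (available in v12 both as a libpq connection parameter and as a server parameter), a client-side example could look something like this (host / database names are made up):

# give up on a connection after 5 seconds of unacknowledged TCP data (value in milliseconds)
psql "host=db.example.com dbname=appdb tcp_user_timeout=5000"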
This is perfect for beginners who might need a bit more background information on some commands, as the default “help” is quite terse (on purpose, of course!). Especially cheering – the feature was added based on my suggestion / complaint 🙂 Thanks guys, you’re awesome!
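If memory serves, this is about psql’s \h (help) output now also showing a link to the relevant documentation page - roughly like this (output abbreviated):

krl@postgres=# \h CREATE INDEX
Command:     CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON ...

URL: https://www.postgresql.org/docs/12/sql-createindex.html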
As per the documentation - disabling index cleanup can speed up VACUUM significantly (the more indexes you have, the more noticeable it is) – so it’s perfect for cases where you need to quickly roll out some schema changes and are willing to accept the penalty of more bloated indexes if there’s a lot of simultaneous UPDATE activity.
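In SQL it looks like the following; there’s also a matching table-level storage parameter if you want the behaviour permanently for some table:

-- skip index cleanup for this one manual run
VACUUM (INDEX_CLEANUP FALSE, VERBOSE) pgbench_accounts;

-- or make it the default for a specific table
ALTER TABLE pgbench_accounts SET (vacuum_index_cleanup = false);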
This is a nice addition for those urgent troubleshooting cases where it’s easy to miss the obvious, due to some mental pressure. The Git log entry says it all: “Query planning is affected by a number of configuration options, and it may be crucial to know which of those options were set to non-default values. With this patch, you can say EXPLAIN (SETTINGS ON) to include that information in the query plan. Only those options which affect planning and contain values different from the built-in default are printed.”
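A quick illustration:

-- change a planner-relevant setting away from its default, just for demonstration
SET enable_seqscan = off;

EXPLAIN (SETTINGS ON) SELECT * FROM pgbench_accounts WHERE aid = 1;
-- the plan output now ends with a line like: Settings: enable_seqscan = 'off'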
The new “log_transaction_sample_rate” does the above. It’s great for those troubleshooting cases where you want to get real, executable, SQL samples from logs (pg_stat_statements already works with parameterized statements) but don’t want to negatively affect the performance or fill the logs. By the way, if you’re stuck on some lower Postgres version and can’t upgrade in the near future – try the “auto_explain” extension’s “sample_rate” for effectively the same functionality (with some more verbose EXPLAIN output though) as weirdly enough, it has already been there for years.
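Enabling it could look like this - here, roughly 1% of all transactions would get all of their statements logged:

ALTER SYSTEM SET log_transaction_sample_rate = 0.01;
SELECT pg_reload_conf();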
Finally! The previous default of “current” for the recovery_target_timeline setting was a major stumbling block for beginners making use of PostgreSQL’s “cascading replication” feature, and caused quite a few support inquiries for me over the years. It now defaults to “latest”, as it should.
REINDEX CONCURRENTLY was one of the “headline” features of the v12 release, a well-deserved honor: similar behaviour previously required quite a lot of scripting and was thus very tedious and also somewhat dangerous (non-critically).
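Usage is as simple as:

-- rebuild a bloated index without blocking normal reads / writes for the whole duration
REINDEX INDEX CONCURRENTLY pgbench_accounts_pkey;

-- or all indexes of a table in one go
REINDEX TABLE CONCURRENTLY pgbench_accounts;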
PostgreSQL-side checksums serve as an “early warning” system for detecting silent data corruption (only for the blocks being touched, though) and should actually almost always be enabled. Previously, once missed at initialization time, this could not be fixed later on - well, only via dump / reload or upgrading via logical replication, which might not always be desirable or doable. So this is a very welcome addition.
How big of a downtime are we talking about here? Not too much – for example, for my 64 GB test instance it took 5 minutes. But I think there are some plans to improve it further to support “live” transitions. Making the process parallel would of course also help; currently it seems to be single-threaded, as it saturated neither my CPU nor the I/O subsystem of my SSD. For the “live” scenario it wouldn’t matter too much, though. It would also be nice if the utility could handle some older Postgres versions - when I tried it, it bailed with “error: pg_control CRC value is incorrect” for v11, for example. Well, we can’t have it all, I guess.
/usr/lib/postgresql/12/bin/pg_checksums --enable --progress -v -D /var/lib/postgresql/12/main/
....
pg_checksums: checksums enabled in file '12/main//base/13397/2836'
64063/64063 MB (100%) computed
Checksum operation completed
Files scanned:  1035
Blocks scanned: 8200102
pg_checksums: syncing data directory
pg_checksums: updating control file
Data checksum version: 1
Checksums enabled in cluster
Previously, you had to rely solely on the correct usage of the “default_tablespace” parameter, so this feature will make the DBA’s life a lot easier once you’re at a scale where you really need some tablespace management. For “transitioning” Oracle DBAs – you don’t need or want any extra tablespaces “just for fun” in PostgreSQL; it will only make your life more complex than needed.
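Assuming this refers to the new ability to declare a tablespace on the partitioned (parent) table itself, so that partitions created later inherit it, a sketch could look like this (tablespace, path and table names are made up):

-- hypothetical names, just to illustrate
CREATE TABLESPACE history_space LOCATION '/mnt/slow_disks/pg';

CREATE TABLE measurements (
    ts   timestamptz NOT NULL,
    data jsonb
) PARTITION BY RANGE (ts) TABLESPACE history_space;

-- new partitions now land in history_space by default
CREATE TABLE measurements_2020 PARTITION OF measurements
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');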
This is one of my favourites, me being a huge user of pgbench in general. So what does it allow me to do? For example, it allows me to quickly simulate some locking scenarios by using only a subset of pgbench-generated accounts, or to force some kind of ordering on account updates. The pgbench scripting command for this is “\gset”, and the code would look something like the below:
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
select aid from pgbench_accounts where abalance = 0 order by aid limit 1 \gset
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
For example, SET work_mem = '1.5GB' is now allowed. Small code change, but a lot of convenience.
This could be a real lifesaver for huge databases where autovacuum was left unchecked for a long period of time and the wraparound horizon is nearing (WARNING messages will pop up in the logs), and some quick manual reaction is needed to avoid effective downtime.
To finish up, I’ll repeat what it says in the release notes and also mention the huge breaking change in replication management, in case you’ve managed to forget – “recovery.conf” files are dead, gone, history! The related settings now reside in the main server config (postgresql.conf / postgresql.auto.conf). This means updating your replication scripts if you’re not using any 3rd party utilities to take care of that, but on the upside - fewer files, and it’s now possible to inspect those settings from within Postgres sessions.
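As a quick reminder of what the new standby setup looks like (paths and the connection string below are just examples):

# an empty standby.signal file in the data directory replaces "standby_mode = on"
touch /var/lib/postgresql/12/main/standby.signal

# and the former recovery.conf settings move into postgresql.conf / postgresql.auto.conf, e.g.:
# primary_conninfo = 'host=10.0.0.1 user=replicator'
# recovery_target_timeline = 'latest'   # now the default anyway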
Hope you found something new and useful here!
"It would also be nice if the utility could handle some older Postgres
versions – when I tried it, it bailed with “error: pg_control CRC value
is incorrect” for v11 for example"
The pg_checksums version at https://github.com/credativ/pg_checksums/ is ported to all supported versions of PostgreSQL and is also available as a Debian/Ubuntu package on apt.postgresql.org.
For statement sampling, there is this extension for previous Postgres versions: https://github.com/anayrat/pg_sampletolog