Best of PostgreSQL 10 for the Developer

10.2017 / Category: News / Tags: performance

2 weeks ago a new PostgreSQL major version with the number 10 was released (note the new numbering scheme!). I already covered my favorite parts from the official release notes from a DBA’s point of view already here, so also take a look there, if you are interested in the operational side as well. In this blogpost, I would like to look at the list of changes from a developers angle (disclaimer - not doing too much development these days though, just keeping track of things). Of course, some features are shared with the DBA part, but there is still a lot of new stuff – enjoy!

Table of Contents

My favourite PostgreSQL 10 features from a Developer point of view

Logical replication

The long-awaited feature enables easy setup and minimal performance penalties for application scenarios, when you only want to replicate a single table or a subset of tables or all tables, enabling totally new application scenarios and enabling also zero downtime upgrades for following major versions! The top feature of version 10 for me.

Declarative partitioning

An old way of managing partitions via inheritance and creating triggers to re-route inserts to correct tables ,was bothersome to say the least, not to mention the performance impact. Currently supported are „range“ and „list“ partitioning schemes. If someone is missing „hash“ partitioning available in some DB engines, one could use „list“ partitioning with expressions to achieve the same.

Multiple hostname/IP support in client connection strings

Libpq will connect to the first responsive server in the list. This helps greatly in providing some basic high-availability to applications without any extra software! FYI – one needs to update the Postgres client libraries and possibly also your database driver of choice to make use of the feature.

Possibility to do simple read balancing between multiple hosts

Now there’s a new Libpq connect parameter called target_session_attrs, with possible values “read-write” and “any” so that by listing replicas first, in your connect string (see previous list item) and choosing “any”, one can do simple read balancing. For writes one needs another connection (according pools recommended) with “read-write”. That is not really transparent (transparent way possible with pgpool2 for example) but might be just good enough.

Commit tracking

For highly critical systems it is now possible to check, if the last transaction was committed, given that the txid was stored locally. For example, It can happen that the transaction was committed nicely and just the acknowledgement didn’t make it over the network to the client.

Quorum commit for Synchronous Replication

For highly critical systems it’s now possible to say that in addition to the connected server, the commit has to be propagated to any number of other servers. Previously the only option was a priority list, making managing server failures more difficult.

Cross-column optimizer statistics

Such stats need to be created manually on a set of columns of a table, to point out that the values are actually somehow dependent on each other. This will enable to counter slow query problems, where the planner thinks there will be very little data returned (multiplication of probabilities yields very small numbers usually) and will choose a „nested loop“ join for example, that does usually not perform too well on bigger amounts of data.

Usable Hash indexes

Hash indexes are now WAL-logged, thus crash-safe and ready for wider use. They also received some performance improvements for simple searches so that they should actually be faster than standard B-tree indexes for bigger amounts of data. Bigger index size too and only suitable for equality searches.

Transition tables to inspect all changed rows of a statement from a trigger

Quite an awesome feature - now it is possible to look at both, old and new data of the affected rows for a statement level trigger. This can be used to calculate some aggregates for example or to reject some updates if suspiciously too many rows were changed.

Allow the specification of a function name without arguments in DDL commands, if it is unique

Now one can drop a function based on name only. This is one of those small things that add up if you work a lot with stored procedures, as 99% of time the names are indeed unique. Less typing is good 🙂

Allow ENUM values to be renamed

This could be done previously only by hacking the system catalogs...which could bite you if not being careful.

Add full text search support for JSON and JSONB

In addition to some smaller JSONB stuff this addition once again assures us that JSONB is first class citizen in Postgres and mixed applications (NoSQL + SQL) have even more options.

file_fdw extension can now execute a program to read data

Postgres has long had the feature to treat simple files as virtual tables, but this feature could be a huge win for “warm data” / “cold data" scenarios where archive data grows too big but is occasionally still needed. When compressing Postgres data one can easily win 10-30x on disk size by the way. A sample from the Postgres Wiki:

CREATE FOREIGN TABLE
test(a int, b text)
SERVER csv
OPTIONS (program 'gunzip -c /tmp/data.czv.gz');

1

2

3

4

CREATE FOREIGN TABLE

test(a int, b text)

SERVER csv

OPTIONS (program 'gunzip -c /tmp/data.czv.gz');

That was that – step by next year for Postgres 11 !