By Kaarel Moppel
With the latest major version of PostgreSQL freshly released, it's time to assess its performance. I've been doing this for years, so my scripts are ready; most of the work falls on the machines anyway. The v12 release introduces several enhancements, including a framework for pluggable storage engines, improved partitioning, smaller indexes for multi-column and duplicate-heavy cases, and concurrent re-indexing. These updates should mostly boost performance for OLAP / Warehouse use cases, while OLTP improvements may be limited; the smaller indexes will help most with large datasets. For the detailed performance numbers, check the summary table at the end or read on.
By the way - from a code change statistics point of view, the numbers look quite similar to the v11 release, with the exception that there is an increase of about 10% in the number of commits. This is a good sign for the overall health of the project, I think 🙂
```
git diff --shortstat REL_11_5_STABLE..REL_12_0
 3154 files changed, 317813 insertions(+), 295396 deletions(-)

git log --oneline REL_11_5..REL_12_0 | wc -l
2429
```
I decided to run 4 different test cases, each with a couple of different scale / client combinations. The scales selected were 100, 1000 and 5000 – with the intention that at scale 100 all data fits into shared buffers (which should provide the most accurate results), at 1000 everything still fits into RAM (the Linux buffer cache) on the used hardware, and 5000 (4x RAM) tests whether disk access was somehow improved. A tip: to quickly get the right "scaling factor" numbers for a target DB size, one could check out this post here.
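A quick way to double-check that a chosen scaling factor really lands in the intended tier (shared buffers / RAM / disk) is to initialize and then ask PostgreSQL for the resulting size. A minimal sketch, run against the freshly initialized test database:

```sql
-- Overall size of the freshly initialized pgbench database
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;

-- Per-table breakdown (heap + indexes) of the pgbench schema
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM   pg_class
WHERE  relname LIKE 'pgbench_%' AND relkind = 'r'
ORDER  BY pg_total_relation_size(oid) DESC;
```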
The queries were all tied to or derived from the default schema generated by our old friend pgbench, and the 4 simulated test scenarios were as follows:
1) Default pgbench test mimicking an OLTP system (3x UPDATE, 1x SELECT, 1x INSERT)
```
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
```
2) Default pgbench with a slightly modified schema: I decided to add a couple of extra indexes, since the release notes promised more effective storage for indexes with many duplicates and for multi-column (compound) indexes, which surely has its cost. The extra index definitions were as follows (a quick way to compare the resulting index sizes is sketched right after them):
```sql
CREATE INDEX ON pgbench_accounts (bid);
CREATE INDEX ON pgbench_history (tid);
CREATE INDEX ON pgbench_history (bid, aid);
CREATE INDEX ON pgbench_history (mtime);
```
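To actually see what the promised v12 B-tree improvements (duplicate-heavy and multi-column indexes) mean for the on-disk footprint, one can compare index sizes between the two clusters after an identical run. A minimal sketch using the standard statistics views:

```sql
-- Index sizes for the pgbench tables; run on both the v11 and the v12
-- cluster after the same workload and compare the numbers.
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname LIKE 'pgbench_%'
ORDER  BY pg_relation_size(indexrelid) DESC;
```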
3) pgbench read-only (--select-only flag)
```
\set aid random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
```
4) Some home-brewed analytical / aggregate queries that I also used for testing the previously released PostgreSQL version, v11. The idea here is not to work with single records, but rather to scan over all tables / indexes or a subset on an index.
```sql
/* 1st let's create a copy of the accounts table to be able to test massive JOIN-s */
CREATE TABLE pgbench_accounts_copy AS SELECT * FROM pgbench_accounts;
CREATE UNIQUE INDEX ON pgbench_accounts_copy (aid);

/* Query 1 */
SELECT avg(abalance) FROM pgbench_accounts JOIN pgbench_branches USING (bid) WHERE bid % 2 = 0;

/* Query 2 */
SELECT COUNT(DISTINCT aid) FROM pgbench_accounts;

/* Query 3 */
SELECT count(*) FROM pgbench_accounts JOIN pgbench_accounts_copy USING (aid) WHERE aid % 2 = 0;

/* Query 4 */
SELECT sum(a.abalance) FROM pgbench_accounts a JOIN pgbench_accounts_copy USING (aid) WHERE a.bid % 10 = 0;
```
In order to get more reliable results this time, I used a dedicated old workstation machine that I have lying around as test hardware, instead of a cloud VM. The machine is rather low-spec, but it at least has an SSD, so for comparing relative differences it should be totally fine. I actually did try a cloud VM first (with a dedicated CPU), but the runtimes on different runs of the same PG version were so unpredictable that it was impossible to say anything conclusive.
CPU: 4x Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
RAM: 16GB
DISK: 256GB SSD
OS: Clean server install of Ubuntu 18.04 with some typical kernel parameter tuning for databases:
hdparm -W0 /dev/sda # disable write caching
vm.swappiness=1 # minimal swapping
vm.overcommit_memory=2 # no overcommit
vm.overcommit_ratio=99
For most tests, the disk parameters should not matter much either, as the queries used are quite read-heavy and the data fits into shared buffers or RAM. This is actually what you want for an algorithmic comparison.
For running Postgres, I used the official, PostgreSQL-project-maintained [repo](https://wiki.postgresql.org/wiki/Apt) to install both 11.5 and 12.0. Concerning server settings (postgresql.conf), I left everything at defaults except for a couple of changes (see below), applied on both clusters, for reasons described in the comments.
```
shared_buffers='2GB'                  # to ensure the dataset fits into buffers for the smallest scale
work_mem='1GB'                        # increasing work_mem is common and desired for analytical queries
maintenance_work_mem='2GB'            # to speed up pgbench initialization (pgbench -i)
autovacuum_vacuum_cost_delay='20ms'   # to keep both versions equal, as the v12 default dropped to 2ms
random_page_cost=2                    # 1.5 to 2 is a good value for slower SSDs
jit=on                                # v11 didn't have it enabled by default
shared_preload_libraries='pg_stat_statements'   # for storing / analyzing test results
logging_collector=on                  # helps slightly with performance when issuing a lot of log messages
```
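The same settings could of course also be applied without editing postgresql.conf by hand; a minimal ALTER SYSTEM sketch (note that shared_buffers, shared_preload_libraries and logging_collector still require a restart to take effect):

```sql
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET work_mem = '1GB';
ALTER SYSTEM SET maintenance_work_mem = '2GB';
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '20ms';
ALTER SYSTEM SET random_page_cost = 2;
ALTER SYSTEM SET jit = on;
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
ALTER SYSTEM SET logging_collector = on;
-- reload picks up the parameters that don't need a restart
SELECT pg_reload_conf();
```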
After running my test script for a couple of days (2h for each test case / scale / client count / PG version variation) on both clusters, here are the numbers for you to evaluate.
The scripts I used, doing all the hard work for me, can be found here by the way. For generating the percentage differences, you could use this query.
NB! Results were extracted directly from the tested PostgreSQL instances via the "pg_stat_statements" extension, so the communication latency included in the pgbench results is not taken into account. Also note that the measured query "mean times" are an average over various "client" parallelism counts (1 to 4, depending on the test). With the analytical queries, where "parallel query" kicks in with up to 2 extra workers per query, it does not make sense to use more than $CPU_COUNT / 3 clients, as some CPU starvation will otherwise occur (at default parallelism settings).
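For reference, the raw per-query numbers can be pulled with a query along these lines (a minimal sketch; in the v11 / v12 versions of pg_stat_statements the columns are still named mean_time / stddev_time, in milliseconds):

```sql
-- Average and standard deviation of execution times per statement,
-- as recorded on the tested instance
SELECT left(query, 60)                 AS query_start,
       calls,
       round(mean_time::numeric, 4)    AS mean_time_ms,
       round(stddev_time::numeric, 4)  AS stddev_time_ms
FROM   pg_stat_statements
ORDER  BY mean_time DESC
LIMIT  20;
```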
| Testcase | Scale (data fits into) | v11 avg. mean_time (ms) | v12 avg. mean_time (ms) | Mean time diff | v11 stddev | v12 stddev | Stddev diff |
|---|---|---|---|---|---|---|---|
| pgbench read-only | Buffers | 0.0056 | 0.0058 | 3.6 % | 0.002 | 0.002 | 0 % |
| pgbench read-only | RAM | 0.0605 | 0.0594 | -2.2 % | 0.4 | 0.4314 | 7.9 % |
| pgbench read-only | Disk | 0.3583 | 0.3572 | -0.3 % | 0.6699 | 0.6955 | 3.8 % |
| pgbench default UPDATE | Buffers | 0.0196 | 0.0208 | 6.1 % | 0.0063 | 0.0065 | 3.2 % |
| pgbench default UPDATE | RAM | 0.1088 | 0.109 | 0.2 % | 0.7573 | 0.8156 | 7.7 % |
| pgbench default INSERT | Buffers | 0.034 | 0.0408 | 20 % | 0.008 | 0.0222 | 177.5 % |
| pgbench default INSERT | RAM | 0.0354 | 0.0376 | 6.2 % | 0.0109 | 0.0105 | -3.7 % |
| pgbench extra indexes UPDATE | Buffers | 0.0203 | 0.021 | 3.4 % | 0.0067 | 0.0069 | 3 % |
| pgbench extra indexes UPDATE | RAM | 0.1293 | 0.1283 | -0.8 % | 1.4285 | 1.1586 | -18.9 % |
| pgbench extra indexes INSERT | Buffers | 0.0447 | 0.0502 | 12.3 % | 0.0111 | 0.0249 | 124.3 % |
| pgbench extra indexes INSERT | RAM | 0.0459 | 0.0469 | 2.2 % | 0.0136 | 0.0152 | 11.8 % |
| Analytical Query 1 | Buffers | 551.81 | 556.07 | 0.8 % | 39.76 | 33.22 | -16.5 % |
| Analytical Query 1 | RAM | 2423.13 | 2374.30 | -2 % | 54.95 | 51.56 | -2 % |
| Analytical Query 2 | Buffers | 996.08 | 984.09 | -1.2 % | 7.74 | 4.62 | -40.2 % |
| Analytical Query 2 | RAM | 7521.1 | 7589.25 | 0.9 % | 62.29 | 73.67 | 18.3 % |
| Analytical Query 3 | Buffers | 2452.23 | 2655.37 | 8.3 % | 87.03 | 82.16 | -5.6 % |
| Analytical Query 3 | RAM | 16568.9 | 16995.2 | 2.6 % | 140.10 | 138.78 | -0.9 % |
| Analytical Query 4 | Buffers | 876.72 | 906.13 | 3.4 % | 41.41 | 37.49 | -9.5 % |
| Analytical Query 4 | RAM | 4794.99 | 4763.72 | -0.7 % | 823.60 | 844.12 | 2.5 % |
| Total avg. | | | | 3.3 % | | | 13.8 % |
* pgbench default UPDATE means the UPDATE on the pgbench_accounts table here; the differences for pgbench_branches / pgbench_tellers were negligible, and including them would have made the table too big.
As with all performance tests, a healthy dose of skepticism is essential, since accurate measurement is challenging. In our case, some SELECT queries averaged under 0.1 milliseconds, and the relatively large standard deviations show how much randomness creeps into such tight timings. Additionally, the pgbench test is somewhat artificial: it lacks real-world bloat and uses very narrow rows. I'd appreciate feedback if you've seen significantly different results, or on whether my test setup seems reasonable. Interestingly, for the first time in years, average performance decreased slightly: by 3.3% for mean times and 13.8% for standard deviations. This was somewhat to be expected, since the storage architecture was revamped to accommodate future storage engines.
Also as expected, the new more space-saving index handling took the biggest toll on v12 pgbench INSERT performance (+10%)...but in the long run, of course, it will pay for itself nicely with better cache ratios / faster queries, especially if there are a lot of duplicate column values.
Another thing we can learn from the resulting query "mean times" (and this is probably even more important to remember than the 3% difference!) is that there are huge differences, roughly an order of magnitude, depending on whether the data fits into shared buffers, fits into RAM, or has to be read from disk. So I'll end with an obvious conclusion: for the best performance, investing in ample memory makes much more sense than a version upgrade!
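By the way, a quick (though imperfect) way to gauge which of those tiers your own workload mostly lives in is the shared buffer hit ratio from pg_stat_database; a small sketch:

```sql
-- Share of block requests served from shared buffers; blocks "read"
-- may still come from the OS page cache rather than from disk.
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_ratio_pct
FROM   pg_stat_database
WHERE  datname = current_database();
```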
I suggest running the benchmark with "-M prepared" to get a less noisy figure. The overhead of parsing and planning is quite noticeable with pgbench these days.
Thanks for the tip! I've used the -M flag when stress testing disks / IO. Hmm, it would be interesting to know whether real-world applications on average use more prepared statements or the plain query protocol. In the Java world it's probably more prepared statements.
How do you know if the RAM is fully utilised and whether you need to "upgrade" your RAM for improved performance?
@disqus_gd6d0lw7cB:disqus A good question... not a straightforward thing to determine from the Postgres side. Maybe only when running with very high shared_buffers (almost all RAM) can the pg_stat_database blks_hit / blks_read ratio tell you that. The 3rd-party pg_stat_kcache extension can also help... but otherwise it comes down to using some monitoring tools on the OS side. PS Our monitoring tool pgwatch2 also has support for that via Python psutil - see a sample screenshot here - https://raw.githubusercontent.com/cybertec-postgresql/pgwatch2/master/screenshots/system_stats_psutil.png