I wrote a "pg_dump compression specifications in PostgreSQL 16" post a while ago. Frankly speaking, I did not expect new compression methods to be implemented in PostgreSQL for another 2-3 years. Apparently demand was high enough that LZ4 and ZSTD made their way into PostgreSQL 16!
The LZ4 patch was written by Georgios Kokolatos, committed by Tomas Vondra, and reviewed by Michael Paquier, Rachel Heaton, Justin Pryzby, Shi Yu, and Tomas Vondra. The commit message is:
    Expand pg_dump's compression streaming and file APIs to support the lz4
    algorithm. The newly added compress_lz4.{c,h} files cover all the
    functionality of the aforementioned APIs. Minor changes were necessary
    in various pg_backup_* files, where code for the 'lz4' file suffix has
    been added, as well as pg_dump's compression option parsing.

    Author: Georgios Kokolatos
    Reviewed-by: Michael Paquier, Rachel Heaton, Justin Pryzby, Shi Yu, Tomas Vondra
    Discussion: https://postgr.es/m/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss%3D%40protonmail.com
The ZSTD patch was written by Justin Pryzby, committed by Tomas Vondra, and reviewed by Tomas Vondra, Jacob Champion, and Andreas Karlsson. The commit message is:
    Allow pg_dump to use the zstd compression, in addition to gzip/lz4. Bulk
    of the new compression method is implemented in compress_zstd.{c,h},
    covering the pg_dump compression APIs. The rest of the patch adds test
    and makes various places aware of the new compression method.

    The zstd library (which this patch relies on) supports multithreaded
    compression since version 1.5. We however disallow that feature for now,
    as it might interfere with parallel backups on platforms that rely on
    threads (e.g. Windows). This can be improved / relaxed in the future.

    This also fixes a minor issue in InitDiscoverCompressFileHandle(), which
    was not updated to check if the file already has the .lz4 extension.

    Adding zstd compression was originally proposed in 2020 (see the second
    thread), but then was reworked to use the new compression API introduced
    in e9960732a9. I've considered both threads when compiling the list of
    reviewers.

    Author: Justin Pryzby
    Reviewed-by: Tomas Vondra, Jacob Champion, Andreas Karlsson
    Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com
    Discussion: https://postgr.es/m/20201221194924.GI30237@telsasoft.com
    ~$ pg_dump --version
    pg_dump (PostgreSQL) 16devel

    ~$ pgbench --initialize --scale=100
    dropping old tables...
    NOTICE:  table "pgbench_accounts" does not exist, skipping
    NOTICE:  table "pgbench_branches" does not exist, skipping
    NOTICE:  table "pgbench_history" does not exist, skipping
    NOTICE:  table "pgbench_tellers" does not exist, skipping
    creating tables...
    generating data (client-side)...
    10000000 of 10000000 tuples (100%) done (elapsed 39.52 s, remaining 0.00 s)
    vacuuming...
    creating primary keys...
    done in 49.65 s (drop tables 0.00 s, create tables 0.08 s, client-side generate 39.96 s, vacuum 0.29 s, primary keys 9.32 s).

    ~$ psql --command="select pg_size_pretty(pg_database_size('postgres'))"
     pg_size_pretty
    ----------------
     1503 MB
    (1 row)

    ~$ time pg_dump --format=custom --compress=lz4:9 > dump.lz4

    real    0m10.507s
    user    0m9.901s
    sys     0m0.436s

    ~$ time pg_dump --format=custom --compress=zstd:9 > dump.zstd

    real    0m8.794s
    user    0m8.393s
    sys     0m0.364s

    ~$ time pg_dump --format=custom --compress=gzip:9 > dump.gz

    real    0m14.245s
    user    0m13.064s
    sys     0m0.978s

    ~$ time pg_dump --format=custom --compress=lz4 > dump_default.lz4

    real    0m6.809s
    user    0m1.666s
    sys     0m1.125s

    ~$ time pg_dump --format=custom --compress=zstd > dump_default.zstd

    real    0m7.534s
    user    0m2.428s
    sys     0m0.892s

    ~$ time pg_dump --format=custom --compress=gzip > dump_default.gz

    real    0m11.564s
    user    0m10.661s
    sys     0m0.525s

    ~$ time pg_dump --format=custom --compress=lz4:3 > dump_3.lz4

    real    0m8.497s
    user    0m7.856s
    sys     0m0.507s

    ~$ time pg_dump --format=custom --compress=zstd:3 > dump_3.zstd

    real    0m5.129s
    user    0m2.228s
    sys     0m0.726s

    ~$ time pg_dump --format=custom --compress=gzip:3 > dump_3.gz

    real    0m4.468s
    user    0m3.654s
    sys     0m0.504s

    ~$ ls -l --block-size=M
    total 250M
    -rw-rw-r-- 1 postgres postgres 28M Apr 18 13:58 dump_3.gz
    -rw-rw-r-- 1 postgres postgres 48M Apr 18 13:57 dump_3.lz4
    -rw-rw-r-- 1 postgres postgres  8M Apr 18 13:58 dump_3.zstd
    -rw-rw-r-- 1 postgres postgres 27M Apr 18 13:57 dump_default.gz
    -rw-rw-r-- 1 postgres postgres 50M Apr 18 13:56 dump_default.lz4
    -rw-rw-r-- 1 postgres postgres  8M Apr 18 13:57 dump_default.zstd
    -rw-rw-r-- 1 postgres postgres 27M Apr 18 13:56 dump.gz
    -rw-rw-r-- 1 postgres postgres 48M Apr 18 13:55 dump.lz4
    -rw-rw-r-- 1 postgres postgres  8M Apr 18 13:56 dump.zstd
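The session above can be condensed into a small script. This is only a sketch: the `suffix_for` and `run_benchmark` helpers, the `DB` and `RUN_BENCHMARK` variables, and the output file names are my own assumptions for illustration, and it writes dumps with `--file` instead of shell redirection. Nothing runs unless `RUN_BENCHMARK=1` is set.

```shell
# Sketch of a repeatable benchmark loop (helper names are hypothetical).
DB=${DB:-postgres}

# Map a compression method to the file suffix used in the session above.
suffix_for() {
    case $1 in
        gzip) echo gz ;;
        lz4)  echo lz4 ;;
        zstd) echo zstd ;;
        *)    echo "unknown method: $1" >&2; return 1 ;;
    esac
}

# Dump the database once per method/level pair and report the timings.
run_benchmark() {
    for method in gzip lz4 zstd; do
        for level in 3 9; do
            out="dump_${level}.$(suffix_for "$method")"
            echo "== $method:$level -> $out"
            # 'time' is a keyword in bash; plain sh needs the time utility.
            time pg_dump --format=custom --compress="$method:$level" \
                --file="$out" "$DB"
        done
    done
    # GNU ls option, as used in the session above.
    ls -l --block-size=M dump_*
}

# Opt-in guard so that sourcing this file never touches a database.
if [ "${RUN_BENCHMARK:-0}" = 1 ]; then
    run_benchmark
fi
```

Running it as `RUN_BENCHMARK=1 DB=postgres bash benchmark.sh` (the script name is arbitrary) should produce one timing block per method and level, followed by the file listing.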
Based on the output of the commands, we can conclude the following about the three compression methods:
- The big surprise to me is that, at level 9, zstd takes the least amount of time for compression (8.8 s), followed by lz4 (10.5 s) and gzip (14.2 s). At the default level, lz4 is slightly faster than zstd (6.8 s versus 7.5 s), and at level 3 gzip is the fastest (4.5 s). This data is probably not the best for producing measurements and comparisons. However, that's a topic for another blog post.
- At the default compression level, zstd produces the smallest dump file (8 MB), followed by gzip (27 MB) and lz4 (50 MB).
- At level 9, the highest level tested here, zstd still produces the smallest dump file (8 MB), followed by gzip (27 MB) and lz4 (48 MB).
Based on these observations, if your priority is to reduce disk space usage, zstd is the recommended compression method. However, if your priority is to minimize compression time, zstd and lz4 both perform well. If compatibility with other utilities is a concern, gzip remains a viable option.
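To put the disk space argument into numbers, here is a rough sketch that divides the database size by each level-9 dump size. The `ratio` helper is hypothetical, the sizes are copied from the session above, and `ls --block-size=M` rounds up to whole megabytes. Keep in mind that `pg_database_size` also counts index data, which a dump does not contain, so this measures on-disk size versus dump size and not a pure compression ratio.

```shell
# Sizes in MB, taken from the psql and ls output of the test run.
DB_SIZE_MB=1503

# ratio ORIGINAL COMPRESSED: print ORIGINAL/COMPRESSED with one decimal.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Each entry is "method dump-size-in-MB" for the level-9 dumps.
for entry in "zstd 8" "gzip 27" "lz4 48"; do
    # Split the entry into $1 (method) and $2 (size); splitting is intended.
    set -- $entry
    printf '%-5s %3s MB  about %s:1\n' "$1" "$2" "$(ratio "$DB_SIZE_MB" "$2")"
done
```

With these inputs, the zstd dump comes out at roughly 188 times smaller than the database on disk, gzip at roughly 56 times, and lz4 at roughly 31 times.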
pg_dump's -Z/--compress option in PostgreSQL 16 will support more than just an integer. It can be used to specify the method and level of compression used. The default is still gzip with a level of 6. But the new kids on the block, lz4 and zstd, are already here!
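As a quick illustration of the method and level syntax, the sketch below builds a few `--compress` values and prints the resulting commands instead of running them. The `compress_spec` helper, the `mydb` database name, and the `backup.dump` file name are placeholders of mine, not part of pg_dump.

```shell
# Hypothetical helper (not part of pg_dump): build a --compress value
# from a method and an optional integer level.
compress_spec() {
    method=$1
    level=${2:-}
    case $method in
        gzip|lz4|zstd|none) ;;
        *) echo "unsupported method: $method" >&2; return 1 ;;
    esac
    if [ -n "$level" ]; then
        printf '%s:%s\n' "$method" "$level"
    else
        # No level given: pg_dump falls back to the method's default.
        printf '%s\n' "$method"
    fi
}

# Print, rather than execute, one command per specification.
for args in "gzip 6" "lz4" "zstd 9"; do
    # Word splitting of $args into method and level is intended here.
    spec=$(compress_spec $args)
    echo "pg_dump --format=custom --compress=$spec --file=backup.dump mydb"
done
```

A bare integer such as `--compress=6` is still accepted and keeps its old meaning, so existing scripts should not need changes.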
That said, pg_dump is sometimes used to update and/or upgrade the database. In case you want to understand the difference between an update and an upgrade, check out this blog post by Hans-Jürgen Schönig. Or check out our other related publications about updating and upgrading.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.
Hello, because zstd allows better performance when doing a restore, perhaps a test that writes the dump to a file with -f and then restores it with -j [parallel restore] would show the real capability and speed of restoring a database using zstd.