What are the performance differences between normal and generic audit triggers? Recently I wrote in a more general way about some common auditing / change tracking approaches for PostgreSQL - but it also made me curious: how do they compare on the performance side?
To quickly recap the previous blog post: the most common approaches to tracking important changes mostly boil down to writing some triggers. There are two main variations: table-specific triggers / audit tables, and a more generic approach with only one trigger function and one (or also many) generic audit tables, usually relying on PostgreSQL's NoSQL features, which allow quite a high degree of genericness in a convenient and usable way.
Obviously one could "guesstimate" that the generic approach would perform worse than the tailored approach, as this is commonly the case in software. But the question is - how much worse? Can it be considered negligible? Well, there's only one way to find out, I guess... and after finally finding some time to set up a small test case, I can say that I was in for a bit of a surprise! But do read on for details, or see the last paragraph for the executive summary.
Starting from the default pgbench schema, I generated SQL for the background auditing tables and attached triggers for the three pgbench tables that receive updates (detailed SQL here). The schema is simple, so I'm only including an excerpt below; the full script is available. The pgbench_history table is left to be populated by pgbench as usual - it has no indexes, shouldn't impact performance, and stays the same for both test cases anyway.
CREATE TABLE pgbench_accounts_log (
  mtime timestamptz not null default now(),
  action char not null check (action in ('I', 'U', 'D')),
  username text not null,
  aid int,
  bid int,
  abalance int,
  filler character(84)
  /* NB! 84 chars of data stored every time, even when it hasn't changed,
     could be too wasteful / problematic for volume-heavy systems,
     so I'd recommend adding some IS DISTINCT FROM filters */
);

CREATE INDEX ON pgbench_accounts_log USING brin (mtime);
/* BRIN is perfect for log timestamps that are also not selected too often */

CREATE OR REPLACE FUNCTION public.pgbench_accounts_log()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO pgbench_accounts_log VALUES (now(), 'D', session_user, OLD.*);
  ELSE
    INSERT INTO pgbench_accounts_log VALUES (now(), TG_OP::char, session_user, NEW.*);
  END IF;
  RETURN NULL;
END;
$function$;

CREATE TRIGGER log_pgbench_accounts
AFTER INSERT OR UPDATE OR DELETE ON pgbench_accounts
FOR EACH ROW EXECUTE FUNCTION pgbench_accounts_log();

/* same for the other 2 tables getting updates */
As with the explicit case, we start from the default pgbench schema, but this time extend it with only a single trigger function that will be attached to all 3 tables getting updates - and only a single logging table, relying on the "generic" JSONB data type to handle all kinds of various inputs with just one column. Full SQL code below (or here):
CREATE TABLE pgbench_generic_log (
  mtime timestamptz not null default now(),
  action char not null check (action in ('I', 'U', 'D')),
  username text not null,
  table_name text not null,
  row_data jsonb not null
);

CREATE INDEX pgbench_generic_log_table_mtime ON pgbench_generic_log USING brin (mtime);
/* Note that with the generic approach we usually need to add more indexes,
   but we also don't want to index all keys */

CREATE OR REPLACE FUNCTION public.pgbench_generic_log()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO pgbench_generic_log VALUES (now(), 'D', session_user, TG_TABLE_NAME, to_jsonb(OLD));
  ELSE
    INSERT INTO pgbench_generic_log VALUES (now(), TG_OP::char, session_user, TG_TABLE_NAME, to_jsonb(NEW));
  END IF;
  RETURN NULL;
END;
$function$;

CREATE TRIGGER log_pgbench_generic
AFTER INSERT OR UPDATE OR DELETE ON pgbench_accounts
FOR EACH ROW EXECUTE FUNCTION pgbench_generic_log();

CREATE TRIGGER log_pgbench_generic
AFTER INSERT OR UPDATE OR DELETE ON pgbench_branches
FOR EACH ROW EXECUTE FUNCTION pgbench_generic_log();

CREATE TRIGGER log_pgbench_generic
AFTER INSERT OR UPDATE OR DELETE ON pgbench_tellers
FOR EACH ROW EXECUTE FUNCTION pgbench_generic_log();
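As a quick sanity check (not part of the measured test run), any change on one of the three tables should now leave a JSONB trail behind - something along these lines:

UPDATE pgbench_accounts SET abalance = abalance + 10 WHERE aid = 1;

/* the most recent audit entry should reflect the update above */
SELECT action, table_name, row_data ->> 'aid' AS aid, row_data ->> 'abalance' AS abalance
FROM pgbench_generic_log
ORDER BY mtime DESC
LIMIT 1;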
Some hardware / software information for completeness:
Test host: 4 CPU i5-6600, 16GB RAM, SATA SSD
PostgreSQL: v12.1, all defaults except shared_buffers set to 25% of RAM, i.e. 4GB, checkpoint_completion_target = 0.9, backend_flush_after = '2MB' (see here for explanation).
Pgbench: scale 1000, i.e. ~13GB DB size, to largely factor out disk access (the working set should fit more or less into RAM for good performance on any DB system), 4h runtime, 2 concurrent sessions (--client) so as not to cause excessive locking on the smaller tables. Query latencies were measured directly in the database using the pg_stat_statements extension, so they should be accurate.
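For the curious, per-statement latencies can be pulled from pg_stat_statements with something like the query below (column names as of v12 - in v13+ mean_time was replaced by mean_exec_time):

SELECT query, calls, round(mean_time::numeric, 4) AS mean_time_ms
FROM pg_stat_statements
WHERE query ILIKE 'UPDATE pgbench_%'
ORDER BY mean_time DESC;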
Performance has many aspects, but let's start with TPS (transactions per second) and transaction latencies (single transaction duration), as they are probably the most important metrics for the performance hungry. Surprisingly, it turned out that the generic NoSQL approach was quite a bit faster in total numbers! Whew, I didn't expect that for sure. Note the "in total numbers" though... updates on the 2 small tables were a tiny bit slower, but the costliest update a.k.a. the "heart" of the transaction (on the pgbench_accounts table) was significantly faster. See the table below for details.
|                                           | Explicit triggers | Generic triggers | Difference (%) |
|-------------------------------------------|-------------------|------------------|----------------|
| TPS                                       | 585               | 794              | +35.7          |
| Total transactions                        | 6324546           | 8581130          | +35.7          |
| Mean time of UPDATE pgbench_accounts (ms) | 1.1127            | 0.8282           | -25.6          |
| Mean time of UPDATE pgbench_branches (ms) | 0.0493            | 0.0499           | +1.3           |
| Mean time of UPDATE pgbench_tellers (ms)  | 0.0479            | 0.0483           | +1.0           |
Disk space is where the decision is very clear: looking at the resulting auditing table sizes after the tests finished, the grand total of the tables for the traditional / explicit approach is much smaller than the one big generic table! So this is definitely worth considering if you're building the next Amazon or such. We shouldn't only look at absolute numbers here though, as we had ~35% more transactions with the generic approach... so we should also factor that in.
|                                     | Explicit triggers | Generic triggers | Difference (%) | TPS-adjusted difference (%) |
|-------------------------------------|-------------------|------------------|----------------|-----------------------------|
| Total size of auditing tables (MB)  | 1578              | 4255             | +170           | +109                        |
Note that the table sizes don’t include indexes.
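For reference, such sizes can be checked with pg_table_size(), which leaves out indexes (pg_total_relation_size() would include them) - here assuming the other two explicit audit tables follow the same "_log" naming pattern as pgbench_accounts_log:

SELECT relname, pg_size_pretty(pg_table_size(oid)) AS size_without_indexes
FROM pg_class
WHERE relname IN ('pgbench_accounts_log', 'pgbench_branches_log',
                  'pgbench_tellers_log', 'pgbench_generic_log')
ORDER BY pg_table_size(oid) DESC;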
While not directly related to this test, an important aspect of planning change tracking is how efficiently the audit trail can be accessed later. Here, in my experience, the explicit audit table approach clearly wins: there is no need for the JSONB syntax, which can be challenging for beginners, and complex queries perform better as data accumulates. Additionally, the generic approach consumes about twice the space, making it less suitable for frequent historical queries.
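To illustrate - pulling the balance history of a single account is plain SQL on the explicit audit table, while the generic table needs JSONB operators and casts for the same question (hypothetical queries, not part of the measured test):

/* explicit audit table - plain column access, a normal index on aid would work */
SELECT mtime, abalance
FROM pgbench_accounts_log
WHERE aid = 42
ORDER BY mtime;

/* generic audit table - the same question via JSONB operators */
SELECT mtime, (row_data ->> 'abalance')::int AS abalance
FROM pgbench_generic_log
WHERE table_name = 'pgbench_accounts'
  AND (row_data ->> 'aid')::int = 42
ORDER BY mtime;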
Also, you will most probably need more targeted functional indexes on the JSONB column, as I wouldn't recommend just indexing everything with a single GIN index - although it would be the simplest way to go, it only makes sense if you really perform searches on the majority of all "columns".
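For example, if lookups by account ID are the typical access path, a single targeted expression index is usually a better deal than indexing the whole document - both variants sketched below (index names are illustrative):

/* targeted: a partial expression index covering only the account rows */
CREATE INDEX pgbench_generic_log_aid
  ON pgbench_generic_log (((row_data ->> 'aid')::int))
  WHERE table_name = 'pgbench_accounts';

/* "index everything": simplest to set up, but bigger and slower to maintain */
CREATE INDEX pgbench_generic_log_gin
  ON pgbench_generic_log USING gin (row_data);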
Anyway, it's hard to say anything definitive here, as real-life requirements will differ vastly I imagine, and the amount of indexes / data will be decisive.
My personal learning from this test - it's not over until the fat lady sings 🙂 My initial hunch about the performance cost of generic JSONB-storing triggers was wrong this time, and the performance is actually great! So one shouldn't be afraid of this approach because of a performance hit... if anything, rather because of the disk "costs"... or because of possible woes when needing to query the gathered data in some complex ways.
By the way - if you’re wondering why the generic approach was actually faster, my best bet is that it has to do with (at least) 3 factors:
1) PostgreSQL's JSONB implementation is very, very efficient
2) less PL/pgSQL parsing / planning, as the same trigger code is reused for all three tables
3) better CPU-level data caching, as we have fewer active / hot data blocks.
And as always - when choosing between different approaches for your own project, I recommend testing with your exact schema / hardware, as schemas vary vastly and a lot depends on the data types used. This test was just one of the simpler examples.
After finishing my test I started wondering if it would somehow be possible to reduce the disk "cost" downside of this otherwise neat "generic" approach... and it actually wasn't too hard to put together some trigger code that looks at the row data and logs only the columns and values that actually changed!
The core idea can be seen in the code below (full sample here), and the first short round of testing looked quite promising - about 40% disk space savings compared to the generic approach described above! So this could definitely be an interesting way to go. Thanks for reading!
FOR r IN SELECT * FROM jsonb_each(to_jsonb(NEW)) LOOP
  IF r.value IS DISTINCT FROM jsonb_extract_path(old_row, r.key) THEN
    changed_cols := jsonb_set(changed_cols, array[r.key], r.value);
  END IF;
END LOOP;
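Roughly, a full trigger built around that loop could look something like the following - a simplified sketch with a hypothetical pgbench_generic_delta_log table, not the exact code from the linked sample. Deletes store the full old row so that nothing is lost:

CREATE TABLE pgbench_generic_delta_log (
  mtime timestamptz not null default now(),
  action char not null check (action in ('I', 'U', 'D')),
  username text not null,
  table_name text not null,
  changed_cols jsonb not null
);

CREATE OR REPLACE FUNCTION public.pgbench_generic_delta_log()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
DECLARE
  r record;
  old_row jsonb;
  changed_cols jsonb := '{}';
BEGIN
  IF TG_OP = 'DELETE' THEN
    /* for deletes, keep the full old row so it can still be reconstructed */
    INSERT INTO pgbench_generic_delta_log
      VALUES (now(), 'D', session_user, TG_TABLE_NAME, to_jsonb(OLD));
    RETURN NULL;
  END IF;

  IF TG_OP = 'UPDATE' THEN
    old_row := to_jsonb(OLD);
  ELSE
    old_row := '{}';  /* INSERT: every column counts as changed */
  END IF;

  /* keep only the keys whose values actually differ from the old row */
  FOR r IN SELECT * FROM jsonb_each(to_jsonb(NEW)) LOOP
    IF r.value IS DISTINCT FROM jsonb_extract_path(old_row, r.key) THEN
      changed_cols := jsonb_set(changed_cols, array[r.key], r.value);
    END IF;
  END LOOP;

  INSERT INTO pgbench_generic_delta_log
    VALUES (now(), TG_OP::char, session_user, TG_TABLE_NAME, changed_cols);
  RETURN NULL;
END;
$function$;

/* attach with AFTER INSERT OR UPDATE OR DELETE ... FOR EACH ROW triggers, as above */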
To read more about PostgreSQL, triggers and performance, see this blog post: "Why Are My PostgreSQL Updates Getting Slower?" by Laurenz Albe.
It looks like the var.* pattern in PL/pgSQL has significant overhead - about 6-10% in my tests.
Ok, interesting. Do you mean ~10% for a single usage of such a construct vs an explicit column list, or the end result for a similar test? Either way, it seems this amount of penalty should still leave the JSONB approach on top, even when taking more care when writing the normal triggers.
Yes, there is overhead in PL/pgSQL when passing parameters to INSERT, and there is some CPU overhead related to the number of columns. If your bottleneck is CPU, then it is better to have fewer columns rather than more.
Interesting post. Thanks for it.
Just a tiny glitch: "Test host: 4 CPU i5-6600, 16GB RAM... shared_buffers set to 25% of RAM, i.e. 2GB" => 4GB shared buffers or test host with 8GB of RAM?
Ah good catch, shared_buffers was 4GB indeed