UPDATED 21 March 2023: Tracking database changes and tracing users has always been a vitally important part of PostgreSQL database security and application security. Especially when critical data are stored, it is important to know who changed which data, when, and how.
To track the changes made to tables in PostgreSQL, you can write a generic changelog trigger. The easiest way to do that is to write a simple PL/pgSQL function and use it for all tables in the system. As PostgreSQL provides good support for stored procedures, this is not hard to do. In this post you will learn how it can be done.
First of all, we need a table to store the changes. For a quick prototype, we can use the following table structure:
CREATE SCHEMA logging;

CREATE TABLE logging.t_history (
    id          serial,
    tstamp      timestamp DEFAULT now(),
    schemaname  text,
    tabname     text,
    operation   text,
    who         text DEFAULT current_user,
    new_val     jsonb,
    old_val     jsonb
);
For the sake of simplicity, we didn't use enumerated types, which would store the data more efficiently. What is also important: the data we store should be in the most generic format possible. That is why we used the jsonb data type here.
The point of this table is to keep track of all changes made to other tables. We want to know which operation has taken place. Whenever we insert, update or delete a row somewhere, we want to keep a record of that operation.
Taking this into account, we come up with the following code:
CREATE FUNCTION change_trigger() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT'
    THEN
        INSERT INTO logging.t_history (tabname, schemaname, operation, new_val)
        VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE'
    THEN
        INSERT INTO logging.t_history (tabname, schemaname, operation, new_val, old_val)
        VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP,
                row_to_json(NEW), row_to_json(OLD));
        RETURN NEW;
    ELSIF TG_OP = 'DELETE'
    THEN
        INSERT INTO logging.t_history (tabname, schemaname, operation, old_val)
        VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(OLD));
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
The interesting part here is that the trigger is totally generic. We can use the row_to_json function to encode any table row into a JSON object, so we can use the very same trigger for countless tables. OLD and NEW contain the row before and after the data modification, respectively.
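To illustrate what that looks like, here is a quick ad-hoc query (not part of the setup itself) showing how row_to_json turns a row into a JSON document:

SELECT row_to_json(t) FROM (SELECT 1 AS id, 'hans' AS name) AS t;
-- {"id":1,"name":"hans"}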
Once we have the backbone in place, we can test things using a table.
CREATE TABLE t_trig (id int, name text);
Once we have created our tables, we can deploy triggers to do the real work:
CREATE TRIGGER t
    AFTER INSERT OR UPDATE OR DELETE ON t_trig
    FOR EACH ROW EXECUTE PROCEDURE change_trigger();
What is important here is that we are using an AFTER trigger. We have two choices: BEFORE and AFTER. A BEFORE trigger fires before the row actually makes it into the table; an AFTER trigger fires after the row has made it into the table.
The main risk here is that if multiple triggers exist on the same table, we have to make sure that the order of firing is correct: in PostgreSQL (since version 7.3), all BEFORE triggers fire in alphabetical order (of the trigger name), and then all AFTER triggers fire in alphabetical order. This has to be taken into account when you have more than one trigger.
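A minimal sketch of how trigger names can be used to control that order (the trigger and function names below are invented for illustration; only change_trigger() comes from the article):

-- Both AFTER row triggers live on the same table, so the name decides the order:
-- "a_validate_row" fires before "b_log_row".
CREATE TRIGGER a_validate_row
    AFTER INSERT OR UPDATE OR DELETE ON t_trig
    FOR EACH ROW EXECUTE PROCEDURE validate_row();      -- hypothetical function

CREATE TRIGGER b_log_row
    AFTER INSERT OR UPDATE OR DELETE ON t_trig
    FOR EACH ROW EXECUTE PROCEDURE change_trigger();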
Let's insert a row and modify it so that the trigger has something to log:

INSERT INTO t_trig VALUES (1, 'hans');
UPDATE t_trig SET id = 10 * id, name = 'paul';
Our history table will contain all changes we have made to the underlying tables:
-[ RECORD 1 ]---------------------------------
id         | 1
tstamp     | 2023-02-27 10:25:37.530579
schemaname | public
tabname    | t_trig
operation  | INSERT
who        | hs
new_val    | {"id": 1, "name": "hans"}
old_val    |
-[ RECORD 2 ]---------------------------------
id         | 2
tstamp     | 2023-02-27 10:25:37.530579
schemaname | public
tabname    | t_trig
operation  | UPDATE
who        | hs
new_val    | {"id": 10, "name": "paul"}
old_val    | {"id": 1, "name": "hans"}
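Because the row images are stored as jsonb, the history can be filtered with the usual JSON operators. A small sketch (the predicate is just an example):

-- all logged operations that touched the row with id = 1, before or after the change
SELECT tstamp, operation, who, new_val, old_val
FROM logging.t_history
WHERE new_val ->> 'id' = '1'
   OR old_val ->> 'id' = '1'
ORDER BY tstamp;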
If we assume that we use this mechanism to track changes in order to protect our data, we have to make sure that somebody changing the data cannot change the log as well. Therefore, we have to take precautions for this special case. One way to do this is to mark our trigger function as SECURITY DEFINER. This means that the function itself is not executed as the user making the change to the table, but as the user who created the function. If a superuser deploys the trigger, we can protect ourselves against malicious changes to the log.
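In addition, it is worth making sure that ordinary users cannot read or modify the log directly. A minimal sketch, assuming application users connect with non-superuser roles:

-- The SECURITY DEFINER trigger function can still write to the log,
-- but ordinary roles can neither read nor change it.
REVOKE ALL ON SCHEMA logging FROM PUBLIC;
REVOKE ALL ON TABLE logging.t_history FROM PUBLIC;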
Tracking changes to the database is an important issue for many applications. We hope the code listed above is a blueprint which will help you to get started quickly.
If you want to find out more about PostgreSQL and learn how to write SECURITY DEFINER functions securely, read this blog: /en/abusing-security-definer-functions/
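A related hardening step (a sketch, not taken from that article) is to pin the trigger function's search_path, so that a SECURITY DEFINER function cannot be tricked into resolving objects from a caller-controlled schema:

ALTER FUNCTION change_trigger() SET search_path = logging, pg_catalog;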
Was there a specific reason why you chose json instead of hstore? Dimitri Fontaine made a similar post[1] a while back using hstore. What I really like about the hstore approach is the easy way of figuring out the differences between the new_val and old_val.
[1] http://tapoueh.org/blog/2013/08/27-auditing-changes-with-hstore.html
I just used JSON because it is more widely known by people. If I had the choice in production I would most likely go for hstore. I just thought JSON would be interesting to more people. Thank you for the feedback 🙂
Ok that makes sense 😉 I thought you might have gone down the JSON path because hstore is not part of the core postgres distro and needs to be enabled manually.
One of the reasons why people put off record-level logging is that it used to require a trigger that knows about the columns in the subject table, and the history table had to have all the columns that the subject table has. Hstore removed that need quite some time ago, and with JSON being as popular as it is today, it's good that tricks like these are being posted once in a while. This particular one can really save your butt when the users of the database start making naughty claims about values that mysteriously changed in the database...
However, I would probably not log the NEW value, because that is the same value as the current record in the actual table, and not storing it in the history does reduce the size of the table by half, which is not a bad thing if you have lots of updates.
It's not that people haven't come up with this idea before. It's that it's pretty useless when you're actually trying to query the cleverly-plowed-together data. I suspect event triggers will make it possible to auto-create audit logging for each table, which while a little less clever would actually help you once you need to query them.
This approach solves the problem of tables changing structure. The tablelog package has that issue.
Is there a specific reason why you used a "before" trigger as opposed to an "after" one? The problem I see with a before trigger is that another before trigger executed later on might decide to cancel the action.
yes, because it is a plain mistake ;).
Great tip, thanks!
I've put all in a SQL script:
https://gist.github.com/cristianp6/29ce1c942448e95c2f95
Any advice or improvement is welcome 😉
I thank you very much!! Your article helped me greatly. :))))
How would you go about cycling out and/or aggregating log data so that the database doesn't grow beyond a manageable size while keeping such logs?
Is there any way to monitor IP addresses in the same way? -- as multiple users use the same default user_id
Thank you very much! Your sharing helps me a lot!
mmmm cool!
Fantastic article, thank you! If we don't delete any logs, then we don't need to save old_val, because the old value has already been saved. I have an idea that blockchain might help, if we encrypt these logs before uploading them to the blockchain.
Is it possible to use the log to roll back to the old state of the table? How can that be done easily?
those things are tricky. just consider things such as TRUNCATE, ALTER TABLE and so on ...
What if we only consider 3 operations on the table: INSERT, DELETE and UPDATE?
For example, if I insert one row in a table:
INSERT INTO table (id, column1, column2) VALUES (1,'value1', 'value2')
Then if I want to roll back later, I need to explicitly provide sql statement:
DELETE FROM table WHERE id=1 AND column1='value1' AND column2='value2'
Can this undo statement be generated programmatically?
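One conceivable way to do that (a sketch, not part of the thread; the history id of 42 and the assumption that the table structure has not changed are purely illustrative) is to expand the stored jsonb back into the table's row type, for example to undo a logged DELETE:

-- re-insert the old row image recorded for a DELETE
INSERT INTO t_trig
SELECT p.*
FROM logging.t_history AS h,
     LATERAL jsonb_populate_record(NULL::t_trig, h.old_val) AS p
WHERE h.id = 42 AND h.operation = 'DELETE';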
I think you can simplify this a lot by getting rid of the if statement and always doing the same INSERT. If OLD or NEW is null, then ROW_TO_JSON returns null. That's what I'm doing and it works fine unless I'm missing something.
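For reference, a sketch of that simplified version (the function name is invented; it assumes a PostgreSQL version where NEW is NULL in DELETE triggers and OLD is NULL in INSERT triggers, as the comment describes):

CREATE FUNCTION change_trigger_simple() RETURNS trigger AS $$
BEGIN
    -- row_to_json(NULL) is NULL, so the side that does not apply is logged as NULL
    INSERT INTO logging.t_history (tabname, schemaname, operation, new_val, old_val)
    VALUES (TG_TABLE_NAME, TG_TABLE_SCHEMA, TG_OP,
            row_to_json(NEW), row_to_json(OLD));
    RETURN NULL;   -- the return value of an AFTER trigger is ignored
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;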
If I do not want to create a role in the database for each user of the application, it may be useful to start with
set application_name = 'applicationuser@application';
and use the column definition
who text DEFAULT current_setting('application_name'::text)
to get the user from the application into the database.
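Put together, that idea looks roughly like this (the setting value is only an example; note that application_name can be set by any client, so treat it as informational rather than as a security boundary):

-- the application tags its session with the logical end user
SET application_name = 'applicationuser@application';

-- and the history table records that value instead of the database role
ALTER TABLE logging.t_history
    ALTER COLUMN who SET DEFAULT current_setting('application_name');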