Creating reports is a core task of every PostgreSQL database engineer or developer. However, many people think that it is enough to hack up some SQL aggregating some data and execute it. This is not quite true. We have repeatedly seen reports being just plain wrong without people even taking note of it.
Table of Contents
How can this happen?
In our example we want to aggregate some simple data:
1 2 |
test=# CREATE TABLE t_test (name text, gender char(1)); CREATE TABLE |
We add two boys and one girl:
1 2 3 4 5 6 7 8 |
test=# INSERT INTO t_test VALUES ('hans', 'm'); INSERT 0 1 test=# INSERT INTO t_test VALUES ('paul', 'm'); INSERT 0 1 test=# INSERT INTO t_test VALUES ('maria', 'f'); INSERT 0 1 |
Let us start with our report. We want to create a nice and colorful analysis displaying the number of girls, the number of boys, and the overall total.
To prove our point we have written two queries:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
test=# BEGIN; BEGIN test=# SELECT gender, count(*) FROM t_test GROUP BY 1; gender | count --------+------- m | 2 f | 1 (2 rows) test=# SELECT count(*) FROM t_test; count ------- 3 (1 row) test=# COMMIT; COMMIT |
We start a transaction, do an analysis per gender followed by an overall count (and yes, there are more efficient ways of doing that).
The question now is: Is this report correct or not? Most people would ask: “Why not?”
User 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
BEGIN; test=# SELECT gender, count(*) FROM t_test GROUP BY 1; gender | count --------+------- m | 2 f | 1 (2 rows) . . . test=# SELECT count(*) FROM t_test; count ------- 2 (1 row) COMMIT; |
User 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
. . . . . . . . . . . . test=# DELETE FROM t_test WHERE name = 'hans'; DELETE 1 . . . . . . . . . |
Here is why:
What we see here is that the count does not reflect what we have seen before. The overall count is two – it is supposed to be three. The reason for this (for many users) unexpected behavior is that we are in so called “read committed” mode. In “read committed” mode every SELECT statement will take a so called snapshot. In other words: Every SELECT will already see data committed by some other transaction. In our case the second SELECT will already see the other transaction's commit. So, over time, a “read committed” transaction can change its view of the data. This implies that a report consisting of more than just one query might not operate on the same data and thus return inconsistent results.
To fix the problem we can run our transaction in “repeatable read” mode. This will make sure that your transaction will have a consistent view of the data throughout the transaction and therefore return correct results.
Running a transaction in “repeatable read” works like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; BEGIN test=# SELECT gender, count(*) FROM t_test GROUP BY 1; gender | count --------+------- m | 2 f | 1 (2 rows) test=# SELECT count(*) FROM t_test; count ------- 3 (1 row) test=# COMMIT; COMMIT |
If you happen to do reporting, we always recommend to use “repeatable read” because it allows you to do your reporting on a consistent view of the data.
----------
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information