Reporting: Creating correct output

08.2013 | Category: How To | Tags: development, sql help

08.2013

Category: How To

Creating reports is a core task of every PostgreSQL database engineer or developer. However, many people think that it is enough to hack up some SQL aggregating some data and execute it. This is not quite true. We have repeatedly seen reports being just plain wrong without people even taking note of it.

Table of Contents

How can this happen?

Transaction isolation does make a difference

In our example we want to aggregate some simple data:

test=# CREATE TABLE t_test (name text, gender char(1));
CREATE TABLE

1 2	test=# CREATE TABLE t_test (name text, gender char(1)); CREATE TABLE

We add two boys and one girl:

test=# INSERT INTO t_test VALUES ('hans', 'm');
INSERT 0 1

test=# INSERT INTO t_test VALUES ('paul', 'm');
INSERT 0 1

test=# INSERT INTO t_test VALUES ('maria', 'f');
INSERT 0 1

1

2

3

4

5

6

7

8

test=# INSERT INTO t_test VALUES ('hans', 'm');

INSERT 0 1

test=# INSERT INTO t_test VALUES ('paul', 'm');

INSERT 0 1

test=# INSERT INTO t_test VALUES ('maria', 'f');

INSERT 0 1

Let us start with our report. We want to create a nice and colorful analysis displaying the number of girls, the number of boys, and the overall total.

To prove our point we have written two queries:

test=# BEGIN;
BEGIN

test=# SELECT gender, count(*) FROM t_test GROUP BY 1;
 gender | count
--------+-------
      m | 2
      f | 1
(2 rows)

test=# SELECT count(*) FROM t_test;
 count
-------
 3
(1 row)

test=# COMMIT;
COMMIT

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

test=# BEGIN;

BEGIN

test=# SELECT gender, count(*) FROM t_test GROUP BY 1;

gender | count

--------+-------

m | 2

f | 1

(2 rows)

test=# SELECT count(*) FROM t_test;

count

-------

3

(1 row)

test=# COMMIT;

COMMIT

We start a transaction, do an analysis per gender followed by an overall count (and yes, there are more efficient ways of doing that).

The question now is: Is this report correct or not? Most people would ask: “Why not?”

User 1

BEGIN;

test=# SELECT gender, count(*)
FROM t_test
GROUP BY 1;
 gender | count
--------+-------
   m    |   2
   f    |   1
(2 rows)
.
.
.

test=# SELECT count(*) FROM t_test;
 count
-------
 2
(1 row)

COMMIT;

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

BEGIN;

test=# SELECT gender, count(*)

FROM t_test

GROUP BY 1;

gender | count

--------+-------

m | 2

f | 1

(2 rows)

.

test=# SELECT count(*) FROM t_test;

count

-------

2

(1 row)

COMMIT;

User 2

.
.
.
.
.
.
.
.
.
.
.
.
test=# DELETE FROM t_test
WHERE name = 'hans';
DELETE 1
.
.
.
.
.
.
.
.
.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

.

test=# DELETE FROM t_test

WHERE name = 'hans';

DELETE 1

.

Here is why:

What we see here is that the count does not reflect what we have seen before. The overall count is two – it is supposed to be three. The reason for this (for many users) unexpected behavior is that we are in so called “read committed” mode. In “read committed” mode every SELECT statement will take a so called snapshot. In other words: Every SELECT will already see data committed by some other transaction. In our case the second SELECT will already see the other transaction's commit. So, over time, a “read committed” transaction can change its view of the data. This implies that a report consisting of more than just one query might not operate on the same data and thus return inconsistent results.

Repeatable read – a way to fix transaction visibility

To fix the problem we can run our transaction in “repeatable read” mode. This will make sure that your transaction will have a consistent view of the data throughout the transaction and therefore return correct results.

Running a transaction in “repeatable read” works like this:

test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN

test=# SELECT gender, count(*) FROM t_test GROUP BY 1;
 gender | count
--------+------- 
      m | 2
      f | 1
(2 rows)

test=# SELECT count(*) FROM t_test;
 count
-------
 3
(1 row)

test=# COMMIT;
COMMIT

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN

test=# SELECT gender, count(*) FROM t_test GROUP BY 1;

gender | count

--------+-------

m | 2

f | 1

(2 rows)

test=# SELECT count(*) FROM t_test;

count

-------

3

(1 row)

test=# COMMIT;

COMMIT

If you happen to do reporting, we always recommend to use “repeatable read” because it allows you to do your reporting on a consistent view of the data.

----------
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

Comments are closed.