Constraints over multiple rows in PostgreSQL

Manage constraints over multiple rows: In PostgreSQL and many other relational databases, constraints are an integral part of the feature set. Many people are aware of primary keys, foreign keys, CHECK-constraints, and table constraints. However, from time to time, the situation is way more complicated. That’s when some more advanced techniques are needed to enforce integrity in the way it is desired by end users.
In this post, we will elaborate on restrictions spanning more than one row. What does that mean? Let us envision a simple scenario: We want to store information about airplane ownership. In aviation, it happens more often than not that a single plane has more than one owner. In this example, we want to make sure that the ownership parts actually add up to 100% of the whole. Similar challenges can be found over and over again in real business applications.

Table of Contents

Creating sample data to illustrate the use of constraints over multiple rows

Let’s first create some sample data, and see how we can actually solve the problem of ensuring that ownership always has to end up at 100% for a specific airplane:

CREATE TABLE t_plane
(
    id           int     UNIQUE,
    call_sign    text    NOT NULL UNIQUE
);


CREATE TABLE t_owner
(
    plane_id     int    REFERENCES    t_plane (id)
                                      INITIALLY DEFERRED,
    owner        text,
    fraction     numeric
);


INSERT INTO t_plane (id, call_sign)
VALUES     (1, 'D-EHWI'),
           (2, 'ES-TEEM'),
           (3, 'D-ONUT');

CREATE TABLE t_plane

(

id int UNIQUE,

call_sign text NOT NULL UNIQUE

);

CREATE TABLE t_owner

(

plane_id int REFERENCES t_plane (id)

INITIALLY DEFERRED,

owner text,

fraction numeric

);

INSERT INTO t_plane (id, call_sign)

VALUES (1, 'D-EHWI'),

(2, 'ES-TEEM'),

(3, 'D-ONUT');

In this case, we’ve got two tables: The t_plane table contains a unique ID and the call sign of the plane. The call sign is similar to a license plate - it identifies the aircraft. “D” indicates Germany, OE means Austria and “N” would indicate that we are dealing with a US-registered aircraft.

As you can see, we have used a 1 : N relationship here. One plane can have many owners. The trouble is, if we add up the owners’ percentages of a plane, we always have to end up with 100%. The nasty part is concurrency. What if many people make changes at the same time?

Let’s take a look at the basic problem of concurrency:

User 1

User 2

Comment

BEGIN;

BEGIN;

BEGIN;

BEGIN;

INSERT INTO t_owner 

VALUES (1, 'Hans', 100);

INSERT INTO t_owner

VALUES (1, 'Hans', 100);

INSERT INTO t_owner 

VALUES (1, 'Joe', 100);

INSERT INTO t_owner

VALUES (1, 'Joe', 100);

COMMIT;

COMMIT;

COMMIT;

COMMIT;

SELECT   sum(fraction) 

FROM     t_owner 

WHERE    plane_id = 1;

SELECT sum(fraction)

FROM t_owner

WHERE plane_id = 1;

returns 200 for D-EHWI

Here you can see that two people are about to modify the same airplane concurrently. There is nothing to stop those two users from actually doing that. The end result is that D-EHWI is owned by two people at the same time - a big no-no.

There are various ways to avoid such concurrency predicaments.

Using LOCK TABLE to protect against concurrency when coding constraints over multiple rows

One of the more commonly chosen methods is to use a table lock. Let’s take a look and see how this works:

User 1

User 2

Comment

BEGIN;

BEGIN;

BEGIN;

BEGIN;

LOCK TABLE t_owner 
IN EXCLUSIVE MODE;

1 2	LOCK TABLE t_owner IN EXCLUSIVE MODE;

LOCK TABLE t_owner 
IN EXCLUSIVE MODE;

1 2	LOCK TABLE t_owner IN EXCLUSIVE MODE;

in case user 1 is slightly ahead user 2 has to wait

INSERT INTO t_owner 

VALUES (1, 'Hans', 100);

INSERT INTO t_owner

VALUES (1, 'Hans', 100);

...

...

user 2 has to wait ...

SELECT   sum(fraction) 

FROM     t_owner 

WHERE    plane_id = 1;

SELECT sum(fraction)

FROM t_owner

WHERE plane_id = 1;

...

...

returns 100, we are fine

COMMIT;

COMMIT;

...

...

user 2 wakes up

INSERT INTO t_owner 

VALUES (1, 'Hans', 100);

INSERT INTO t_owner

VALUES (1, 'Hans', 100);

SELECT   sum(fraction) 

FROM     t_owner 

WHERE    plane_id = 1;

SELECT sum(fraction)

FROM t_owner

WHERE plane_id = 1;

user 2 gets 200

ROLLBACK;

ROLLBACK;

we must rollback because the constraint is violated

In this case, we have used a table lock to solve the problem. The important point here is that PostgreSQL allows us to define 8 different types of locks:

test=# h LOCK
Command: LOCK
Description: lock a table
Syntax:
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

   ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
   | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE

test=# h LOCK

Command: LOCK

Description: lock a table

Syntax:

LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE

| SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE

ACCESS SHARE simply means that somebody is reading a row. All it does is to prevent things such as DROP TABLE from happening. However, what we need to do here is to ensure that people cannot insert or modify concurrently. The solution to the problem is lock level EXCLUSIVE. Reading is still possible, but only a single transaction can modify the table. ACCESS EXCLUSIVE would prevent both concurrent reads and writes, but that would be overkill in this case.

What is the problem with this approach? The answer is scalability: In case of a table lock, only ONE transaction can modify the same table at the same time. If 100 airplanes are modified at the same time, 99 transactions have to wait until one transaction is actually able to commit. One could argue that airplane ownership does not change that often. However, this might not be true for other use-cases; we need to keep in mind that scalability does matter. It makes no sense to buy a 32-core server if 31 cores are doomed to idle due to locking.
A more sophisticated solution is needed.

Transaction isolation levels: SERIALIZABLE

One solution to the problem is to use a higher transaction isolation level. Basically, PostgreSQL supports three out of four transaction isolation levels proposed by the ANSI SQL standard:

test=# h BEGIN
Command: BEGIN
Description: start a transaction block
Syntax:
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]

where transaction_mode is one of:

    ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ
         | READ COMMITTED | READ UNCOMMITTED }
    READ WRITE | READ ONLY
    [ NOT ] DEFERRABLE

test=# h BEGIN

Command: BEGIN

Description: start a transaction block

Syntax:

BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]

where transaction_mode is one of:

ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ

| READ COMMITTED | READ UNCOMMITTED }

READ WRITE | READ ONLY

[ NOT ] DEFERRABLE

Currently, READ UNCOMMITTED is mapped to READ COMMITTED, which makes a lot of sense in an MVCC context. However, it’s not enough here.

SERIALIZABLE is the method of choice in this case. What is the basic idea behind SERIAZIABLE? It provides us with the illusion of sequential execution - however, in the background, things are happening with as much parallelism as possible.

Let’s take a closer look at SERIALIZABLE:

User 1

User 2

Comment

BEGIN TRANSACTION ISOLATION
LEVEL SERIALIZABLE;

1 2	BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

start a transaction using the right isolation level

INSERT INTO t_owner 

VALUES (3, 'Hans', 60);

INSERT INTO t_owner

VALUES (3, 'Hans', 60);

make desired changes

INSERT INTO t_owner 

VALUES (3, 'Paul', 40);

INSERT INTO t_owner

VALUES (3, 'Paul', 40);

SELECT   sum(fraction) 

FROM     t_owner 

WHERE    plane_id = 3;

SELECT sum(fraction)

FROM t_owner

WHERE plane_id = 3;

BEGIN TRANSACTION ISOLATION
LEVEL SERIALIZABLE;

1 2	BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

check to make sure that ownership is correct (the app will decide whether to commit or rollback).

COMMIT;

COMMIT;

...

...

user 2 wakes up

INSERT INTO t_owner 

VALUES (1, 'Hans', 100);

INSERT INTO t_owner

VALUES (1, 'Hans', 100);

SELECT   sum(fraction) 

FROM     t_owner 

WHERE    plane_id = 1;

SELECT sum(fraction)

FROM t_owner

WHERE plane_id = 1;

user 2 gets 200

ROLLBACK;

ROLLBACK;

we must rollback because the constraint is violated

User 1

User 2

Comment

BEGIN TRANSACTION ISOLATION 
LEVEL SERIALIZABLE;

1 2	BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

start a transaction using the right isolation level

INSERT INTO t_owner 
VALUES (3, 'Hans', 60);

1 2	INSERT INTO t_owner VALUES (3, 'Hans', 60);

make desired changes

INSERT INTO t_owner 
VALUES (3, 'Paul', 40);

1 2	INSERT INTO t_owner VALUES (3, 'Paul', 40);

SELECT sum(fraction) 
FROM   t_owner 
WHERE  plane_id = 3;

-----

(1 row)

SELECT sum(fraction)

FROM t_owner

WHERE plane_id = 3;

-----

(1 row)

BEGIN TRANSACTION ISOLATION 
LEVEL SERIALIZABLE;

1 2	BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

check to make sure that ownership is correct (the app will decide whether to commit or rollback).

INSERT INTO t_owner 
VALUES (3, 'Hans', 60);

1 2	INSERT INTO t_owner VALUES (3, 'Hans', 60);

make conflicting changes violating ownership

INSERT INTO t_owner 
VALUES (3, 'Paul', 40);

1 2	INSERT INTO t_owner VALUES (3, 'Paul', 40);

COMMIT;

COMMIT;

...

...

the app decided to commit

COMMIT;

ERROR: could not 
serialize access ...

COMMIT;

ERROR: could not

serialize access ...

in this case, we will fail because transactions are not guaranteed to be independent anymore.

We have started to insert data for D-ONUT. The idea is to insert it directly and check at the end of the transaction to see what happens. Note that there is no LOCK TABLE, no SELECT FOR UPDATE or anything of that kind. It is a plain transaction in isolation level SERIALIZABLE. It inserts two owners and then it checks for correctness. The important part is that SERIALIZABLE is actually going to error out:

ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during conflict out checking.
HINT: The transaction might succeed if retried.
CONTEXT: SQL statement 'SELECT 1 FROM ONLY 'public'.'t_plane' x WHERE 'id' 
OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x'

ERROR: could not serialize access due to read/write dependencies among transactions

DETAIL: Reason code: Canceled on identification as a pivot, during conflict out checking.

HINT: The transaction might succeed if retried.

CONTEXT: SQL statement 'SELECT 1 FROM ONLY 'public'.'t_plane' x WHERE 'id'

OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x'

PostgreSQL noticed that the same data was touched here and that the SELECT statement can surely not return the desired data. The transaction would actually have worked, had we not modified the data. But we did, and therefore the system figured that it cannot uphold the illusion of sequential execution.

There are a couple of questions people frequently ask concerning constraints over multiple rows and SERIALIZABLE:

Why not simply use SELECT FOR UPDATE?
Why care at all, since a conflict is highly unlikely anyway?

Let’s focus on SELECT FOR UPDATE first: It is true that SELECT FOR UPDATE does lock rows, and that we cannot modify them concurrently (UPDATE, DELETE). But there is a problem: SELECT FOR UPDATE protects against changes made to existing rows. It does not lock “future” rows. In other words: SELECT FOR UPDATE does not prevent other transactions from inserting data, which of course would allow the violation of our constraint (= total ownership has to be NULL or add up to 100%).

The second argument is heard more often than not: It is true that airplane ownership does not change very often. But what we have seen here is that getting things right is really easy and does not require much effort. So why not write proper code to avoid problems in the first place? Managing locking and transaction isolation well is important - not just for consistency but also for scalability. That’s why I recommend you take coding constraints over multiple rows seriously.

Finally …

Transaction isolation is a highly important topic. Another crucial topic is how to know when to use what data type. If you want to find out more about data types in PostgreSQL, we recommend checking out our post on differentiating integer, float, and numeric types. It will give you some insight into what is possible on the data type front.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook, or LinkedIn.

Constraints over multiple rows in PostgreSQL

Creating sample data to illustrate the use of constraints over multiple rows

Using LOCK TABLE to protect against concurrency when coding constraints over multiple rows

Transaction isolation levels: SERIALIZABLE

Finally …

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts