Manage constraints over multiple rows: In PostgreSQL and many other relational databases, constraints are an integral part of the feature set. Many people are aware of primary keys, foreign keys, CHECK-constraints, and table constraints. However, from time to time, the situation is way more complicated. That’s when some more advanced techniques are needed to enforce integrity in the way it is desired by end users.
In this post, we will elaborate on restrictions spanning more than one row. What does that mean? Let us envision a simple scenario: We want to store information about airplane ownership. In aviation, it happens more often than not that a single plane has more than one owner. In this example, we want to make sure that the ownership parts actually add up to 100% of the whole. Similar challenges can be found over and over again in real business applications.
Table of Contents
Let’s first create some sample data, and see how we can actually solve the problem of ensuring that ownership always has to end up at 100% for a specific airplane:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE TABLE t_plane ( id int UNIQUE, call_sign text NOT NULL UNIQUE ); CREATE TABLE t_owner ( plane_id int REFERENCES t_plane (id) INITIALLY DEFERRED, owner text, fraction numeric ); INSERT INTO t_plane (id, call_sign) VALUES (1, 'D-EHWI'), (2, 'ES-TEEM'), (3, 'D-ONUT'); |
In this case, we’ve got two tables: The t_plane
table contains a unique ID and the call sign of the plane. The call sign is similar to a license plate - it identifies the aircraft. “D” indicates Germany, OE means Austria and “N” would indicate that we are dealing with a US-registered aircraft.
As you can see, we have used a 1 : N relationship here. One plane can have many owners. The trouble is, if we add up the owners’ percentages of a plane, we always have to end up with 100%. The nasty part is concurrency. What if many people make changes at the same time?
Let’s take a look at the basic problem of concurrency:
User 1
User 2
Comment
1 |
BEGIN; |
1 |
BEGIN; |
1 2 3 |
INSERT INTO t_owner VALUES (1, 'Hans', 100); |
1 2 3 |
INSERT INTO t_owner VALUES (1, 'Joe', 100); |
1 |
COMMIT; |
1 |
COMMIT; |
1 2 3 4 5 |
SELECT sum(fraction) FROM t_owner WHERE plane_id = 1; |
returns 200 for D-EHWI
Here you can see that two people are about to modify the same airplane concurrently. There is nothing to stop those two users from actually doing that. The end result is that D-EHWI is owned by two people at the same time - a big no-no.
There are various ways to avoid such concurrency predicaments.
One of the more commonly chosen methods is to use a table lock. Let’s take a look and see how this works:
User 1
User 2
Comment
1 |
BEGIN; |
1 |
BEGIN; |
1 2 |
LOCK TABLE t_owner IN EXCLUSIVE MODE; |
1 2 |
LOCK TABLE t_owner IN EXCLUSIVE MODE; |
in case user 1 is slightly ahead user 2 has to wait
1 2 3 |
INSERT INTO t_owner VALUES (1, 'Hans', 100); |
1 |
... |
user 2 has to wait ...
1 2 3 4 5 |
SELECT sum(fraction) FROM t_owner WHERE plane_id = 1; |
1 |
... |
returns 100, we are fine
1 |
COMMIT; |
1 |
... |
user 2 wakes up
1 2 3 |
INSERT INTO t_owner VALUES (1, 'Hans', 100); |
1 2 3 4 5 |
SELECT sum(fraction) FROM t_owner WHERE plane_id = 1; |
user 2 gets 200
1 |
ROLLBACK; |
we must rollback because the constraint is violated
In this case, we have used a table lock to solve the problem. The important point here is that PostgreSQL allows us to define 8 different types of locks:
1 2 3 4 5 6 7 8 9 10 |
test=# h LOCK Command: LOCK Description: lock a table Syntax: LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ] where lockmode is one of: ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE |
ACCESS SHARE
simply means that somebody is reading a row. All it does is to prevent things such as DROP TABLE
from happening. However, what we need to do here is to ensure that people cannot insert or modify concurrently. The solution to the problem is lock level EXCLUSIVE
. Reading is still possible, but only a single transaction can modify the table. ACCESS EXCLUSIVE
would prevent both concurrent reads and writes, but that would be overkill in this case.
What is the problem with this approach? The answer is scalability: In case of a table lock, only ONE transaction can modify the same table at the same time. If 100 airplanes are modified at the same time, 99 transactions have to wait until one transaction is actually able to commit. One could argue that airplane ownership does not change that often. However, this might not be true for other use-cases; we need to keep in mind that scalability does matter. It makes no sense to buy a 32-core server if 31 cores are doomed to idle due to locking.
A more sophisticated solution is needed.
One solution to the problem is to use a higher transaction isolation level. Basically, PostgreSQL supports three out of four transaction isolation levels proposed by the ANSI SQL standard:
1 2 3 4 5 6 7 8 9 10 11 12 |
test=# h BEGIN Command: BEGIN Description: start a transaction block Syntax: BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] where transaction_mode is one of: ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY [ NOT ] DEFERRABLE |
Currently, READ UNCOMMITTED
is mapped to READ COMMITTED
, which makes a lot of sense in an MVCC context. However, it’s not enough here.
SERIALIZABLE
is the method of choice in this case. What is the basic idea behind SERIAZIABLE
? It provides us with the illusion of sequential execution - however, in the background, things are happening with as much parallelism as possible.
Let’s take a closer look at SERIALIZABLE:
User 1
User 2
Comment
1 2 |
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; |
start a transaction using the right isolation level
1 2 3 |
INSERT INTO t_owner VALUES (3, 'Hans', 60); |
make desired changes
1 2 3 |
INSERT INTO t_owner VALUES (3, 'Paul', 40); |
1 2 3 4 5 |
SELECT sum(fraction) FROM t_owner WHERE plane_id = 3; |
1 2 |
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; |
check to make sure that ownership is correct (the app will decide whether to commit or rollback).
1 |
COMMIT; |
1 |
... |
user 2 wakes up
1 2 3 |
INSERT INTO t_owner VALUES (1, 'Hans', 100); |
1 2 3 4 5 |
SELECT sum(fraction) FROM t_owner WHERE plane_id = 1; |
user 2 gets 200
1 |
ROLLBACK; |
we must rollback because the constraint is violated
User 1
User 2
Comment
1 2 |
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; |
start a transaction using the right isolation level
1 2 |
INSERT INTO t_owner VALUES (3, 'Hans', 60); |
make desired changes
1 2 |
INSERT INTO t_owner VALUES (3, 'Paul', 40); |
1 2 3 4 5 6 7 |
SELECT sum(fraction) FROM t_owner WHERE plane_id = 3; ----- (1 row) |
1 2 |
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE; |
check to make sure that ownership is correct (the app will decide whether to commit or rollback).
1 2 |
INSERT INTO t_owner VALUES (3, 'Hans', 60); |
make conflicting changes violating ownership
1 2 |
INSERT INTO t_owner VALUES (3, 'Paul', 40); |
1 |
COMMIT; |
1 |
... |
the app decided to commit
1 2 3 4 |
COMMIT; ERROR: could not serialize access ... |
in this case, we will fail because transactions are not guaranteed to be independent anymore.
We have started to insert data for D-ONUT. The idea is to insert it directly and check at the end of the transaction to see what happens. Note that there is no LOCK TABLE
, no SELECT FOR UPDATE
or anything of that kind. It is a plain transaction in isolation level SERIALIZABLE
. It inserts two owners and then it checks for correctness. The important part is that SERIALIZABLE
is actually going to error out:
1 2 3 4 5 |
ERROR: could not serialize access due to read/write dependencies among transactions DETAIL: Reason code: Canceled on identification as a pivot, during conflict out checking. HINT: The transaction might succeed if retried. CONTEXT: SQL statement 'SELECT 1 FROM ONLY 'public'.'t_plane' x WHERE 'id' OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x' |
PostgreSQL noticed that the same data was touched here and that the SELECT
statement can surely not return the desired data. The transaction would actually have worked, had we not modified the data. But we did, and therefore the system figured that it cannot uphold the illusion of sequential execution.
There are a couple of questions people frequently ask concerning constraints over multiple rows and SERIALIZABLE:
SELECT FOR UPDATE?
Let’s focus on SELECT FOR UPDATE
first: It is true that SELECT FOR UPDATE
does lock rows, and that we cannot modify them concurrently (UPDATE, DELETE
). But there is a problem: SELECT FOR UPDATE
protects against changes made to existing rows. It does not lock “future” rows. In other words: SELECT FOR UPDATE
does not prevent other transactions from inserting data, which of course would allow the violation of our constraint (= total ownership has to be NULL or add up to 100%).
The second argument is heard more often than not: It is true that airplane ownership does not change very often. But what we have seen here is that getting things right is really easy and does not require much effort. So why not write proper code to avoid problems in the first place? Managing locking and transaction isolation well is important - not just for consistency but also for scalability. That’s why I recommend you take coding constraints over multiple rows seriously.
Transaction isolation is a highly important topic. Another crucial topic is how to know when to use what data type. If you want to find out more about data types in PostgreSQL, we recommend checking out our post on differentiating integer, float, and numeric types. It will give you some insight into what is possible on the data type front.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook, or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply