CYBERTEC PostgreSQL Logo

Constraints over multiple rows in PostgreSQL

06.2021 / Category: / Tags:

Manage constraints over multiple rows: In PostgreSQL and many other relational databases, constraints are an integral part of the feature set. Many people are aware of primary keys, foreign keys, CHECK-constraints, and table constraints. However, from time to time, the situation is way more complicated. That’s when some more advanced techniques are needed to enforce integrity in the way it is desired by end users.
In this post, we will elaborate on restrictions spanning more than one row. What does that mean? Let us envision a simple scenario: We want to store information about airplane ownership. In aviation, it happens more often than not that a single plane has more than one owner. In this example, we want to make sure that the ownership parts actually add up to 100% of the whole. Similar challenges can be found over and over again in real business applications.

Creating sample data to illustrate the use of constraints over multiple rows

Let’s first create some sample data, and see how we can actually solve the problem of ensuring that ownership always has to end up at 100% for a specific airplane:

In this case, we’ve got two tables: The t_plane table contains a unique ID and the call sign of the plane. The call sign is similar to a license plate - it identifies the aircraft. “D” indicates Germany, OE means Austria and “N” would indicate that we are dealing with a US-registered aircraft.

As you can see, we have used a 1 : N relationship here. One plane can have many owners. The trouble is, if we add up the owners’ percentages of a plane, we always have to end up with 100%. The nasty part is concurrency. What if many people make changes at the same time?

Let’s take a look at the basic problem of concurrency: 

User 1

User 2

Comment

returns 200 for D-EHWI

Here you can see that two people are about to modify the same airplane concurrently. There is nothing to stop those two users from actually doing that. The end result is that D-EHWI is owned by two people at the same time - a big no-no.

There are various ways to avoid such concurrency predicaments.

Using LOCK TABLE to protect against concurrency when coding constraints over multiple rows

One of the more commonly chosen methods is to use a table lock. Let’s take a look and see how this works:

User 1

User 2

Comment

in case user 1 is slightly ahead user 2 has to wait

user 2 has to wait ...

returns 100, we are fine

user 2 wakes up

user 2 gets 200

we must rollback because the constraint is violated

In this case, we have used a table lock to solve the problem. The important point here is that PostgreSQL allows us to define 8 different types of locks:

ACCESS SHARE simply means that somebody is reading a row. All it does is to prevent things such as DROP TABLE from happening. However, what we need to do here is to ensure that people cannot insert or modify concurrently. The solution to the problem is lock level EXCLUSIVE. Reading is still possible, but only a single transaction can modify the table. ACCESS EXCLUSIVE would prevent both concurrent reads and writes, but that would be overkill in this case.

What is the problem with this approach? The answer is scalability: In case of a table lock, only ONE transaction can modify the same table at the same time. If 100 airplanes are modified at the same time, 99 transactions have to wait until one transaction is actually able to commit. One could argue that airplane ownership does not change that often. However, this might not be true for other use-cases; we need to keep in mind that scalability does matter. It makes no sense to buy a 32-core server if 31 cores are doomed to idle due to locking.
A more sophisticated solution is needed.

Transaction isolation levels: SERIALIZABLE

One solution to the problem is to use a higher transaction isolation level. Basically, PostgreSQL supports three out of four transaction isolation levels proposed by the ANSI SQL standard:

Currently, READ UNCOMMITTED is mapped to READ COMMITTED, which makes a lot of sense in an MVCC context. However, it’s not enough here.

SERIALIZABLE is the method of choice in this case. What is the basic idea behind SERIAZIABLE? It provides us with the illusion of sequential execution - however, in the background, things are happening with as much parallelism as possible.

Let’s take a closer look at SERIALIZABLE:

User 1

User 2

Comment

start a transaction using the right isolation level

make desired changes

check to make sure that ownership is correct (the app will decide whether to commit or rollback).

user 2 wakes up

user 2 gets 200

we must rollback because the constraint is violated


User 1

User 2

Comment

start a transaction using the right isolation level

make desired changes

check to make sure that ownership is correct (the app will decide whether to commit or rollback).

make conflicting changes violating ownership

the app decided to commit

in this case, we will fail because transactions are not guaranteed to be independent anymore.

We have started to insert data for D-ONUT. The idea is to insert it directly and check at the end of the transaction to see what happens. Note that there is no LOCK TABLE, no SELECT FOR UPDATE or anything of that kind. It is a plain transaction in isolation level SERIALIZABLE. It inserts two owners and then it checks for correctness. The important part is that SERIALIZABLE is actually going to error out:

PostgreSQL noticed that the same data was touched here and that the SELECT statement can surely not return the desired data. The transaction would actually have worked, had we not modified the data. But we did, and therefore the system figured that it cannot uphold the illusion of sequential execution.

There are a couple of questions people frequently ask concerning constraints over multiple rows and SERIALIZABLE:

  • Why not simply use SELECT FOR UPDATE?
  • Why care at all, since a conflict is highly unlikely anyway?

Let’s focus on SELECT FOR UPDATE first: It is true that SELECT FOR UPDATE does lock rows, and that we cannot modify them concurrently (UPDATE, DELETE). But there is a problem: SELECT FOR UPDATE protects against changes made to existing rows. It does not lock “future” rows. In other words: SELECT FOR UPDATE does not prevent other transactions from inserting data, which of course would allow the violation of our constraint (= total ownership has to be NULL or add up to 100%).

The second argument is heard more often than not: It is true that airplane ownership does not change very often. But what we have seen here is that getting things right is really easy and does not require much effort. So why not write proper code to avoid problems in the first place? Managing locking and transaction isolation well is important - not just for consistency but also for scalability. That’s why I recommend you take coding constraints over multiple rows seriously.

Finally …

Transaction isolation is a highly important topic. Another crucial topic is how to know when to use what data type. If you want to find out more about data types in PostgreSQL, we recommend checking out our post on differentiating integer, float, and numeric types. It will give you some insight into what is possible on the data type front.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook, or LinkedIn.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram