CYBERTEC PostgreSQL Logo

Breaking your PostgreSQL database with bad CHECK constraints

03.2023 / Category: / Tags: | |
A father finds out that CHECK constraints won't work on children - they eloped to neighbor's garden
© Laurenz Albe 2023

I am on a spree! After writing about breaking your database and transaction ID wraparound, here I am writing about damage caused by CHECK constraints! Trust me, I don't want to destroy PostgreSQL databases. It's just that this Twitter message was brought to my attention and triggered memories of other reports about CHECK constraints that caused problems.

When is a CHECK constraint correct?

The documentation warns that there are rules you have to play by when using CHECK constraints:

PostgreSQL assumes that CHECK constraints' conditions are immutable, that is, they will always give the same result for the same input row. This assumption is what justifies examining CHECK constraints only when rows are inserted or updated, and not at other times.

That makes sense. Otherwise, PostgreSQL would have to check the condition whenever anything in the database changes (or when time has passed), which is not an option. However, PostgreSQL does not enforce that the constrained expression be IMMUTABLE.

Why is IMMUTABLE not enforced in CHECK constraint expressions?

PostgreSQL normally doesn't leave requirements like that to the discretion of the user. For example, trying to use a function that is not IMMUTABLE in an index definition causes the error:

So why does PostgreSQL not enforce that for CHECK constraints? The reason is that there are valid use cases for CHECK constraints where the expression is not IMMUTABLE in the literal sense (but the constraint is). This is even required by the SQL standard. To quote ISO/IEC 9075-2:2003, chapter 11, verse 9:

<check constraint definition> ::= CHECK <left paren> <search condition> <right paren>

5) The <search condition> shall simply contain a <boolean value expression> that is retrospectively deterministic.

Earlier on, the standard explains:

A retrospectively deterministic <boolean value expression> has the property that if it is True at one point time, then it is True for all later points in time if re-evaluated for the identical SQL-data by an arbitrary user with the identical set of privileges. The precise definition is found in Subclause 6.34, “<boolean value expression>”.

I'll spare you the “precise” definition of “retrospectively deterministic”. It is enough to bring tears to your eyes and is a prime example of how an attempt to formalize an idea can go wrong. What you can distill from it is that the standard committee had something like this in mind:

This is not an IMMUTABLE expression (current_timestamp can have a different value tomorrow), but certainly a reasonable CHECK constraint. Any row that satisfies that condition now will also satisfy it in the future.

Breaking valid UPDATEs with a bad CHECK constraint

A bad constraint to check the format of a string

Imagine we have a table

and we have a table to commemorate deleted rows

that is populated by a trigger

We want to create a CHECK constraint that enforces that the string has the correct format. So we define

Demonstrating the problem

Let's add and delete some data:

Sure enough, there is now a row in history:

At some later time, we need to create a new column for our table:

Next time we look at the history table and want to increase view_count, we get:

What went wrong, and what should we have done instead?

What happened? By adding a column to the table data, we also changed the composite data type of the same name. That way, we broke “retrospective determinism”, because the data no longer match the changed data type. Even though we didn't change the checked value at all, the new row no longer satisfies the CHECK constraint.

A better solution for this requirement would have been a trigger. You can configure the trigger to check the data only once, when the row is inserted. It is also possible to define the trigger so that it checks the data whenever the row is updated, but only if “row” is modified.

Breaking restore with a bad CHECK constraint

A bad constraint that checks dependencies between tables

For the sake of the example, let's consider a rental car management system:

Now it is not a good idea to rent out a small car to a big group, and we know that it is a good idea to enforce constraints in the database. So we want to have a constraint that keeps us from assigning a car to a client if the group size exceeds the number of seats in the car:

What an annoying limitation! This shows us that PostgreSQL indeed places some limits on what we are allowed to do in a CHECK constraint. Fortunately we can work around it with functions:

Great! That's much more readable anyway!

Testing the constraint

Let's verify that this works as intended:

Looks good!

The broken backup

We perform database backups using

On the dark day of data loss, when we try to restore our backup, we will get an unexpected error message:

What went wrong, and what should we have done instead?

PostgreSQL dumps tables in alphabetic order to be deterministic. While it dumps most constraints in the end to speed up data loading and prevent integrity problems, it considers CHECK constraints to be part of the table definition and dumps them in the beginning. That is fine, since the expressions in CHECK constraints must be retrospectively deterministic and depend only on the row itself. Unfortunately, our cute constraint does not satisfy that requirement. PostgreSQL creates the tables, then restores the data for client, rented and vehicle in that order. So when rented is loaded, vehicle is still empty, get_seats returns 0 and the constraint fails.

Again, the correct solution would have been a trigger that throws an error if the condition is violated. While constraints have to be fulfilled all the time, triggers only fire at certain events. A dump contains trigger definitions at the end, and the triggers are not fired at all when the dump is restored.

If we want to make sure that the condition is always fulfilled, we can go a different way: we could add a copy of seats and group_size to rented and include these columns in the FOREIGN KEY constraints to make sure they are always identical. Then the CHECK constraint only has to compare columns of rented, and the problem is gone.

Conclusion

We saw two examples of how a badly defined CHECK constraint can break a database. The restore failure was particularly unpleasant. Sadly, these are not academic cases, and there have been reports about restore failures on the mailing lists. So make sure that your CHECK constraints are defined correctly, even if PostgreSQL cannot enforce that.

Apart from that, we have learned the pretentious phrase “retrospectively deterministic”, which may draw admiration or a shake of the head, depending on how gullible the audience is.

2 responses to “Breaking your PostgreSQL database with bad CHECK constraints”

  1. The source of this error is not CHECK constraint itself but unsafe datatype convertion used in CHECK constraint expression. You have used the datatype convertion with user-defined type, the datatype was altered, and now your expression fails. But it will fail in any place - in generated column, functional index expression, and even in simple query.

    • Quite correct. This can cause problems in other situations as well. Changing a user-defined data type is dangerous.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram