Table of Contents
This article is about the notoriously difficult problem of “conditional foreign keys”. In object-oriented programming languages, polymorphism is a natural concept. However, it does not translate well to SQL. As a consequence, many people whose data models are driven by an object-oriented application design end up trying to implement conditional foreign keys. In this article, I will explore different ways to solve the problem and their respective advantages and disadvantages. Most of this discussion is not specific to PostgreSQL, but I will also explore how PostgreSQL features like JSON support and table inheritance can help with conditional foreign keys.
As an example, let's consider a data model in Java for a traffic insurance company. We want to model different kinds of vehicles:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
public abstract class Vehicle { private java.util.Date manufacturer; private java.math.BigDecimal price; } public abstract class MotorVehicle extends Vehicle { private String powersource; private String license_plate; private int watt; } public class Car extends MotorVehicle { private int doors; private int seats; } public class Bicycle extends Vehicle { private int gears; } |
Vehicle
and MotorVehicle
are abstract classes, that is, you cannot create objects of that type. They define attributes that are common to all vehicles or motor vehicles. The classes you can instantiate (create actual objects of that type) are Car
and Bicycle
.
Now we can model customers and contracts:
1 2 3 4 5 6 7 8 9 10 11 12 |
public class Customer { private String name; private java.util.Date birthday; } public class Contract { private Customer customer; private Vehicle vehicle; private java.util.Date start; private java.util.Date end; private java.math.BigDecimal payment; } |
A Contract
references a Customer
and a Vehicle
. Note that while there can be no objects of type Vehicle
, the attribute vehicle
can reference any object of a subclass of Vehicle
, like a Car
or a Bicycle
. This useful feature is called polymorphism.
To persist the data from the insurance application, we have to map the application objects to database tables. Customer
, Car
and Bicycle
are easy to translate:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
CREATE TABLE customer ( customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name text NOT NULL, birthday date NOT NULL ); CREATE TABLE car ( car_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, watt integer NOT NULL, doors integer NOT NULL, seats integer NOT NULL, manufacturer text NOT NULL, price numeric(15,2) NOT NULL, powersource text NOT NULL, license_plate text NOT NULL ); CREATE TABLE bicycle ( bicycle_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, gears integer NOT NULL, manufacturer text NOT NULL, price numeric(15,2) NOT NULL ); |
The problem occurs when we try to map Contract
to a database table:
[sql highlight="5"]
CREATE TABLE contract (
contract_id bigint GENERATED ALWAYS AS IDENTITY
PRIMARY KEY,
customer_id bigint REFERENCES customer NOT NULL,
vehicle_id bigint REFERENCES ???,
start date NOT NULL,
end date NOT NULL,
payment numeric(15,2) NOT NULL
);
The foreign key vehicle_id
should sometimes reference a car
, sometimes a bicycle
, and sometimes some other type of vehicle. This is what I call a “conditional foreign key”. You cannot directly write a conditional foreign key in SQL. So, what options do you have to model one?
With this technique, you add several foreign keys to contract
: one for each type of vehicle. An additional column v_type
identified the type of vehicle.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE TABLE vehicle_type ( id integer PRIMARY KEY, name text NOT NULL ); INSERT INTO vehicle_type VALUES (1, 'car'), (2, 'bicycle'); CREATE TABLE contract ( contract_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, customer_id bigint REFERENCES customer NOT NULL, car_id bigint REFERENCES car, bicycle_id bigint REFERENCES bicycle, start_date date NOT NULL, end_date date NOT NULL, v_type integer REFERENCES vehicle_type NOT NULL, payment numeric(15,2) NOT NULL ); |
I added v_type
further down to avoid wasting storage space. An additional check constraint makes sure that the data are consistent:
1 2 3 4 5 6 7 8 |
ALTER TABLE contract ADD CONSTRAINT conditional_key_check CHECK (CAST ((v_type = 1 AND car_id IS NOT NULL AND bicycle_id IS NULL) AS integer) + CAST ((v_type = 2 AND car_id IS NULL AND bicycle_id IS NOT NULL) AS integer) = 1); |
This code makes use of the fact that the boolean FALSE
converts to the integer 0 and TRUE
converts to 1.
The advantages of this solution are that consistency is guaranteed, and that we don't have to modify car
and bicycle
. The disadvantage is that we need many foreign key columns in contract
. We have to add another foreign key for each new kind of vehicle, and we have to adapt the check constraint. This solution is particularly appealing in cases with not too many types of vehicles and if the various types have few common attributes.
With this technique, we store all vehicle data in a single table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE TABLE vehicle ( vehicle_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, watt integer, doors integer, seats integer, gears integer, v_type integer REFERENCES vehicle_type NOT NULL, price numeric(15,2) NOT NULL, manufacturer text NOT NULL, powersource text, license_plate text, CHECK (CAST ((v_type = 1 AND ROW(watt, doors, seats, powersource, license_plate) IS NOT NULL AND gears IS NULL) AS integer) + CAST ((v_type = 2 AND ROW(watt, doors, seats, powersource, license_plate) IS NULL AND gears IS NOT NULL) AS integer) = 1) ); |
Again, a check constraint makes sure that the data are consistent, and each type of vehicle has only the appropriate attributes set. If we want, we can create views for car
and bicycle
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
DROP TABLE IF EXISTS car; CREATE VIEW car AS SELECT vehicle_id AS car_id, watt, doors, seats, manufacturer, price, powersource, license_plate FROM vehicle WHERE v_type = 1; DROP TABLE IF EXISTS bicycle; CREATE VIEW bicycle AS SELECT vehicle_id AS bicycle_id, manufacturer, price, gears FROM vehicle WHERE v_type = 2; |
Now, the definition of contract
is simple:
1 2 3 4 5 6 7 8 9 |
CREATE TABLE contract ( contract_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, customer_id bigint REFERENCES customer NOT NULL, vehicle_id bigint REFERENCES vehicle NOT NULL, start_date date NOT NULL, end_date date NOT NULL, payment numeric(15,2) NOT NULL ); |
The advantage of this solution is that the definition of contract
is straightforward. The disadvantage is that vehicle
has to grow new columns for each additional vehicle type. This solution is particularly useful if the tables referenced have most columns in common, so that vehicle
does not end up having too many columns. It is also an appealing solution if there are many types of vehicles.
jsonb
This is a variation of the previous method and relies on PostgreSQL's JSON capabilities. Instead of creating columns for all vehicle types in the vehicle
table, we add the additional attributes in a single jsonb
column:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
CREATE TABLE vehicle ( vehicle_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, v_type integer REFERENCES vehicle_type NOT NULL, price numeric(15,2) NOT NULL, manufacturer text NOT NULL, data jsonb NOT NULL, CHECK (CAST ((v_type = 1 AND ROW(data -> 'watt', data -> 'doors', data -> 'seats', data -> 'powersource', data -> 'license_plate') IS NOT NULL AND data -> 'gears' IS NULL) AS integer) + CAST ((v_type = 2 AND ROW(data -> 'watt', data -> 'doors', data -> 'seats', data -> 'powersource', data -> 'license_plate') IS NULL AND data -> 'gears' IS NOT NULL) AS integer) = 1) ); |
Again, we can create views for the individual types of vehicles:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
CREATE VIEW car AS SELECT vehicle_id AS car_id, (data ->> 'watt')::integer AS watt, (data ->> 'doors')::integer AS doors, (data ->> 'seats')::integer AS seats, manufacturer, price, data ->> 'powersource' AS powersource, data ->> 'license_plate' AS license_plate FROM vehicle WHERE v_type = 1; CREATE VIEW bicycle AS SELECT vehicle_id AS bicycle_id, manufacturer, price, (data ->> 'gears')::integer AS gears FROM vehicle WHERE v_type = 2; |
The idea is to model all frequent attributes as normal columns and all rare ones as JSON attributes. The advantage is that the number of columns stays moderate, which is great if there are many types of vehicles with many attributes. If you create a GIN index on the jsonb
column, you can perform efficient searches. On the down side, fetching and updating the JSON attributes will be more expensive.
With this method, we don't add a foreign key vehicle_id
to contract
. Instead, we add foreign keys from car
and bicycle
to contract
:
1 2 3 4 5 |
ALTER TABLE car ADD contract_id bigint REFERENCES contract NOT NULL; ALTER TABLE bicycle ADD contract_id bigint REFERENCES contract NOT NULL; |
You will say that that is not really correct, and you are right. The disadvantage of this solution is that we don't have a guarantee that each contract
is related to one and only one vehicle. But it is a simple solution that you can resort to if the previous solutions are not feasible. If you index these foreign keys like you should, you can still find the vehicle that belongs to a contract fairly efficiently.
Back in the day when object-oriented programming was the latest fad, PostgreSQL added table inheritance to turn from a RDBMS (relational database management system) to an ORDBMS (object-relational database management system). With inheritance, you can model subclasses as tables:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE TABLE vehicle ( vehicle_id bigint GENERATED ALWAYS AS IDENTITY, manufacturer date NOT NULL, price numeric(15,2) NOT NULL ); CREATE TABLE motorvehicle ( powersource text NOT NULL, license_plate text NOT NULL, watt integer ) INHERITS (vehicle); CREATE TABLE car ( doors integer NOT NULL, seats integer NOT NULL ) INHERITS (motorvehicle); CREATE TABLE bicycle ( gears integer NOT NULL ) INHERITS (vehicle); |
At first glance, this looks like it would be a perfect fit for our problem! Indeed, it can save you from writing the same column definition over and over, since tables automatically inherit all columns from their inheritance parents. But all these tables are separate tables, and a primary key constraint on vehicle
will exist only on that table itself. It won't constrain the values of vehicle_id
in the other tables. Each table needs its own primary key, and you will have to implement a solution like the first one I suggested.
I showed you some ways to model conditional foreign keys. Each of them has its advantages and disadvantages, and none of them is perfect. PostgreSQL table inheritance does not solve the problem either. Polymorphism is nice in object-oriented languages, but it doesn't carry over well into SQL.
If you are interested in data models that are not a natural fit for relational databases, you might want to read my article on how to use JSON in PostgreSQL correctly. For an easy introduction to normalization, read Michał's article on the topic.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
But you forgot the [arguably] best method --- using extra "vehicle" table, with all the tables having FK to it (making it 100% consistent requires triggers, though).
I'd argue that that your method is a hybrid of method 2 and method 4, with all the disadvantages of method 4. Triggers are not a safe way to implement constraints - there are always race conditions.
But I agree with you: there are probably many other variations and combinations of these methods that I didn't list. I may even have omitted some new, creative ways to solve the problem.
Where I disagree is that your method is the best one. I maintain that each of these methods has its advantages and disadvantages, and what is best will depend on the individual use case. Sadly, none of the methods is perfect.
> Triggers are not a safe way to implement constraints - there are always race conditions.
Why that? In the article you share a solution with row-locks where you claim that its safe?
No, it is not? This is what was meant:
CREATE TABLE customer (
customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
birthday date NOT NULL
);
CREATE TABLE vehicle (
vehicle_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
manufacturer text NOT NULL,
price numeric(15,2) NOT NULL
);
CREATE TABLE car (
car_id bigint PRIMARY KEY REFERENCES vehicle,
watt integer NOT NULL,
doors integer NOT NULL,
seats integer NOT NULL,
powersource text NOT NULL,
license_plate text NOT NULL
);
CREATE TABLE bicycle (
bicycle_id bigint PRIMARY KEY REFERENCES vehicle,
gears integer NOT NULL
);
CREATE TABLE contract (
contract_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL REFERENCES customer ,
vehicle_id bigint REFERENCES vehicle,
d_start date NOT NULL,
d_end date NOT NULL,
payment numeric(15,2) NOT NULL
);
Only they are, if used correctly (the problem illustrated in your article is not triggers, it's the complexity of writing reliable constraints on lower isolation levels).
Indeed, all that you wrote in this post is hardly new --- google for "polymorphic associations", and you'll find all of the above methods (and more).
Whence "arguably".
Yes, but that is due to inability of some RDMBS-es to support non-trivial declarative constraints. 😉
> > is a hybrid of method 2 and method 4, with all the disadvantages of method 4.
>
> No, it is not? This is what was meant: [tables with additional properties referencing
vehicle
]Right, it is not exactly the same, since your tables reference
vehicle
, notcontract
. What I meant when I said it is like my method 4 is that the data model does not enforce consistency. There could be acar
and abicycle
with the same primary key value.Still, your suggestion has merits.
> > Triggers are not a safe way to implement constraints - there are always race conditions.
>
> Only they are, if used correctly (the problem illustrated in your article is not triggers, it's the complexity of writing reliable constraints on lower isolation levels).
Either you didn't read my article, or you didn't understand it.
> all that you wrote in this post is hardly new
I don't claim that I invented anything new here. The idea of the article is a synopsis of known solutions for the problem.
I was sloppy.
I should have said "triggers are not a safe way to implement constraints unless you use the
SERIALIZABLE
isolation level or heavy locking".Unfortunately, yes --- declarative constraints in modern RDBMS-es are too primitive for that. So, one needs triggers to use it.
Sure it does --- it's a relational (textbook) solution/material, unlike the others that are demonstrated in the article (which are just workarounds for the RDBMS defects, see above).
Why would I need to read something that makes counterfactual claims? 😉
Anyway, a) writing reliable triggers is possible (and even trivial if one uses SERIALIZABLE), but it's complex on lower isolation levels and b) the same integrity checks could be done without triggers (in the application), but there are obvious downsides to that.
Yet again, let me steal David's comment from that article which I wholeheartedly agree to:
And you didn't succeed to provide a complete one.
As you missed the "classic" solution, you either didn't read a textbook (about relational database design), or you didn't understand it. 😉
Why do you not use sequence object as PK in table vehicle? And than one constrain in table car /bicycle only
I would suggest Method 5 PostgreSQL table inheritance and polymorphism with jsonb columns.
I definitely recommend using a sequence (in the guise of an identity column) tu generate artificial primary keys. But I figured that that was irrelevant to the theme of my article.
If you use table inheritance, that would essentially be method 1 (several tables, several foreign keys). JSON is a trick you can use if you want the benefit of a single table, but avoid having hundreds of columns (method 3). I don't see any benefit in having several tables and using JSON in those.
Sorry, but an Identity Column is not the same as a SERIAL / Sequence Column!
And you wrote:
"... a primary key constraint on vehicle will exist only on that table itself. It won’t constrain the values of vehicle_id in the other tables. Each table needs its own primary key, ..."
With BIGSERIAL Column you have the same PK and every Table has own PK too.
When you then build CHECK CONSTRAIN's for Table car and bicycle you have a "perfect" solution.
An identity column is a sequence column. It is better than
serial
, because it conforms to the standard and makes it hard to manually insert non-sequence values.A
bigserial
column is not a primary key.A check constraint can only guarantee conditions within a single table row, never across rows or tables.
Sorry. This is your Answer?
Of sure is bigserial column not a PK, but identity column it isnt.
Wrote: CREATE TABLE vehicle ( vehicle_id bigserial primary key not null,
manufacturer date NOT NULL,
price numeric(15,2) NOT NULL
);
And see the BIGSERIAL column is a PK!
And yes, check contrain can only guarantee conditions within a single table row.
Your theme is not IDENTITY OR SERIAL COLUMN. It is "What is the best Schema Design"!
When you use inheritance, it is most better to use SERIAL columns, then will all inherited tables have a good PK column with this
And you need only two check contrains on car/bicycle. You will never insert data in vehicle / motorvehicle table. Why?. They are only relevant for search or report querys ...
You should only use car/bicycle table in your application for insert/updates ....
Thats it!
Thanks for the more detailed information. I am less confused now..
Yes, using table inheritance in combination with method 1 can be a good choice!