In SQL, the concept of foreign keys is an important one that can be found in all professional databases used in the industry. The core idea is to prevent your PostgreSQL database from storing inconsistent data by enforcing constraints ensuring the correctness of your tables (at least as far as relations between objects are concerned). Referential integrity is therefore one of the most important concepts ever invented in IT.
Table of Contents
However, foreign keys will introduce some issues which you have to take care of when writing applications. If there are no foreign keys, you can insert data into any table in any order. PostgreSQL does not care. However, if a foreign key is in place, order starts to matter (at least in a typical scenario but more on that later).
To show the importance of order, we have to create a data model first:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
CREATE TABLE t_currency ( id int, shortcut char (3), PRIMARY KEY (id) ); CREATE TABLE t_location ( id int, location_name text, PRIMARY KEY (id) ); CREATE TABLE t_product ( id int, name text, currency_id int REFERENCES t_currency (id), PRIMARY KEY (id) ); CREATE TABLE t_product_desc ( id int, product_id int REFERENCES t_product (id), description text, PRIMARY KEY (id) ); CREATE TABLE t_product_stock ( product_id int REFERENCES t_product (id), location_id int REFERENCES t_location (id), amount numeric CHECK (amount >= 0) ); |
We want to store currencies, products, as well as product descriptions. Basically it is a very simple data model. Let us see if we happen to insert into the product table:
1 2 3 4 5 6 |
test=# INSERT INTO t_product VALUES (1, 'PostgreSQL consulting', 1); ERROR: insert or update on table 't_product' violates foreign key constraint 't_product_currency_id_fkey' DETAIL: Key (currency_id)=(1) is not present in table 't_currency'. test=# INSERT INTO t_product VALUES (1, 'PostgreSQL consulting', NULL); INSERT 0 1 |
Logically the first INSERT is going to fail because currency number 1 does not exist yet. If we want to INSERT, we have to use a NULL value (= unknown currency). In order words: We have to fill the currency table first, then insert locations, and so on. The order does matter in the default case.
If you have to start using an existing data model, it can be a bit hard to wrap your head around this stuff. Populating an empty data model can be a bit tricky. So why not write a query telling us the order in which we are supposed to insert data?
Well, here is that magic query...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 |
WITH RECURSIVE fkeys AS ( /* source and target tables for all foreign keys */ SELECT conrelid AS source, confrelid AS target FROM pg_constraint WHERE contype = 'f' ), tables AS ( ( /* all tables ... */ SELECT oid AS table_name, 1 AS level, ARRAY[oid] AS trail, FALSE AS circular FROM pg_class WHERE relkind = 'r' AND NOT relnamespace::regnamespace::text LIKE ANY (ARRAY['pg_catalog', 'information_schema', 'pg_temp_%']) EXCEPT /* ... except the ones that have a foreign key */ SELECT source, 1, ARRAY[ source ], FALSE FROM fkeys ) UNION ALL /* all tables with a foreign key pointing a table in the working set */ SELECT fkeys.source, tables.level + 1, tables.trail || fkeys.source, tables.trail @> ARRAY[fkeys.source] FROM fkeys JOIN tables ON tables.table_name = fkeys.target /* * Stop when a table appears in the trail the third time. * This way, we get the table once with 'circular = TRUE'. */ WHERE cardinality(array_positions(tables.trail, fkeys.source)) < 2 ), ordered_tables AS ( /* get the highest level per table */ SELECT DISTINCT ON (table_name) table_name, level, circular FROM tables ORDER BY table_name, level DESC ) SELECT table_name::regclass, level FROM ordered_tables WHERE NOT circular ORDER BY level, table_name; |
The query is not trivial to read, but I have done my best to document it a bit. Basically, the PostgreSQL system tables have all the information we need to determine the correct order. Here is the output:
1 2 3 4 5 6 7 8 |
table_name | level -----------------+------- t_currency | 1 t_location | 1 t_product | 2 t_product_desc | 3 t_product_stock | 3 (5 rows) |
As you can see, the query has correctly given us the tables in the desired order. First, we have to insert into all tables at level one and so on. If we stick to this order, referential integrity will always be ensured (assuming the data is correct).
In some cases, the insertion order can be a nasty thing to deal with. What if we had the means to tell PostgreSQL to ignore the order and check integrity on commit instead? This is exactly what "initially deferred" does. Here is how it works:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
BEGIN; CREATE TABLE t_currency ( id int, shortcut char (3), PRIMARY KEY (id) ); CREATE TABLE t_product ( id int, name text, currency_id int REFERENCES t_currency (id) INITIALLY DEFERRED, PRIMARY KEY (id) ); INSERT INTO t_product VALUES (1, 'PostgreSQL support', 1); INSERT INTO t_currency VALUES (1, 'EUR'); COMMIT; |
In this case, we can modify data in any order we want. As long as integrity is guaranteed to be intact at the end of the transaction, PostgreSQL is perfectly fine. PostgreSQL will postpone the constraint check and take some burden off the developer.
If you want to learn more about advanced SQL, you might want to take a look at my blog about some more advanced windowing functions (with ties). So put on your tie and read to learn more.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Thanks a lot!!! You saved my day 🙂
I think there is a small "bug" (?): I have a circular dependency situation. But i think the "direction" matters:
C -> B
B -> A
C -> A
C was excluded from the result
Great, query. But with large databases of 600 tables or more. This query will crash the db. Any performance improvement recommendations?