Foreign Key Indexing and Performance in PostgreSQL

10.2018 | Category: Performance | Tags: administration, foreign key, index, performance, sql help

A foreign key is no wrong key! — © Laurenz Albe 2018

Foreign key constraints are an important tool to keep your database consistent while also documenting relationships between tables.

A fact that is often ignored is that foreign keys need proper indexing to perform well.

This article will explain that and show you how to search for missing indexes.

Index at the target of a foreign key

In the following, I will call the table on which the foreign key constraint is defined the source table, and the table it references the target table.

The referenced columns in the target table must have a primary key or unique constraint. Such constraints are implemented with unique indexes in PostgreSQL. Consequently, the target side of a foreign key is automatically indexed.

This is required so that there is always a well-defined row to which the foreign key points. The index also comes in handy if you want to find the row in the target table that matches a row in the source table.
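To see this for yourself, here is a minimal sketch. The tables parent_demo and child_demo are hypothetical and are not part of the example used later in this article. The first query should list the unique index that PostgreSQL created for the primary key; the last statement is expected to fail, because the referenced column has no primary key or unique constraint.

```sql
-- hypothetical demo table, for illustration only
CREATE TABLE parent_demo (
   id integer PRIMARY KEY,
   code text NOT NULL
);

-- the primary key is backed by an automatically created unique index
SELECT indexname, indexdef
FROM pg_catalog.pg_indexes
WHERE tablename = 'parent_demo';

-- this is rejected: "code" has no primary key or unique constraint,
-- so there would be no well-defined row for the foreign key to point to
CREATE TABLE child_demo (
   code text REFERENCES parent_demo (code)
);
```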

Index at the source of a foreign key

In contrast to the above, PostgreSQL requires no index at the source of a foreign key.

However, such an index is quite useful for finding all source rows that reference a target row. The typical cases where you need that are:

1. You perform a join between the two tables where you explicitly search for the source rows referencing one or a few target rows. If there is an index on the columns at the source, PostgreSQL can use an efficient nested loop join.

This is well known and pretty obvious.

2. You delete rows or update key columns in the target table.

Then PostgreSQL has to check whether the foreign key constraint is still satisfied. It does so by looking for rows in the source table that would become orphaned by the data modification. Without an index, this requires a sequential scan of the source table.

An example

Let's build a source and a target table:

-- to make the plans look simpler
SET max_parallel_workers_per_gather = 0;
-- to speed up CREATE INDEX
SET maintenance_work_mem = '512MB';

CREATE TABLE target (
   t_id integer NOT NULL,
   t_name text NOT NULL
);
INSERT INTO target (t_id, t_name)
   SELECT i, 'target ' || i
   FROM generate_series(1, 500001) AS i;

ALTER TABLE target
   ADD PRIMARY KEY (t_id);

CREATE INDEX ON target (t_name);

/* set hint bits and collect statistics */
VACUUM (ANALYZE) target;

CREATE TABLE source (
   s_id integer NOT NULL,
   t_id integer NOT NULL,
   s_name text NOT NULL
);
INSERT INTO source (s_id, t_id, s_name)
   SELECT i, (i - 1) % 500000 + 1, 'source ' || i
   FROM generate_series(1, 1000000) AS i;

ALTER TABLE source
   ADD PRIMARY KEY (s_id);

ALTER TABLE source
   ADD FOREIGN KEY (t_id) REFERENCES target;

/* set hint bits and collect statistics */
VACUUM (ANALYZE) source;

Query time without an index

Looking up source rows via the link to target and deleting rows from target are both unreasonably slow:

EXPLAIN (ANALYZE)
SELECT source.s_name
FROM source
   JOIN target USING (t_id)
WHERE target.t_name = 'target 42';

                          QUERY PLAN
----------------------------------------------------------------
 Hash Join  (cost=8.45..19003.47 rows=2 width=13)
            (actual time=0.150..360.920 rows=2 loops=1)
   Hash Cond: (source.t_id = target.t_id)
   ->  Seq Scan on source
                    (cost=0.00..16370.00 rows=1000000 width=17)
              (actual time=0.011..155.964 rows=1000000 loops=1)
   ->  Hash  (cost=8.44..8.44 rows=1 width=4)
             (actual time=0.111..0.111 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=4)
                      (actual time=0.105..0.107 rows=1 loops=1)
               Index Cond: (t_name = 'target 42'::text)
 Planning time: 0.701 ms
 Execution time: 360.982 ms
(9 rows)

EXPLAIN (ANALYZE)
DELETE FROM target
WHERE target.t_name = 'target 500001';

                          QUERY PLAN
----------------------------------------------------------------
 Delete on target  (cost=0.42..8.44 rows=1 width=6)
                   (actual time=0.178..0.178 rows=0 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=6)
                      (actual time=0.107..0.109 rows=1 loops=1)
         Index Cond: (t_name = 'target 500001'::text)
 Planning time: 0.165 ms
 Trigger for constraint source_t_id_fkey: time=153.804 calls=1
 Execution time: 154.030 ms
(6 rows)
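Almost all of the time in the DELETE is spent in the trigger for the foreign key constraint. As a rough approximation of the lookup that this trigger has to perform (not the exact statement PostgreSQL runs internally), you can search the source table for the deleted key by hand. Without an index on source.t_id, the plan can be expected to show a sequential scan of source:

```sql
-- approximation of the foreign key check for the deleted row:
-- is there any source row that still references t_id 500001?
EXPLAIN (ANALYZE)
SELECT 1
FROM source
WHERE t_id = 500001
LIMIT 1;
```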

Query time with an index

After creating the appropriate index:

CREATE INDEX source_t_id_idx ON source (t_id);

the queries are as fast as they should be:

EXPLAIN (ANALYZE)
SELECT source.s_name
FROM source
   JOIN target USING (t_id)
WHERE target.t_name = 'target 42';

                          QUERY PLAN
----------------------------------------------------------------
 Nested Loop  (cost=0.85..19.89 rows=2 width=13)
              (actual time=0.068..0.076 rows=2 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=4)
                      (actual time=0.048..0.049 rows=1 loops=1)
         Index Cond: (t_name = 'target 42'::text)
   ->  Index Scan using source_t_id_idx on source
                             (cost=0.42..11.43 rows=2 width=17)
                      (actual time=0.013..0.019 rows=2 loops=1)
         Index Cond: (t_id = target.t_id)
 Planning time: 1.238 ms
 Execution time: 0.147 ms
(7 rows)

EXPLAIN (ANALYZE)
DELETE FROM target
WHERE target.t_name = 'target 500001';

                           QUERY PLAN
----------------------------------------------------------------
 Delete on target  (cost=0.42..8.44 rows=1 width=6)
                   (actual time=0.135..0.135 rows=0 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=6)
                      (actual time=0.091..0.094 rows=1 loops=1)
         Index Cond: (t_name = 'target 500001'::text)
 Planning time: 0.151 ms
 Trigger for constraint source_t_id_fkey: time=0.557 calls=1
 Execution time: 0.751 ms
(6 rows)
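On a busy production table, a plain CREATE INDEX blocks data modifications on the table while it runs. A possible alternative, which was not used for the measurements above, is to build the index with CONCURRENTLY. The sketch below assumes the source table from this example. Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block, and that a failed build can leave an invalid index behind, which the second query would show.

```sql
-- builds the index without blocking writes to "source";
-- must be run outside of a transaction block
CREATE INDEX CONCURRENTLY IF NOT EXISTS source_t_id_idx
   ON source (t_id);

-- check that no index on "source" was left invalid by a failed build
SELECT indexrelid::regclass AS index_name,
       indisvalid
FROM pg_catalog.pg_index
WHERE indrelid = 'source'::regclass;
```

An index with indisvalid = false is not used by queries; it has to be dropped and created again.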

How to check for missing indexes?

The following query will list all foreign key constraints in the database that do not have an index on the source columns:

SELECT c.conrelid::regclass AS "table",
       /* list of key column names in order */
       string_agg(a.attname, ',' ORDER BY x.n) AS columns,
       pg_catalog.pg_size_pretty(
          pg_catalog.pg_relation_size(c.conrelid)
       ) AS size,
       c.conname AS constraint,
       c.confrelid::regclass AS referenced_table
FROM pg_catalog.pg_constraint c
   /* enumerated key column numbers per foreign key */
   CROSS JOIN LATERAL
      unnest(c.conkey) WITH ORDINALITY AS x(attnum, n)
   /* name for each key column */
   JOIN pg_catalog.pg_attribute a
      ON a.attnum = x.attnum
         AND a.attrelid = c.conrelid
WHERE NOT EXISTS
        /* is there a matching index for the constraint? */
        (SELECT 1 FROM pg_catalog.pg_index i
         WHERE i.indrelid = c.conrelid
           /* it must not be a partial index */
           AND i.indpred IS NULL
           /* the first index columns must be the same as the
              key columns, but order doesn't matter */
           AND (i.indkey::smallint[])[0:cardinality(c.conkey)-1]
               OPERATOR(pg_catalog.@>) c.conkey)
  AND c.contype = 'f'
GROUP BY c.conrelid, c.conname, c.confrelid
ORDER BY pg_catalog.pg_relation_size(c.conrelid) DESC;

The result is ordered by the size of the table, so that the tables where the missing index hurts most are listed first.
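The matching rule in the query (the key columns must be the leading index columns, in any order, and the index must not be partial) is easiest to see with a multi-column foreign key. The following sketch uses hypothetical tables parent2 and child2 that are not part of the example above:

```sql
-- hypothetical tables, for illustration only
CREATE TABLE parent2 (
   a integer NOT NULL,
   b integer NOT NULL,
   PRIMARY KEY (a, b)
);

CREATE TABLE child2 (
   a integer NOT NULL,
   b integer NOT NULL,
   c integer NOT NULL,
   FOREIGN KEY (a, b) REFERENCES parent2
);

-- these do NOT count as matching, so the constraint is still reported:
-- the key columns are not the leading index columns
CREATE INDEX child2_c_a_b_idx ON child2 (c, a, b);
-- the index is partial
CREATE INDEX child2_a_b_part_idx ON child2 (a, b) WHERE c > 0;

-- this one counts: the order of the key columns doesn't matter
CREATE INDEX child2_b_a_idx ON child2 (b, a);
-- an index on (a, b, c) would count as well,
-- because extra trailing columns are allowed
```

After the last CREATE INDEX, the constraint on child2 should no longer appear in the result of the query above.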

Should I create indexes for all foreign keys?

If the source table is small, you don't need the index, because then a sequential scan is probably cheaper than an index scan anyway.

Also, if you know that you never need the index for a join and you will never delete a row or update a key column in the target table, the index is unnecessary.

There is one simple way to proceed: create all missing indexes, wait a couple of days and then get rid of the indexes that were never used.
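One possible way to find the indexes that were never used is the statistics view pg_stat_user_indexes. The query below is a sketch, not a finished tool: it assumes that the statistics were not reset recently and that the observation period covers your whole workload, including rare batch jobs. Statistics are collected per server, so scans that happen on a standby are not counted on the primary.

```sql
SELECT s.schemaname,
       s.relname AS "table",
       s.indexrelname AS index_name,
       pg_catalog.pg_size_pretty(
          pg_catalog.pg_relation_size(s.indexrelid)
       ) AS index_size
FROM pg_catalog.pg_stat_user_indexes s
   JOIN pg_catalog.pg_index i
      ON i.indexrelid = s.indexrelid
/* the index was never scanned since the statistics were last reset */
WHERE s.idx_scan = 0
  /* unique indexes enforce constraints, so keep them */
  AND NOT i.indisunique
ORDER BY pg_catalog.pg_relation_size(s.indexrelid) DESC;
```

Review each candidate before dropping it, because the list also contains unused indexes that have nothing to do with foreign keys.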

8 responses to “Foreign Key Indexing and Performance in PostgreSQL”

Przemysław Fusik says:

March 5, 2020 at 12:12 pm

Hi,

thanks for great article.

It explains a lot.

Rinat Mukhtarov says:

April 18, 2020 at 4:59 pm

Sorry, but query from section "How to check for missing indexes?" throw an error:

[42725] ERROR: operator is not unique: smallint[] @> smallint[] Hint: Could not choose a best candidate operator. You might need to add explicit type casts. Position: 1000

PostgreSQL 10.5 (Debian 10.5-2.pgdg90 1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18 deb9u1) 6.3.0 20170516, 64-bit

- laurenz says:
  
  April 20, 2020 at 5:46 am
  
  That probably means that you have created additional operators or type casts. The documentation recommends:
  
  The set of casts between built-in types has been carefully crafted and is best not altered.
  
Nux says:

May 24, 2020 at 9:14 pm

Nice article :-).

One note about that query for checking indexes -- it only works in PG SQL 9.4 and above. Mostly because of WITH ORDINALITY and also the cardinality function.

Ivan Kleshnin says:

June 23, 2020 at 5:59 am

Can we update the "missing check" algorithm to not display indexes covered by compound key?
I mean if we have a purchase table:
purchase - "id" (PK) - "accountId" (FK) - "itemId" (FK)
and a compound unique constraint:
"purchase_accountId_courseId_key" UNIQUE CONSTRAINT, btree ("accountId", "itemId")
this index should cover all queries with accountId and with itemId so two separate indexes would only waste DB space.

- laurenz says:
  
  June 23, 2020 at 10:23 am
  
  But that would not be reported by my query, because the unique constraint would be in pg_index. Perhaps I misunderstand you, can you give me a complete example?
  
Omri Shani says:

August 16, 2020 at 10:49 am

I think it is better to filter out partial indices. I did it with where indpred is null inside the exists query on pg_index

stagei says:

October 10, 2020 at 8:02 pm

Very good article. Did the same thing on Oracle some years ago but is a newbee in regards to Postgres 🙂
Thx
