Everybody knows that a database index is a good thing because it can speed up SQL queries. But this does not come for free.
Table of Contents
The disadvantages of indexes are:
Indexes use up space. It is not unusual for database indexes to use as much storage space as the data themselves. And the kind of reliable, fast storage you want for a database is not necessarily cheap.
The space used up by indexes also increases the size and duration of physical backups.
Indexes slow down data modification. Whenever you INSERT into or DELETE from a table, all indexes have to be modified, in addition to the table itself (the “heap”).
And it is much more expensive to modify the complicated data structure of an index than the heap itself, which has its name precisely because it is basically an unordered “pile” of data (and as everybody knows, maintaining order is more work than having a mess). Modifying an indexed table can easily be an order of magnitude more expensive than modifying an unindexed table.
Indexes prevent HOT updates. Because of the architecture of PostgreSQL, every UPDATE causes a new row version (“tuple”) to be written, and that causes a new entry in every index on the table.
This behavior has been dubbed “write amplification” and has drawn a lot of fire. This undesirable effect can be avoided if a) the new tuple fits into the same table block as the old one and b) no indexed column is modified. Then PostgreSQL creates the new tuple as a “Heap Only Tuple” (hence HOT), which is much more efficient and also reduces the work VACUUM has to do.
The many uses of indexes
Now we know that we don't want unnecessary indexes. The problem is that indexes serve so many purposes that it is difficult to determine if a certain index is needed or not.
Here is a list of all benefits of indexes in PostgreSQL:
Indexes can speed up queries that use indexed columns (or expressions) in the WHERE clause.
Everybody knows that one!
The traditional B-tree index supports the <, <=, =, >= and > operators, while the many other index types in PostgreSQL can support more exotic operators like “overlaps” (for ranges or geometries), “distance” (for words) or regular expression matches.
B-tree indexes can speed up the max() and min() aggregates.
B-tree indexes can speed up ORDER BY clauses.
Indexes can speed up joins. This depends on the “join strategy” chosen by the optimizer: hash joins, for example, will never make use of an index.
A B-tree index on the origin of a FOREIGN KEY constraint avoids a sequential scan when rows are deleted (or keys modified) in the target table. A scan on the origin of the constraint is necessary to make sure that the constraint will not be violated by the modification.
Indexes are used to enforce constraints. Unique B-tree indexes are used to enforce PRIMARY KEY and UNIQUE constraints, while exclusion constraints use GiST indexes.
Indexes can provide the optimizer with better value distribution statistics.
If you create an index on an expression, ANALYZE and the autoanalyze daemon will not only collect statistics for the data distribution in table columns, but also for each expression that occurs in an index. This helps the optimizer to get a good estimate for the “selectivity” of complicated conditions that contain the indexed expression, which causes better plans to be chosen. This is a widely ignored benefit of indexes!
Find the unused indexes!
The following query that we at CYBERTEC use will show you all indexes that serve none of the above mentioned purposes.
It makes use of the fact that all uses of indexes in the above list with the exception of the last two result in an index scan.
For completeness' sake, I have to add that the parameter track_counts has to remain “on” for the query to work, otherwise index usage is not tracked in pg_stat_user_indexes. But you must not change that parameter anyway, otherwise autovacuum will stop working.
To find the indexes that have never been used since the last statistics reset with pg_stat_reset(), use
AND0<>ALL(i.indkey)-- no index column is an expression
ANDNOTi.indisunique-- is not a UNIQUE index
ANDNOTEXISTS-- does not enforce a constraint
(SELECT1FROMpg_catalog.pg_constraintc
WHEREc.conindid=s.indexrelid)
ANDNOTEXISTS-- is not an index partition
(SELECT1FROMpg_catalog.pg_inheritsASinh
WHEREinh.inhrelid=s.indexrelid)
ORDERBYpg_relation_size(s.indexrelid)DESC;
Some remarks:
Don't do that on your test database, but on the production database!
If your software is running at several customer sites, run the query on all of them.
Different users have different ways to use a software, which can cause different indexes to be used.
You can replace s.idx_scan = 0 in the query with a different condition, e.g. s.idx_scan < 10. Indexes that are very rarely used are also good candidates for removal.
pg_index.indkey contains the list of table columns on which the index is defined.
If an index column is an expressen, the corresponding entry is 0, and the expression is stored in pg_index.indexprs.
The condition in my query means “no entry in indkey is 0”.
As I explained in the article, they may serve a purpose even if they aren't scanned.
PostgreSQL collects statistics for the indexed expression, which can result in better execution plans.
I have a table that has 5 indexes on it. I truncate that table and insert a massive amount of data in that table. before i do the insertion, i would like to drop all indexes on that table. The query you provided, i guess i am not really understanding it correctly. I don't see where i pass my table name in it. Indrelid seems to be a numeric field and not a text. def. missing some code here
I am as new as anyone can be with postgreSQL. first time using Postgresql. Is there anything i am missing with this? Do i need some additional plugin installation?
This is not showing indexes created on part of json object.
Such indexes are expression indexes. I explained in point 7 of "The many uses of indexes" that such indexes serve another purpose too: they make PostgreSQL collect statistics on the indexed expression. So there is a chance that dropping the index might cause worse query estimates. My query only returns indexes that are guaranteed to be useless.
Yes, I recommend removing the indexes my query lists.
Performance (of data modifications) will be better afterwards.
Review the caveats in my article (run it on the production database, ...).
Hi Laurenz, this showed me a couple of indexes as unused, but it doesn't make sense because:
1- Without those indexes some of our regularly used queries would take hours to complete
2- EXPLAIN on queries with parameters matching customer usage, show the index being unused, funny thing after running EXPLAIN the idx_scan starts to go > 0
What you report doesn't make sense to me, sorry.
Running EXPLAIN doesn't change anything, so there is no explanation for that behavior.
I can only assume some trivial mistake, like running the query against a test database.
Okay so i ran EXPLAIN ANALYZE sorry i was misleading here which explains the stats incrementing each run, but not sure why they were 0 to begin with, do they get reset regularly these stats ?
EXPLAIN (ANALYZE) also doesn't change the table statistics, so that won't change execution plans. VACUUM (ANALYZE) would do that.
The statistics that my query relies on get reset with the function pg_stat_reset(). You shouldn't call that too often.
no one has ever called pg_stat_reset()
Running EXPLAIN ANALYZE on a query that uses an index 100% causes the index usage stats to be incremented, i've confirmed this multiple times now.
Ah, sorry, I was thinking in the wrong direction. Since EXPLAIN (ANALYZE) executes the query, it will of course increase the index scan count. To account for cases like this, you may want to use WHERE s.idx_scan < 10 rather than WHERE s.idx_scan = 0 in my query.
This does not work for index on partitioned tables. SQL Error [2BP01]: ERROR: cannot drop index [partition_index] because index [main_table_index] requires it
Indice : You can drop index [main_table_index] instead.
How can I get the list of all index that I can delete on main tables ?
Ah, that's something I forgot.
It will be difficult to find all partitioned indexes that can be dropped, because you'd have to check if all index partitions are unused. I have modified my query to exclude index partitions.
I wrote this query to get all index of tables and summary of index of partitionned tables :
select * from (
-- Index of partitionned tables
select
'partitioned index' as indextype,
nsp.nspname as schemaname,
table_class.relname as tablename,
parent_class.relname as indexname,
index_columns.idx_columns as idx_columns,
seek_childs.nb_child_index,
seek_childs.nb_scans
from pg_class parent_class
join pg_index parent_index on parent_index.indexrelid = parent_class.oid
join pg_namespace nsp on nsp.oid = parent_class.relnamespace -- to get schemaname
join pg_class table_class on table_class.oid = parent_index.indrelid
, lateral (
select count(stats_child.idx_scan) as nb_child_index, sum(stats_child.idx_scan) as nb_scans
from pg_catalog.pg_stat_user_indexes stats_child
join pg_inherits pi on pi.inhrelid = stats_child.indexrelid
where pi.inhparent = parent_class.oid
) seek_childs
, LATERAL (
SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
FROM pg_attribute
WHERE attrelid = parent_class.oid
) index_columns
where parent_class.relkind = 'I'
AND 0 ALL (parent_index.indkey) -- no index column is an expression
AND NOT parent_index.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint cc WHERE cc.conindid = parent_index.indexrelid)
and table_class.relname not like '%template'
union
-- Index of regular tables
select
'regular index' as indextype,
stats_child.schemaname,
stats_child.relname AS tablename,
c.relname as indexname,
index_columns.idx_columns as idx_columns,
null as nb_child_index,
stats_child.idx_scan as id_scan_count
from pg_class c
join pg_index idx_parent on idx_parent.indexrelid = c.oid
join pg_catalog.pg_stat_user_indexes stats_child on c.oid = stats_child.indexrelid
, LATERAL (
SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
FROM pg_attribute
WHERE attrelid = c.oid
) index_columns
where c.relkind = 'i'
AND 0 ALL (idx_parent.indkey) -- no index column is an expression
AND NOT idx_parent.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint cc
WHERE cc.conindid = idx_parent.indexrelid)
AND NOT EXISTS -- is not a child index
(SELECT 1 FROM pg_inherits pi
where pi.inhrelid = c.oid)
and stats_child.relname not like '%template'
) all_index;
One of my client have a table with 25 Indexes. I want to find out which is not being used last 1 month and need to recommend for dropping the same. Is there any script available to find the index last used date
Laurenz Albe is a senior consultant and support engineer at CYBERTEC. He has been working with and contributing to PostgreSQL since 2006, has written patches for core, and wrote oracle_fdw. He holds a Master's degree in Mathematics from the University of Vienna and a Master's in Computer Science from the Technical University of Vienna. In his spare time he enjoys reading to his children and pondering the roots of language.
Sure that this works with partial indexes?
Yes, this also works for partial indexes.
They show up in pg_stat_user_indexes just like all other indexes.
How can I use the output from this query to also drop the indexes?
You can use psql's gexec feature and modify the SELECT list as follows:
SELECT format('DROP INDEX %I./I', s.schemaname, s.indexrelname)
FROM ... gexec
gexec executes the query result as SQL commands.
Why "AND 0 ALL (i.indkey)", functionnal indexes are not listed ?
Thx
pg_index.indkey
contains the list of table columns on which the index is defined.If an index column is an expressen, the corresponding entry is 0, and the expression is stored in
pg_index.indexprs
.The condition in my query means “no entry in
indkey
is 0”.ok, but why exclude non "simple" indexes ?
As I explained in the article, they may serve a purpose even if they aren't scanned.
PostgreSQL collects statistics for the indexed expression, which can result in better execution plans.
ok, thx !
Is there a way to drop all indexes on a particular table? without hardcoding all the index names
With
psql
, you can useSELECT format('DROP INDEX %I;', indexrelid) FROM pg_index WHERE indrelid = 'mytab'::regclass gexec
That will work for all indexes that don't back a constraint — for these it will fail.
I have a table that has 5 indexes on it. I truncate that table and insert a massive amount of data in that table. before i do the insertion, i would like to drop all indexes on that table. The query you provided, i guess i am not really understanding it correctly. I don't see where i pass my table name in it. Indrelid seems to be a numeric field and not a text. def. missing some code here
Substitute your table name instead of
'mytab'
in my code sample.https://uploads.disquscdn.com/images/ffaf157b24574f074488b7e1be76f1b0bda32ae45225bee9467d549b0b5bcfd4.png
I am as new as anyone can be with postgreSQL. first time using Postgresql. Is there anything i am missing with this? Do i need some additional plugin installation?
This is not showing indexes created on part of json object.
Such indexes are expression indexes. I explained in point 7 of "The many uses of indexes" that such indexes serve another purpose too: they make PostgreSQL collect statistics on the indexed expression. So there is a chance that dropping the index might cause worse query estimates. My query only returns indexes that are guaranteed to be useless.
Hi Laurenz, If we get list of tables from this query, would you suggest removing indexes from these tables ? How will it impact the performance ?
Yes, I recommend removing the indexes my query lists.
Performance (of data modifications) will be better afterwards.
Review the caveats in my article (run it on the production database, ...).
sure. will do. Also with 200 concurrent users, postgresSQL show 99% CPU usage. Any idea what could be the reason for that ?
This is quite unrelated and would require investigation by a consultant.
Hi Laurenz, this showed me a couple of indexes as unused, but it doesn't make sense because:
1- Without those indexes some of our regularly used queries would take hours to complete
2- EXPLAIN on queries with parameters matching customer usage, show the index being unused, funny thing after running EXPLAIN the idx_scan starts to go > 0
What could be the cause of this ?
What you report doesn't make sense to me, sorry.
Running
EXPLAIN
doesn't change anything, so there is no explanation for that behavior.I can only assume some trivial mistake, like running the query against a test database.
Okay so i ran EXPLAIN ANALYZE sorry i was misleading here which explains the stats incrementing each run, but not sure why they were 0 to begin with, do they get reset regularly these stats ?
EXPLAIN (ANALYZE)
also doesn't change the table statistics, so that won't change execution plans.VACUUM (ANALYZE)
would do that.The statistics that my query relies on get reset with the function
pg_stat_reset()
. You shouldn't call that too often.no one has ever called pg_stat_reset()
Running EXPLAIN ANALYZE on a query that uses an index 100% causes the index usage stats to be incremented, i've confirmed this multiple times now.
Ah, sorry, I was thinking in the wrong direction. Since
EXPLAIN (ANALYZE)
executes the query, it will of course increase the index scan count. To account for cases like this, you may want to useWHERE s.idx_scan < 10
rather thanWHERE s.idx_scan = 0
in my query.This does not work for index on partitioned tables.
SQL Error [2BP01]: ERROR: cannot drop index [partition_index] because index [main_table_index] requires it
Indice : You can drop index [main_table_index] instead.
How can I get the list of all index that I can delete on main tables ?
Ah, that's something I forgot.
It will be difficult to find all partitioned indexes that can be dropped, because you'd have to check if all index partitions are unused. I have modified my query to exclude index partitions.
I tried to write the query : look for all partitioned index having(max(child.idx_scan<=0)) ... My brain hurts ! 🙂
I'll try again later
I wrote this query to get all index of tables and summary of index of partitionned tables :
select * from (
-- Index of partitionned tables
select
'partitioned index' as indextype,
nsp.nspname as schemaname,
table_class.relname as tablename,
parent_class.relname as indexname,
index_columns.idx_columns as idx_columns,
seek_childs.nb_child_index,
seek_childs.nb_scans
from pg_class parent_class
join pg_index parent_index on parent_index.indexrelid = parent_class.oid
join pg_namespace nsp on nsp.oid = parent_class.relnamespace -- to get schemaname
join pg_class table_class on table_class.oid = parent_index.indrelid
, lateral (
select count(stats_child.idx_scan) as nb_child_index, sum(stats_child.idx_scan) as nb_scans
from pg_catalog.pg_stat_user_indexes stats_child
join pg_inherits pi on pi.inhrelid = stats_child.indexrelid
where pi.inhparent = parent_class.oid
) seek_childs
, LATERAL (
SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
FROM pg_attribute
WHERE attrelid = parent_class.oid
) index_columns
where parent_class.relkind = 'I'
AND 0 ALL (parent_index.indkey) -- no index column is an expression
AND NOT parent_index.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint cc WHERE cc.conindid = parent_index.indexrelid)
and table_class.relname not like '%template'
union
-- Index of regular tables
select
'regular index' as indextype,
stats_child.schemaname,
stats_child.relname AS tablename,
c.relname as indexname,
index_columns.idx_columns as idx_columns,
null as nb_child_index,
stats_child.idx_scan as id_scan_count
from pg_class c
join pg_index idx_parent on idx_parent.indexrelid = c.oid
join pg_catalog.pg_stat_user_indexes stats_child on c.oid = stats_child.indexrelid
, LATERAL (
SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
FROM pg_attribute
WHERE attrelid = c.oid
) index_columns
where c.relkind = 'i'
AND 0 ALL (idx_parent.indkey) -- no index column is an expression
AND NOT idx_parent.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint cc
WHERE cc.conindid = idx_parent.indexrelid)
AND NOT EXISTS -- is not a child index
(SELECT 1 FROM pg_inherits pi
where pi.inhrelid = c.oid)
and stats_child.relname not like '%template'
) all_index;
One of my client have a table with 25 Indexes. I want to find out which is not being used last 1 month and need to recommend for dropping the same. Is there any script available to find the index last used date
No, there isn't. Get the use count and remember it, then look at the use count again one month from now and compare the numbers.