Everybody knows that a database index is a good thing because it can speed up SQL queries. But this does not come for free.
Table of Contents
The disadvantages of indexes are:
INSERT
into or DELETE
from a table, all indexes have to be modified, in addition to the table itself (the “heap”).UPDATE
causes a new row version (“tuple”) to be written, and that causes a new entry in every index on the table.VACUUM
has to do.Now we know that we don't want unnecessary indexes. The problem is that indexes serve so many purposes that it is difficult to determine if a certain index is needed or not.
Here is a list of all benefits of indexes in PostgreSQL:
WHERE
clause.<
, <=
, =
, >=
and >
operators, while the many other index types in PostgreSQL can support more exotic operators like “overlaps” (for ranges or geometries), “distance” (for words) or regular expression matches.max()
and min()
aggregates.ORDER BY
clauses.FOREIGN KEY
constraint avoids a sequential scan when rows are deleted (or keys modified) in the target table. A scan on the origin of the constraint is necessary to make sure that the constraint will not be violated by the modification.PRIMARY KEY
and UNIQUE
constraints, while exclusion constraints use GiST indexes.ANALYZE
and the autoanalyze daemon will not only collect statistics for the data distribution in table columns, but also for each expression that occurs in an index. This helps the optimizer to get a good estimate for the “selectivity” of complicated conditions that contain the indexed expression, which causes better plans to be chosen. This is a widely ignored benefit of indexes!The following query that we at CYBERTEC use will show you all indexes that serve none of the above mentioned purposes.
It makes use of the fact that all uses of indexes in the above list with the exception of the last two result in an index scan.
For completeness' sake, I have to add that the parameter track_counts
has to remain “on” for the query to work, otherwise index usage is not tracked in pg_stat_user_indexes
. But you must not change that parameter anyway, otherwise autovacuum will stop working.
To find the indexes that have never been used since the last statistics reset with pg_stat_reset()
, use
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
SELECT s.schemaname, s.relname AS tablename, s.indexrelname AS indexname, pg_relation_size(s.indexrelid) AS index_size FROM pg_catalog.pg_stat_user_indexes s JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid WHERE s.idx_scan = 0 -- has never been scanned AND 0 <>ALL (i.indkey) -- no index column is an expression AND NOT i.indisunique -- is not a UNIQUE index AND NOT EXISTS -- does not enforce a constraint (SELECT 1 FROM pg_catalog.pg_constraint c WHERE c.conindid = s.indexrelid) AND NOT EXISTS -- is not an index partition (SELECT 1 FROM pg_catalog.pg_inherits AS inh WHERE inh.inhrelid = s.indexrelid) ORDER BY pg_relation_size(s.indexrelid) DESC; |
Some remarks:
s.idx_scan = 0
in the query with a different condition, e.g. s.idx_scan < 10
. Indexes that are very rarely used are also good candidates for removal.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Sure that this works with partial indexes?
Yes, this also works for partial indexes.
They show up in pg_stat_user_indexes just like all other indexes.
How can I use the output from this query to also drop the indexes?
You can use psql's gexec feature and modify the SELECT list as follows:
SELECT format('DROP INDEX %I./I', s.schemaname, s.indexrelname)
FROM ... gexec
gexec executes the query result as SQL commands.
Why "AND 0 ALL (i.indkey)", functionnal indexes are not listed ?
Thx
pg_index.indkey
contains the list of table columns on which the index is defined.If an index column is an expressen, the corresponding entry is 0, and the expression is stored in
pg_index.indexprs
.The condition in my query means “no entry in
indkey
is 0”.ok, but why exclude non "simple" indexes ?
As I explained in the article, they may serve a purpose even if they aren't scanned.
PostgreSQL collects statistics for the indexed expression, which can result in better execution plans.
ok, thx !
Is there a way to drop all indexes on a particular table? without hardcoding all the index names
With
psql
, you can useSELECT format('DROP INDEX %I;', indexrelid) FROM pg_index WHERE indrelid = 'mytab'::regclass gexec
That will work for all indexes that don't back a constraint — for these it will fail.
I have a table that has 5 indexes on it. I truncate that table and insert a massive amount of data in that table. before i do the insertion, i would like to drop all indexes on that table. The query you provided, i guess i am not really understanding it correctly. I don't see where i pass my table name in it. Indrelid seems to be a numeric field and not a text. def. missing some code here
Substitute your table name instead of
'mytab'
in my code sample.https://uploads.disquscdn.com/images/ffaf157b24574f074488b7e1be76f1b0bda32ae45225bee9467d549b0b5bcfd4.png
I am as new as anyone can be with postgreSQL. first time using Postgresql. Is there anything i am missing with this? Do i need some additional plugin installation?
This is not showing indexes created on part of json object.
Such indexes are expression indexes. I explained in point 7 of "The many uses of indexes" that such indexes serve another purpose too: they make PostgreSQL collect statistics on the indexed expression. So there is a chance that dropping the index might cause worse query estimates. My query only returns indexes that are guaranteed to be useless.
Hi Laurenz, If we get list of tables from this query, would you suggest removing indexes from these tables ? How will it impact the performance ?
Yes, I recommend removing the indexes my query lists.
Performance (of data modifications) will be better afterwards.
Review the caveats in my article (run it on the production database, ...).
sure. will do. Also with 200 concurrent users, postgresSQL show 99% CPU usage. Any idea what could be the reason for that ?
This is quite unrelated and would require investigation by a consultant.
Hi Laurenz, this showed me a couple of indexes as unused, but it doesn't make sense because:
1- Without those indexes some of our regularly used queries would take hours to complete
2- EXPLAIN on queries with parameters matching customer usage, show the index being unused, funny thing after running EXPLAIN the idx_scan starts to go > 0
What could be the cause of this ?
What you report doesn't make sense to me, sorry.
Running
EXPLAIN
doesn't change anything, so there is no explanation for that behavior.I can only assume some trivial mistake, like running the query against a test database.
Okay so i ran EXPLAIN ANALYZE sorry i was misleading here which explains the stats incrementing each run, but not sure why they were 0 to begin with, do they get reset regularly these stats ?
EXPLAIN (ANALYZE)
also doesn't change the table statistics, so that won't change execution plans.VACUUM (ANALYZE)
would do that.The statistics that my query relies on get reset with the function
pg_stat_reset()
. You shouldn't call that too often.no one has ever called pg_stat_reset()
Running EXPLAIN ANALYZE on a query that uses an index 100% causes the index usage stats to be incremented, i've confirmed this multiple times now.
Ah, sorry, I was thinking in the wrong direction. Since
EXPLAIN (ANALYZE)
executes the query, it will of course increase the index scan count. To account for cases like this, you may want to useWHERE s.idx_scan < 10
rather thanWHERE s.idx_scan = 0
in my query.This does not work for index on partitioned tables.
SQL Error [2BP01]: ERROR: cannot drop index [partition_index] because index [main_table_index] requires it
Indice : You can drop index [main_table_index] instead.
How can I get the list of all index that I can delete on main tables ?
Ah, that's something I forgot.
It will be difficult to find all partitioned indexes that can be dropped, because you'd have to check if all index partitions are unused. I have modified my query to exclude index partitions.
I tried to write the query : look for all partitioned index having(max(child.idx_scan<=0)) ... My brain hurts ! 🙂
I'll try again later
I wrote this query to get all index of tables and summary of index of partitionned tables :
select * from (
-- Index of partitionned tables
select
'partitioned index' as indextype,
nsp.nspname as schemaname,
table_class.relname as tablename,
parent_class.relname as indexname,
index_columns.idx_columns as idx_columns,
seek_childs.nb_child_index,
seek_childs.nb_scans
from pg_class parent_class
join pg_index parent_index on parent_index.indexrelid = parent_class.oid
join pg_namespace nsp on nsp.oid = parent_class.relnamespace -- to get schemaname
join pg_class table_class on table_class.oid = parent_index.indrelid
, lateral (
select count(stats_child.idx_scan) as nb_child_index, sum(stats_child.idx_scan) as nb_scans
from pg_catalog.pg_stat_user_indexes stats_child
join pg_inherits pi on pi.inhrelid = stats_child.indexrelid
where pi.inhparent = parent_class.oid
) seek_childs
, LATERAL (
SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
FROM pg_attribute
WHERE attrelid = parent_class.oid
) index_columns
where parent_class.relkind = 'I'
AND 0 ALL (parent_index.indkey) -- no index column is an expression
AND NOT parent_index.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint cc WHERE cc.conindid = parent_index.indexrelid)
and table_class.relname not like '%template'
union
-- Index of regular tables
select
'regular index' as indextype,
stats_child.schemaname,
stats_child.relname AS tablename,
c.relname as indexname,
index_columns.idx_columns as idx_columns,
null as nb_child_index,
stats_child.idx_scan as id_scan_count
from pg_class c
join pg_index idx_parent on idx_parent.indexrelid = c.oid
join pg_catalog.pg_stat_user_indexes stats_child on c.oid = stats_child.indexrelid
, LATERAL (
SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
FROM pg_attribute
WHERE attrelid = c.oid
) index_columns
where c.relkind = 'i'
AND 0 ALL (idx_parent.indkey) -- no index column is an expression
AND NOT idx_parent.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint cc
WHERE cc.conindid = idx_parent.indexrelid)
AND NOT EXISTS -- is not a child index
(SELECT 1 FROM pg_inherits pi
where pi.inhrelid = c.oid)
and stats_child.relname not like '%template'
) all_index;
One of my client have a table with 25 Indexes. I want to find out which is not being used last 1 month and need to recommend for dropping the same. Is there any script available to find the index last used date
No, there isn't. Get the use count and remember it, then look at the use count again one month from now and compare the numbers.