CYBERTEC PostgreSQL Logo

Get rid of your unused indexes!

04.2018 / Category: / Tags:
Another way to get rid of unused indexes...
 © Laurenz Albe 2018

Why should I get rid of unused indexes?

Everybody knows that a database index is a good thing because it can speed up SQL queries. But this does not come for free.

The disadvantages of indexes are:

  • Indexes use up space. It is not unusual for database indexes to use as much storage space as the data themselves. And the kind of reliable, fast storage you want for a database is not necessarily cheap.
    The space used up by indexes also increases the size and duration of physical backups.
  • Indexes slow down data modification. Whenever you INSERT into or DELETE from a table, all indexes have to be modified, in addition to the table itself (the “heap”).
    And it is much more expensive to modify the complicated data structure of an index than the heap itself, which has its name precisely because it is basically an unordered “pile” of data (and as everybody knows, maintaining order is more work than having a mess). Modifying an indexed table can easily be an order of magnitude more expensive than modifying an unindexed table.
  • Indexes prevent HOT updates. Because of the architecture of PostgreSQL, every UPDATE causes a new row version (“tuple”) to be written, and that causes a new entry in every index on the table.
    This behavior has been dubbed “write amplification” and has drawn a lot of fire. This undesirable effect can be avoided if a) the new tuple fits into the same table block as the old one and b) no indexed column is modified. Then PostgreSQL creates the new tuple as a “Heap Only Tuple” (hence HOT), which is much more efficient and also reduces the work VACUUM has to do.

The many uses of indexes

Now we know that we don't want unnecessary indexes. The problem is that indexes serve so many purposes that it is difficult to determine if a certain index is needed or not.

Here is a list of all benefits of indexes in PostgreSQL:

  1. Indexes can speed up queries that use indexed columns (or expressions) in the WHERE clause.
    Everybody knows that one!
    The traditional B-tree index supports the <, <=, =, >= and > operators, while the many other index types in PostgreSQL can support more exotic operators like “overlaps” (for ranges or geometries), “distance” (for words) or regular expression matches.
  2. B-tree indexes can speed up the max() and min() aggregates.
  3. B-tree indexes can speed up ORDER BY clauses.
  4. Indexes can speed up joins. This depends on the “join strategy” chosen by the optimizer: hash joins, for example, will never make use of an index.
  5. A B-tree index on the origin of a FOREIGN KEY constraint avoids a sequential scan when rows are deleted (or keys modified) in the target table. A scan on the origin of the constraint is necessary to make sure that the constraint will not be violated by the modification.
  6. Indexes are used to enforce constraints. Unique B-tree indexes are used to enforce PRIMARY KEY and UNIQUE constraints, while exclusion constraints use GiST indexes.
  7. Indexes can provide the optimizer with better value distribution statistics.
    If you create an index on an expression, ANALYZE and the autoanalyze daemon will not only collect statistics for the data distribution in table columns, but also for each expression that occurs in an index. This helps the optimizer to get a good estimate for the “selectivity” of complicated conditions that contain the indexed expression, which causes better plans to be chosen. This is a widely ignored benefit of indexes!

Find the unused indexes!

The following query that we at CYBERTEC use will show you all indexes that serve none of the above mentioned purposes.

It makes use of the fact that all uses of indexes in the above list with the exception of the last two result in an index scan.

For completeness' sake, I have to add that the parameter track_counts has to remain “on” for the query to work, otherwise index usage is not tracked in pg_stat_user_indexes. But you must not change that parameter anyway, otherwise autovacuum will stop working.

To find the indexes that have never been used since the last statistics reset with pg_stat_reset(), use

Some remarks:

  • Don't do that on your test database, but on the production database!
  • If your software is running at several customer sites, run the query on all of them.
    Different users have different ways to use a software, which can cause different indexes to be used.
  • You can replace s.idx_scan = 0 in the query with a different condition, e.g. s.idx_scan < 10. Indexes that are very rarely used are also good candidates for removal.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

32 responses to “Get rid of your unused indexes!”

    • Yes, this also works for partial indexes.
      They show up in pg_stat_user_indexes just like all other indexes.

    • You can use psql's gexec feature and modify the SELECT list as follows:

      SELECT format('DROP INDEX %I./I', s.schemaname, s.indexrelname)
      FROM ... gexec

      gexec executes the query result as SQL commands.

    • pg_index.indkey contains the list of table columns on which the index is defined.
      If an index column is an expressen, the corresponding entry is 0, and the expression is stored in pg_index.indexprs.
      The condition in my query means “no entry in indkey is 0”.

    • With psql, you can use

      SELECT format('DROP INDEX %I;', indexrelid) FROM pg_index WHERE indrelid = 'mytab'::regclass gexec

      That will work for all indexes that don't back a constraint — for these it will fail.

    • Such indexes are expression indexes. I explained in point 7 of "The many uses of indexes" that such indexes serve another purpose too: they make PostgreSQL collect statistics on the indexed expression. So there is a chance that dropping the index might cause worse query estimates. My query only returns indexes that are guaranteed to be useless.

  1. Hi Laurenz, If we get list of tables from this query, would you suggest removing indexes from these tables ? How will it impact the performance ?

    • Yes, I recommend removing the indexes my query lists.
      Performance (of data modifications) will be better afterwards.
      Review the caveats in my article (run it on the production database, ...).

  2. Hi Laurenz, this showed me a couple of indexes as unused, but it doesn't make sense because:
    1- Without those indexes some of our regularly used queries would take hours to complete
    2- EXPLAIN on queries with parameters matching customer usage, show the index being unused, funny thing after running EXPLAIN the idx_scan starts to go > 0

    What could be the cause of this ?

    • What you report doesn't make sense to me, sorry.
      Running EXPLAIN doesn't change anything, so there is no explanation for that behavior.
      I can only assume some trivial mistake, like running the query against a test database.

      • Okay so i ran EXPLAIN ANALYZE sorry i was misleading here which explains the stats incrementing each run, but not sure why they were 0 to begin with, do they get reset regularly these stats ?

        • EXPLAIN (ANALYZE) also doesn't change the table statistics, so that won't change execution plans.
          VACUUM (ANALYZE) would do that.
          The statistics that my query relies on get reset with the function pg_stat_reset(). You shouldn't call that too often.

          • no one has ever called pg_stat_reset()
            Running EXPLAIN ANALYZE on a query that uses an index 100% causes the index usage stats to be incremented, i've confirmed this multiple times now.

          • Ah, sorry, I was thinking in the wrong direction. Since EXPLAIN (ANALYZE) executes the query, it will of course increase the index scan count. To account for cases like this, you may want to use WHERE s.idx_scan < 10 rather than WHERE s.idx_scan = 0 in my query.

  3. This does not work for index on partitioned tables.
    SQL Error [2BP01]: ERROR: cannot drop index [partition_index] because index [main_table_index] requires it
    Indice : You can drop index [main_table_index] instead.

    How can I get the list of all index that I can delete on main tables ?

    • Ah, that's something I forgot.
      It will be difficult to find all partitioned indexes that can be dropped, because you'd have to check if all index partitions are unused. I have modified my query to exclude index partitions.

      • I tried to write the query : look for all partitioned index having(max(child.idx_scan<=0)) ... My brain hurts ! 🙂
        I'll try again later

      • I wrote this query to get all index of tables and summary of index of partitionned tables :

        select * from (
        -- Index of partitionned tables
        select
        'partitioned index' as indextype,
        nsp.nspname as schemaname,
        table_class.relname as tablename,
        parent_class.relname as indexname,
        index_columns.idx_columns as idx_columns,
        seek_childs.nb_child_index,
        seek_childs.nb_scans
        from pg_class parent_class
        join pg_index parent_index on parent_index.indexrelid = parent_class.oid
        join pg_namespace nsp on nsp.oid = parent_class.relnamespace -- to get schemaname
        join pg_class table_class on table_class.oid = parent_index.indrelid
        , lateral (
        select count(stats_child.idx_scan) as nb_child_index, sum(stats_child.idx_scan) as nb_scans
        from pg_catalog.pg_stat_user_indexes stats_child
        join pg_inherits pi on pi.inhrelid = stats_child.indexrelid
        where pi.inhparent = parent_class.oid
        ) seek_childs
        , LATERAL (
        SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
        FROM pg_attribute
        WHERE attrelid = parent_class.oid
        ) index_columns
        where parent_class.relkind = 'I'
        AND 0 ALL (parent_index.indkey) -- no index column is an expression
        AND NOT parent_index.indisunique -- is not a UNIQUE index
        AND NOT EXISTS -- does not enforce a constraint
        (SELECT 1 FROM pg_catalog.pg_constraint cc WHERE cc.conindid = parent_index.indexrelid)
        and table_class.relname not like '%template'
        union
        -- Index of regular tables
        select
        'regular index' as indextype,
        stats_child.schemaname,
        stats_child.relname AS tablename,
        c.relname as indexname,
        index_columns.idx_columns as idx_columns,
        null as nb_child_index,
        stats_child.idx_scan as id_scan_count
        from pg_class c
        join pg_index idx_parent on idx_parent.indexrelid = c.oid
        join pg_catalog.pg_stat_user_indexes stats_child on c.oid = stats_child.indexrelid
        , LATERAL (
        SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
        FROM pg_attribute
        WHERE attrelid = c.oid
        ) index_columns
        where c.relkind = 'i'
        AND 0 ALL (idx_parent.indkey) -- no index column is an expression
        AND NOT idx_parent.indisunique -- is not a UNIQUE index
        AND NOT EXISTS -- does not enforce a constraint
        (SELECT 1 FROM pg_catalog.pg_constraint cc
        WHERE cc.conindid = idx_parent.indexrelid)
        AND NOT EXISTS -- is not a child index
        (SELECT 1 FROM pg_inherits pi
        where pi.inhrelid = c.oid)
        and stats_child.relname not like '%template'
        ) all_index;

  4. One of my client have a table with 25 Indexes. I want to find out which is not being used last 1 month and need to recommend for dropping the same. Is there any script available to find the index last used date

    • No, there isn't. Get the use count and remember it, then look at the use count again one month from now and compare the numbers.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram