Do you have queries with unstable execution times? A query that is sometimes slow and sometimes lightning fast? The question is usually: why? Maybe this little post can be inspiring and shed some light on some of the performance issues that might bug you in your daily life.
There are many reasons why the very same type of query might show such "unpredictable" behavior.
Of course, there are various possible causes (other processes exhausting CPU/IO, checkpoints, etc.); however, given the limited scope of a blog post, I would like to focus on excessive buffer usage and its consequences.
To demonstrate what can happen, I have compiled a simple example. All it does is load 50 million rows into a table. The main point is that the data is stored on disk in random order. Then an index is created:
test=# CREATE TABLE t_test AS
           SELECT *
           FROM generate_series(1, 50000000) AS id
           ORDER BY random();
SELECT 50000000
test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
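The VACUUM step mentioned below is not shown in the listing above; a minimal version (VACUUM ANALYZE also refreshes the optimizer statistics, which the planner needs for its row estimates) would simply be:

test=# VACUUM ANALYZE t_test;
VACUUM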
After running VACUUM, the following query shows what this is all about:
test=# explain (buffers true, timing true, analyze true, costs true)
       SELECT * FROM t_test WHERE id < 10000;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on t_test  (cost=182.72..31985.07 rows=9568 width=4)
                             (actual time=10.298..102.130 rows=9999 loops=1)
   Recheck Cond: (id < 10000)
   Heap Blocks: exact=9775
   Buffers: shared hit=1078 read=8728
   ->  Bitmap Index Scan on idx_id  (cost=0.00..180.32 rows=9568 width=0)
                                    (actual time=4.945..4.945 rows=9999 loops=1)
         Index Cond: (id < 10000)
         Buffers: shared hit=6 read=25
 Planning time: 0.162 ms
 Execution time: 103.472 ms
(9 rows)
In this query, almost 10,000 buffers are used. The trouble is: if the data comes from RAM (shared_buffers or the OS filesystem cache), everything is somewhat fine and execution times stay within an acceptable range. Even running the query a second time will speed things up considerably:
test=# explain (buffers true, timing true, analyze true, costs true)
       SELECT * FROM t_test WHERE id < 10000;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on t_test  (cost=182.72..31985.07 rows=9568 width=4)
                             (actual time=8.673..25.573 rows=9999 loops=1)
   Recheck Cond: (id < 10000)
   Heap Blocks: exact=9775
   Buffers: shared hit=9806
   ->  Bitmap Index Scan on idx_id  (cost=0.00..180.32 rows=9568 width=0)
                                    (actual time=4.037..4.037 rows=9999 loops=1)
         Index Cond: (id < 10000)
         Buffers: shared hit=31
 Planning time: 0.162 ms
 Execution time: 26.604 ms
(9 rows)
We are down to 26 ms.
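The reason is that the second run found all the blocks in shared_buffers ("shared hit=9806", no reads). If you want a rough idea of how well a table is generally served from shared_buffers, pg_statio_user_tables can help; here is a minimal sketch (keep in mind that it cannot see the OS filesystem cache, so the real hit rate may be even higher):

test=# SELECT heap_blks_read, heap_blks_hit,
              round(heap_blks_hit::numeric /
                    nullif(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio
       FROM pg_statio_user_tables
       WHERE relname = 't_test';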
Now let us assume for a second that there are no cache hits. Our system contains an old disk from uncle Sam's hardware store. What would happen to the execution time of the query?
Let us do some math (assuming a disk seek time of 12 ms):
9775 blocks * 12 ms = 117,300 ms (roughly 117 seconds)
In the case of an old mechanical disk, the query could easily take around 2 minutes to complete. 2 minutes vs. 26 ms – that is quite a significant difference.
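By the way, the I/O component does not have to be estimated with back-of-the-envelope math: if the track_io_timing parameter is switched on, EXPLAIN (ANALYZE, BUFFERS) additionally reports "I/O Timings", i.e. the time actually spent fetching blocks from the operating system. A small sketch (track_io_timing can add some overhead, so you may want to check your clock source with pg_test_timing first):

test=# SET track_io_timing = on;
SET
test=# explain (analyze true, buffers true, timing true, costs true)
       SELECT * FROM t_test WHERE id < 10000;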
The question is: which conclusions can we draw? The first one is that you should try to write your queries in a way that they do not need so many buffers in the first place. The second thing you can do is to keep related data physically close together using index-organized (clustered) tables.
In PostgreSQL, the CLUSTER command allows you to achieve exactly that:
test=# CLUSTER t_test USING idx_id;
CLUSTER
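If you want to verify the effect of CLUSTER, the correlation column in pg_stats shows how well the physical row order matches the logical order of a column (a value close to 1 means the table is almost perfectly ordered). A quick check, assuming the statistics are fresh, could look like this:

test=# ANALYZE t_test;
ANALYZE
test=# SELECT tablename, attname, correlation
       FROM pg_stats
       WHERE tablename = 't_test'
             AND attname = 'id';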
Keep in mind, though, that the clustering effect degrades over time as rows get changed, so you would normally re-run CLUSTER periodically, for example from a cron job (see the note further down). Running the very same query on the clustered table is a lot faster:
test=# explain (buffers true, timing true, analyze true, costs true)
       SELECT * FROM t_test WHERE id < 10000;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on t_test  (cost=182.72..31985.07 rows=9568 width=4)
                             (actual time=2.021..4.961 rows=9999 loops=1)
   Recheck Cond: (id < 10000)
   Heap Blocks: exact=45
   Buffers: shared hit=5 read=71
   ->  Bitmap Index Scan on idx_id  (cost=0.00..180.32 rows=9568 width=0)
                                    (actual time=1.977..1.977 rows=9999 loops=1)
         Index Cond: (id < 10000)
         Buffers: shared hit=4 read=27
 Planning time: 0.349 ms
 Execution time: 6.294 ms
(9 rows)
You will also see that the number of buffers used has been reduced dramatically, which improves the execution time of the query even if the cache hit rate is close to zero.
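A side note on the periodic re-clustering mentioned above: CLUSTER without any arguments re-clusters all tables that have been clustered before, which is handy for a scheduled maintenance job. Keep in mind that CLUSTER takes an ACCESS EXCLUSIVE lock on the table while it runs, so it should be scheduled for a quiet period:

test=# CLUSTER;
CLUSTER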
In case you need any assistance, please feel free to contact us.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
Don't forget to disable transparent hugepages! We had very unstable performance on our production databases for a long time. The query runtimes were completely unpredictable. Usually fast, sometimes pausing for up to a second for no apparent reason, even on trivial single PK lookup queries. Disabling transparent hugepages made all the difference.
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
That is certainly an issue too.