Do you have queries with unstable execution times? You have a query which is sometimes slow and sometimes just lightning fast. The question is usually: Why? Maybe this little post can be inspiring and shade some light one some of your performance issues, which might bug you in your daily life.
Table of Contents
There are many reasons why the very same type of query might cause such “unpredictable” behavior. The following list shows some of the most common reasons:
Of course, there are some other possible reasons (other processes exhausting CPU/IO, checkpoints, etc.), however, given the possible scope of a post, I would like to focus on excessive buffer usage and its consequences.
To demonstrate what can happen, I have compiled a simple example. All it does, is to load 50 million rows into a table. The main point is that the data will be stored on disk in random order. Then an index is created:
1 2 3 4 5 6 7 |
test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1, 50000000) AS id ORDER BY random(); SELECT 50000000 test=# CREATE INDEX idx_id ON t_test (id); CREATE INDEX |
After running VACUUM the following query shows, what this is all about:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
test=# explain (buffers true, timing true, analyze true, costs true) SELECT * FROM t_test WHERE id < 10000; QUERY PLAN -------------------------------------------------------------------------------- Bitmap Heap Scan on t_test (cost=182.72..31985.07 rows=9568 width=4) (actual time=10.298..102.130 rows=9999 loops=1) Recheck Cond: (id < 10000) Heap Blocks: exact=9775 Buffers: shared hit=1078 read=8728 -> Bitmap Index Scan on idx_id (cost=0.00..180.32 rows=9568 width=0) (actual time=4.945..4.945 rows=9999 loops=1) Index Cond: (id < 10000) Buffers: shared hit=6 read=25 Planning time: 0.162 ms Execution time: 103.472 ms (9 rows) |
In this query almost 10.000 buffers are used. The trouble now is: If the data is coming from RAM (shared_buffers or OS filesystem cache) everything is somewhat fine, and execution times will be within an acceptable range. Even running the query a second time will speed up things considerably:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
test=# explain (buffers true, timing true, analyze true, costs true) SELECT * FROM t_test WHERE id < 10000; QUERY PLAN -------------------------------------------------------------------------------- Bitmap Heap Scan on t_test (cost=182.72..31985.07 rows=9568 width=4) (actual time=8.673..25.573 rows=9999 loops=1) Recheck Cond: (id < 10000) Heap Blocks: exact=9775 Buffers: shared hit=9806 -> Bitmap Index Scan on idx_id (cost=0.00..180.32 rows=9568 width=0) (actual time=4.037..4.037 rows=9999 loops=1) Index Cond: (id < 10000) Buffers: shared hit=31 Planning time: 0.162 ms Execution time: 26.604 ms (9 rows) |
We are down to 26 ms.
Now let us assume for a second that there are no cache hits. Our system contains an old disk from uncle Sam's hardware store. What would happen to the execution time of the query?
Let us do some math (assuming disk seek time to be 12 ms):
9775 blocks * 12 ms = 117.300 milliseconds
In case of an old mechanical disk, the query would inevitably take around 2 minutes to complete. 2 minutes vs. 26ms – that is actually quite a significant difference.
The question is: Which conclusions can we draw? The first conclusion is that you should try to write your queries in a way that you don't need so many buffers in the first place. The second thing you can do is to try to keep data close together using index organized (a.k.a. column ordered) tables.
In PostgreSQL the CLUSTER command allows you to achieve exactly that:
1 2 |
test=# CLUSTER t_test USING idx_id; CLUSTER |
Keep in mind though that the clustering effect will reduce over time as rows get changed (thus you would normally want a Cron job there) but running the very same query using an index organized table will be A LOT faster:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
test=# explain (buffers true, timing true, analyze true, costs true) SELECT * FROM t_test WHERE id < 10000; QUERY PLAN -------------------------------------------------------------------------------- Bitmap Heap Scan on t_test (cost=182.72..31985.07 rows=9568 width=4) (actual time=2.021..4.961 rows=9999 loops=1) Recheck Cond: (id < 10000) Heap Blocks: exact=45 Buffers: shared hit=5 read=71 -> Bitmap Index Scan on idx_id (cost=0.00..180.32 rows=9568 width=0) (actual time=1.977..1.977 rows=9999 loops=1) Index Cond: (id < 10000) Buffers: shared hit=4 read=27 Planning time: 0.349 ms Execution time: 6.294 ms (9 rows) |
You will also see that the number of buffers used has been greatly reduced, which makes the execution time of the query better – even if the cache hit rates tend to be close to zero.
In case you need any assistance, please feel free to contact us.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Don't forget to disable transparent hugepages! We had very unstable performance on our production databases for a long time. The query runtimes were completely unpredictable. Usually fast, sometimes pausing for up to a second for no apparent reason, even on trivial single PK lookup queries. Disabling transparent hugepages made all the difference.
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
that is certainly an issue too