PostgreSQL has a parameter called effective_cache_size. To many people it is one of the most mysterious settings in the entire PostgreSQL configuration, which may be the core reason why so many simply ignore it; in many cases that is not the best idea. effective_cache_size can have a serious impact on execution plans and change things dramatically, so it is definitely worth taking into account and seeing what kind of difference it can make.
What does effective_cache_size actually do? Well, there is a pretty cool comment in the PostgreSQL optimizer explaining what this is all about:
We also use a rough estimate "effective_cache_size" of the number of disk pages in Postgres + OS-level disk cache. (We can't simply use NBuffers for this purpose because that would ignore the effects of the kernel's disk cache.) Obviously, taking constants for these values is an oversimplification, but it's tough enough to get any useful estimates even at this level of detail. Note that all of these parameters are user-settable, in case the default values are drastically off for a particular platform.
The point is to give the planner an idea of what the operating system is caching so that it can adjust its estimates accordingly. In PostgreSQL the default value of effective_cache_size is 128 MB; on most production systems this is by far not enough.
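By the way, the current value can be checked at any time, and from PostgreSQL 9.4 onwards it can also be changed persistently without editing postgresql.conf by hand. A minimal sketch; the 4 GB used here is just a placeholder, not a recommendation:

test=# SHOW effective_cache_size;
test=# ALTER SYSTEM SET effective_cache_size = '4GB';
test=# SELECT pg_reload_conf();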
Let us take a look at some practical implications of effective_cache_size. To do so we can create a simple table:
test=# CREATE TABLE t_test (id int4);
CREATE TABLE
Then we can import some data into the table. We do this in a random order to make the impact more visible:
test=# INSERT INTO t_test SELECT * FROM generate_series(1, 2500000) ORDER BY random();
INSERT 0 2500000
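To put the cache sizes used below into perspective, it can be worth checking how much space the table actually occupies. pg_relation_size and pg_size_pretty are available in all reasonably recent versions; the exact output depends on your installation:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));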
Let us create an index now:
test=# CREATE INDEX idx_in ON t_test (id);
CREATE INDEX
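This step is not part of the original run, but if you replay the example and don't get the index-only scans shown below, it may help to update the visibility map and the optimizer statistics first:

test=# VACUUM ANALYZE t_test;
VACUUM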
As I have stated before, the default value of effective_cache_size is 128 MB. We can set this to 1 MB on the fly (for our session only):
test=# SET effective_cache_size TO '1 MB';
SET
To look for the lowest 10 numbers we can use the following query:
test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10;
                                         QUERY PLAN
----------------------------------------------------------------------------------------------
 Limit  (cost=0.00..39.97 rows=10 width=4)
   ->  Index Only Scan using idx_in on t_test  (cost=0.00..9992553.14 rows=2500000 width=4)
(2 rows)
As you can see, the cost of this query is estimated at 39.97 penalty points.
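The number is not magic: the LIMIT node assumes the index scan can stop after 10 of the 2,500,000 rows, so the planner charges roughly that fraction of the total scan cost, i.e. 9992553.14 * 10 / 2500000 ≈ 39.97 (the same reasoning yields the 0.44 shown below).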
What happens if we change effective_cache_size to an insanely high value?
test=# SET effective_cache_size TO '10000 MB';
SET
test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10;
                                        QUERY PLAN
---------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.44 rows=10 width=4)
   ->  Index Only Scan using idx_in on t_test  (cost=0.00..109180.31 rows=2500000 width=4)
(2 rows)
As you can see, the costs drop dramatically. This makes sense: with only 1 MB of RAM we cannot expect the kernel to cache any data, whereas with plenty of memory we can expect the cache hit rate on the kernel side to go up dramatically. Random I/O is the most expensive thing there is, so changing this cost parameter has a serious impact on what the planner believes. Just imagine a more complex query: different cost estimates can lead to totally different plans.
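To undo the experiment for the current session, the parameter can simply be reset to the configured value:

test=# RESET effective_cache_size;
RESET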
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
What if increasing effective_cache_size makes execution plans worse? Maybe you can reduce the cost, but I have seen a couple of queries whose execution time got worse without any other cause.
Since the only difference between the query plans in the example is the estimated cost, I assume the performance would actually be the same? In this instance the estimated cost has not changed the selected plan?
I don't get what the conclusion is here in terms of which value to choose for a given amount of available RAM. I doubt effective_cache_size = '10000 MB' is the recommendation.