Just like most other database systems I am aware of PostgreSQL providing a feature called tablespaces. In general a tablespace is a collection of storage files residing in a certain mount point. The main idea of a tablespace is to give administrators a chance to scale out their I/O and to make things more efficient. Typically a tablespace makes a lot of sense if you want to access more than just one I/O subsystem.
Table of Contents
As I have mentioned before, a tablespace can be used to add more hardware to the system – but, is there some other way to make use of a tablespace? Well, there is some sort of hidden “abuse”, which can be highly beneficial to performance: Using tablespaces to help the optimizer.
In PostgreSQL the optimizer is in charge of finding good execution plans given a certain query. To come up with a good plan PostgreSQL has to calculate artificial costs for certain operations. To estimate I/O, two essential cost parameters are used:
In reality (on mechanical disks) random I/O is a lot more expensive than sequential I/O because the disk has to seek each block. This is why PostgreSQL assigns 4 penalty points to such a block.
Let us assume for a second there is no cache around. In a case like that, random I/O can very well be 50-100 times more expensive than sequential I/O. The question is: Why is the cost parameter still at 4 instead of at 100 then? The answer is that we can assume that not all blocks will come from disk. Some will be cached, and we can assume that in average 4 is a pretty good number.
But what if a table (maybe 10 TB?) is so large that we can never expect an index lookup to benefit from a cache hit? What if we had additional small lookup tables which will always be in cache because they are used frequently? Clearly, it seems unfair to use the same cost parameters for those different types of tables.
A tablespace can come to the rescue here. PostgreSQL allows people to assign different cost parameters to different tablespaces. We could pack tables we expect to be uncached into one tablespace and lookup tables onto some other tablespace. Those two tablespaces could reside on the same set of drives – we could just use a tablespace to honour different cost models.
Then we can make sure that the optimizer will consider the costs of a certain tablespace to be more expensive. Here is how it works:
1 2 |
test=# ALTER TABLESPACE name_of_tablespace SET (random_page_cost = 20); ALTER TABLESPACE |
In this case we have told the system that data residing on a tablespace called name_of_tablespace is 5 times more expensive than a typical random read.
Especially when data sets are composed of very large and fairly small tables, using different cost parameter for various tablespaces can be highly beneficial because it allows the optimizer to estimate costs more reliably. Of course, it takes some time to properly adjust those parameters, but it can be highly beneficial.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply