CYBERTEC PostgreSQL Logo

When to use tablespaces in PostgreSQL

03.2021 / Category: / Tags:
PostgreSQL tablespaces misunderstood
© Laurenz Albe 2021

 

Users with an Oracle background consider tablespaces very important and are surprised that you can find so little information about them in PostgreSQL. This article will explain what they are, when they are useful and whether or not you should use them.

What is a tablespace

Essentially, a tablespace in PostgreSQL is a directory containing data files. These data files are the storage behind objects with a state: tables, sequences, indexes and materialized views. In PostgreSQL, each such object has its own data file. If the object is bigger, it will have several files called segments with a size limit of 1GB.

PostgreSQL uses the operating system's file system for its storage. This is different from Oracle, which essentially implements its own “file system”.

Let's compare the terms for clarity:

Terminology in Oracle vs. PostgreSQL
Oracle PostgreSQL or operating system
tablespace file system
datafile logical/physical volume
segment all data files of a table
extent segment / data file

Default tablespaces

Each PostgreSQL database cluster initially has two tablespaces. You can list them with db in psql:

You'll notice that there is no location specified. That is because they always correspond to fixed subdirectories of the PostgreSQL data directory: the default tablespace (pg_default) is the “base” subdirectory, and the global tablespace (pg_global) is the “global” subdirectory.

By default, all data files will be stored in the default tablespace. Only certain objects are stored in the global tablespace: the catalog tables pg_database, pg_authid, pg_tablespace and pg_shdepend and all their indexes. These are the only catalog tables shared by all databases.

Creating and using new tablespaces

To create a new tablespace, you first have to create a new directory. Don't create that directory in the PostgreSQL data directory!

Note that the directory has to belong to the “postgres” operating system user (to be exact, the user has to have permissions to change the directory's permissions).

Then you can create the tablespace:

To use the tablespace, you can create a table or another object with storage in it:

Note that indexes are not automatically created in the same tablespace as the table.

You can also create a database in a tablespace:

Then all objects you create in that database will automatically be placed in the database's tablespace.

There are ALTER commands to change the tablespace of any object. Moving an object to another tablespace copies the data files, and the object is inaccessible while it is being moved.

Backup and tablespaces

If you perform a file system backup of a database with tablespaces, you have to back up all tablespaces. You cannot back up or restore a single tablespace, and there is no equivalent to Oracle's “transportable tablespaces”.

pg_basebackup with the plain format will try to save tablespaces in the same place as on the database server (the -D option only specifies the location of the data directory). To backup data from a tablespace to a different location, you have to use the option --tablespace-mapping=olddir=newdir. You can use this option more than once for multiple tablespaces.

Using tablespaces makes database administration more complicated, because the data directory no longer contains all the data.

When should I create a tablespace?

In the vast majority of cases, you shouldn't create extra tablespaces in PostgreSQL. In particular, it never makes sense to create a tablespace on the same file system as the data directory or on the same file system as another tablespace.

So what are the benefits of tablespaces that justify the administrative complexity?

  • If you have your data spread over several file systems on different devices, you can distribute the I/O load. However, this can be done just as well or better by striping on a lower level.
  • If you run out of space on a disk, a tablespace will give you the option to add more storage. However, today you typically use a volume manager of some description to do this on the operating system level.
  • If you want to put a size limit on a database or table, you can put it in a tablespace on a file system with limited size.
  • If you have both fast, expensive and slow, cheap storage, you can put the objects that need good performance in a separate tablespace on fast storage. In that case, you should adjust the seq_page_cost, random_page_cost and effective_io_concurrency options on the tablespace to tell the optimizer about the performance characteristics.
  • If you want temporary files (both for temporary tables and for query processing) to be created somewhere else than in the default tablespace, you can set the temp_tablespaces parameter to a different tablespace.

If you are running in a virtualized environment with virtualized storage, all these points are moot, with the exception of the third. Since almost everybody uses virtualization these days, tablespaces are becoming an increasingly irrelevant PostgreSQL feature.

Dispelling an old myth

There is a tenacious myth circulating among database administrators that you should put tables and indexes on different disks for good performance.

You will hear people making elaborate arguments as to why the particular interplay of access patterns during an index scan will make this efficient on spinning disks. But spinning disks are going out of business, and you typically only saturate your storage system with several concurrent SQL statements, when all such access patterns will be disrupted anyway.

The truth behind the myth is that it is certainly beneficial to spread the I/O load over multiple devices. If you use striping on the operating system level, you will get a better spread than you will by carefully placing tables and indexes.

Conclusion

Tablespaces are rarely relevant in PostgreSQL. Resist the temptation to create tablespaces and leave all data in the default tablespace.

3 responses to “When to use tablespaces in PostgreSQL”

  1. would be really nice to have transportable tablespaces... that could be plugged in/out different Postgres clusters running on different versions. maybe some day 🙂

  2. Would the one tablespace rule still stay true if you have a database where a handful of the tables would be very large ( i.e. terabytes in size ) but the majority being small ( i.e. less than 1 gb ) ? Coming from an Oracle environment we split the large tables and the it's indexes into separate tablespaces for growth and to monitor more closely. Being a Postgres newbie how do I do that in this new world ?

    • You can monitor the size of a table with pg_table_size().
      I don't see much benefit in splitting up a database like this, be it Oracle or PostgreSQL.
      I think it just causes unnecessary maintenance effort. On the other hand, if you don't mind that, there is no harm done.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram