Table of Contents
Users with an Oracle background consider tablespaces very important and are surprised that you can find so little information about them in PostgreSQL. This article will explain what they are, when they are useful and whether or not you should use them.
Essentially, a tablespace in PostgreSQL is a directory containing data files. These data files are the storage behind objects with a state: tables, sequences, indexes and materialized views. In PostgreSQL, each such object has its own data file. If the object is bigger, it will have several files called segments with a size limit of 1GB.
PostgreSQL uses the operating system's file system for its storage. This is different from Oracle, which essentially implements its own “file system”.
Let's compare the terms for clarity:
Oracle | PostgreSQL or operating system |
---|---|
tablespace | file system |
datafile | logical/physical volume |
segment | all data files of a table |
extent | segment / data file |
Each PostgreSQL database cluster initially has two tablespaces. You can list them with db
in psql
:
1 2 3 4 5 6 |
List of tablespaces Name | Owner | Location ------------+----------+---------- pg_default | postgres | pg_global | postgres | (2 rows) |
You'll notice that there is no location specified. That is because they always correspond to fixed subdirectories of the PostgreSQL data directory: the default tablespace (pg_default
) is the “base
” subdirectory, and the global tablespace (pg_global
) is the “global
” subdirectory.
By default, all data files will be stored in the default tablespace. Only certain objects are stored in the global tablespace: the catalog tables pg_database
, pg_authid
, pg_tablespace
and pg_shdepend
and all their indexes. These are the only catalog tables shared by all databases.
To create a new tablespace, you first have to create a new directory. Don't create that directory in the PostgreSQL data directory!
Note that the directory has to belong to the “postgres
” operating system user (to be exact, the user has to have permissions to change the directory's permissions).
Then you can create the tablespace:
1 |
CREATE TABLESPACE mytbsp LOCATION '/tmp/mytbsp'; |
To use the tablespace, you can create a table or another object with storage in it:
1 2 3 4 5 6 7 8 9 10 11 |
CREATE TABLE newtab ( id integer NOT NULL, val text NOT NULL ) TABLESPACE mytbsp; ALTER TABLE newtab ADD CONSTRAINT newtab_pkey PRIMARY KEY (id) USING INDEX TABLESPACE mytbsp; CREATE INDEX newtab_val_idx ON newtab (val) TABLESPACE mytbsp; |
Note that indexes are not automatically created in the same tablespace as the table.
You can also create a database in a tablespace:
1 |
CREATE DATABASE newdb TABLESPACE mytbsp; |
Then all objects you create in that database will automatically be placed in the database's tablespace.
There are ALTER
commands to change the tablespace of any object. Moving an object to another tablespace copies the data files, and the object is inaccessible while it is being moved.
If you perform a file system backup of a database with tablespaces, you have to back up all tablespaces. You cannot back up or restore a single tablespace, and there is no equivalent to Oracle's “transportable tablespaces”.
pg_basebackup
with the plain format will try to save tablespaces in the same place as on the database server (the -D
option only specifies the location of the data directory). To backup data from a tablespace to a different location, you have to use the option --tablespace-mapping=olddir=newdir
. You can use this option more than once for multiple tablespaces.
Using tablespaces makes database administration more complicated, because the data directory no longer contains all the data.
In the vast majority of cases, you shouldn't create extra tablespaces in PostgreSQL. In particular, it never makes sense to create a tablespace on the same file system as the data directory or on the same file system as another tablespace.
So what are the benefits of tablespaces that justify the administrative complexity?
seq_page_cost
, random_page_cost
and effective_io_concurrency
options on the tablespace to tell the optimizer about the performance characteristics.temp_tablespaces
parameter to a different tablespace.If you are running in a virtualized environment with virtualized storage, all these points are moot, with the exception of the third. Since almost everybody uses virtualization these days, tablespaces are becoming an increasingly irrelevant PostgreSQL feature.
There is a tenacious myth circulating among database administrators that you should put tables and indexes on different disks for good performance.
You will hear people making elaborate arguments as to why the particular interplay of access patterns during an index scan will make this efficient on spinning disks. But spinning disks are going out of business, and you typically only saturate your storage system with several concurrent SQL statements, when all such access patterns will be disrupted anyway.
The truth behind the myth is that it is certainly beneficial to spread the I/O load over multiple devices. If you use striping on the operating system level, you will get a better spread than you will by carefully placing tables and indexes.
Tablespaces are rarely relevant in PostgreSQL. Resist the temptation to create tablespaces and leave all data in the default tablespace.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
would be really nice to have transportable tablespaces... that could be plugged in/out different Postgres clusters running on different versions. maybe some day 🙂
Would the one tablespace rule still stay true if you have a database where a handful of the tables would be very large ( i.e. terabytes in size ) but the majority being small ( i.e. less than 1 gb ) ? Coming from an Oracle environment we split the large tables and the it's indexes into separate tablespaces for growth and to monitor more closely. Being a Postgres newbie how do I do that in this new world ?
You can monitor the size of a table with
pg_table_size()
.I don't see much benefit in splitting up a database like this, be it Oracle or PostgreSQL.
I think it just causes unnecessary maintenance effort. On the other hand, if you don't mind that, there is no harm done.