
PostgreSQL & tablespaces - it's not so scary ...

05.2017

Having spent the last week with Oracle DBAs eagerly peeking towards PostgreSQL, I saw that besides the usual beginner questions, one Postgres concept that sounded kind of familiar to the Oracle folks (but still caused some confusion, as it's not quite the same thing) was tablespaces.

From a higher (user) level everything is pretty similar to Oracle and other systems though – tablespaces "house" database objects like tables and indexes. The biggest differences compared to Oracle might be that Postgres tablespaces are cluster-level objects and that there's no way to restrict tablespace (and thus cluster/database) size. Also, you don't deal with single datafiles, but with folders instead.

And before going into the details – why would one need tablespaces at all? Well, mostly for two reasons: first, to keep the database running when disk space is running out on the current partition and there's no easy way to extend it (no LVM, for example); and second, to optimize performance and enable more parallel IO, for example by putting indexes on fast SSD disks to benefit OLTP workloads. Just "for fun" I wouldn't recommend setting up tablespaces, as they have some implications for replication scenarios – see the end of the article for more.

So what is exactly a tablespace in Postgres?

As a reminder – by default a Postgres cluster is "self-contained", meaning all the data files reside inside the initialized "data directory" (also called PGDATA, as one can set the similarly named environment variable to simplify working with clusters via pg_ctl), making things very easy and portable.

But back to tablespaces – actually it's very simple: a tablespace is just another name for a folder that the Postgres server process owner (typically the "postgres" system user) owns and can write files to. The folder can be located anywhere; Postgres must just be able to create a symlink to that path. The symlinks are created in the "pg_tblspc" subfolder of the cluster (for example /var/lib/postgresql/9.6/main/pg_tblspc on Ubuntu) and can be inspected using the usual file system commands like "ls -l". After that, users can explicitly state that they want to store specific tables/indexes in that new "linked folder", or one can make it the default for new objects, or even move all old objects over to the new tablespace.

How to create one?

In the simplest form, something like this will work:
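The original code sample is missing here; a minimal sketch could look like the following (the tablespace name and path are illustrative – the directory must already exist, be empty, and be owned by the "postgres" OS user):

```sql
-- Run as a superuser:
CREATE TABLESPACE ts1 LOCATION '/data/pg_ts1';

-- Objects can then be placed there explicitly:
CREATE TABLE t1 (id int) TABLESPACE ts1;
CREATE INDEX t1_id_idx ON t1 (id) TABLESPACE ts1;
```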

When using tablespaces to tune IO though, something like this would make more sense:
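Again the original sample is missing; a sketch using per-tablespace planner/IO settings (available via the WITH clause, with effective_io_concurrency as a tablespace option since 9.6 – the mount path and values are illustrative):

```sql
-- /mnt/ssd is assumed to be a fast SSD mount:
CREATE TABLESPACE ts_ssd LOCATION '/mnt/ssd/pg'
  WITH (random_page_cost = 1.1, effective_io_concurrency = 200);

-- Place a hot index on the fast disk:
CREATE INDEX pgbench_accounts_aid_idx
  ON pgbench_accounts (aid) TABLESPACE ts_ssd;
```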

NB! Just declaring a tablespace won't help normal unprivileged users – they need the CREATE privilege on it to create tables/indexes there. Something like this will alleviate the situation. Note that for databases with high security requirements one should of course give out such grants on a per-user basis.
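The missing grant statement was presumably along these lines (broad PUBLIC grant shown; for stricter setups grant to individual roles instead):

```sql
-- Allow everyone to create objects in the tablespace:
GRANT CREATE ON TABLESPACE ts1 TO PUBLIC;

-- Or, more restrictively, per user:
GRANT CREATE ON TABLESPACE ts1 TO some_app_user;
```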

Moving existing tables to a new tablespace

When plagued by disks running out of space, then after creating a new tablespace the first thing should be to also make it the default for new objects. This can be done by modifying the default_tablespace parameter and reloading the server config via pg_reload_conf() or "pg_ctl -D mydatadir reload". After that one could/should also move fast-growing tables/indexes to the new tablespace. This can be done per table, or starting with Postgres 9.4 also en masse.
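The statements described above could look like this (tablespace and table names are illustrative; ALTER SYSTEM requires 9.4+, otherwise edit postgresql.conf directly):

```sql
-- Make the new tablespace the default for new objects, then reload:
ALTER SYSTEM SET default_tablespace = 'ts1';
SELECT pg_reload_conf();

-- Move a single table (rewrites its data files):
ALTER TABLE big_table SET TABLESPACE ts1;

-- Move everything from the old default en masse (Postgres 9.4+):
ALTER TABLE ALL IN TABLESPACE pg_default SET TABLESPACE ts1;
ALTER INDEX ALL IN TABLESPACE pg_default SET TABLESPACE ts1;
```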

NB! The latter syntax locks all the tables/indexes (use NOWAIT to fail fast when unable to get all the locks) and will then copy the data files one by one, so basically a downtime is required. For "online" moving one can use tools like pg_repack or pg_squeeze though.

Backup & replication considerations

For logical (pg_dump, pg_dumpall) backups nothing changes, but for physical backups (or when building up a replica), pg_basebackup will by default require the exact same directory layout on the replica for things to work. When this is not possible or feasible, one can "re-map" the tablespaces using the --tablespace-mapping parameter, or just use the "tar" format, which again creates a "self-contained" blob. See the pg_basebackup documentation for more details. Also, when doing manual base backups with cp or rsync or such, one should make sure the "dereference" option is used, so that the datafiles behind the symlinks get copied.
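As a sketch of the two approaches (hostnames and paths are illustrative; --tablespace-mapping takes an olddir=newdir pair and is available since 9.4):

```shell
# Re-map the tablespace to a different path on the replica:
pg_basebackup -h primary_host -D /var/lib/postgresql/9.6/replica \
  --tablespace-mapping=/data/pg_ts1=/data/replica_ts1

# Or take a self-contained tar-format backup instead:
pg_basebackup -h primary_host -D /backups/base -F tar
```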

Hope it helped. Questions/corrections are welcome as usual! For more information - subscribe to our newsletter.

10 responses to “PostgreSQL & tablespaces - it's not so scary ...”

  1. > for example having indexes on fast SSD disks

    Wouldn't you be better off w/ the data files on SSD and the indexes on spinning rust? B-TREE access is pretty fast and optimal as is but the random I/O of gathering data pages can be horrific.

    • Sure this would be the best option if one has enough SSD space 🙂 ...but if you're constrained and have a big and wide table with dozens of columns say, then placing the heavily used ID column on SSD might still be a better balance, winning say 20x in SSD space, giving away 10x in speed but having hope for very fast index-only scans for counts and such. Very use case dependent stuff anyways...

  2. Hello, thank you for the clear article.

    I set up 2 table spaces where I believed PG would have dropped most of the data, but the default tablespace now keeps on growing more than the others.

    All my tables and indexes are now going into one of the 2 custom tablespaces so I'm puzzled by this growing of the default disk location.

    I'm doing this with the goal of splitting different data-areas of my application into different EBS disks on AWS.

  3. Sorry, I'm just starting with PostgreSQL. Why can't I create two tablespaces in the same directory? ERROR: directory "/var/lib/postgresql/10/main/base/PG_10_201707211" already in use as a tablespace

      • Thank you Pavlo. As I understand it, tablespace is a physical directory. I first created the directories on the disk and then created the tablespaces by stating the respective location for each tablespace.

  4. My Windows installation is corrupted, so I need to re-install it. My database is in a tablespace in a folder on my USB drive. The Postgres installation folder and data folder were not recovered – can I restore the data from my tablespace folder?

  5. So I want to store indexes in local SSD for performance and main data in a redundant network drive for safe keeping.

    What if the local SSD breaks – would the whole database end up in a corrupted state, or can Postgres recreate the indexes from the main data?

    How about materialized views?
