Running ALTER TABLE
… ADD COLUMN
can have side effects which have the potential to cause serious issues in production. Changing data structures is an important issue and often comes up, therefore it's important to understand what is really going on.
Table of Contents
Let's dive in and see how to run ALTER TABLE
… ADD COLUMN
in a way that doesn't hurt operations due to locking or extensive I/O.
The following statement creates a simple table which is going to serve as a test dummy for the operations we plan to run:
1 2 3 4 5 |
test=# CREATE TABLE t_sample AS SELECT a1::int FROM generate_series(1, 100000000) AS a1; SELECT 100000000 Time: 56272,917 ms (00:56,273) |
The initial table is around 3.4 GB in size as shown in the next listing:
1 2 3 4 5 6 |
test=# d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+----------+-------+-------+-------------+---------------+---------+------------- public | t_sample | table | hs | permanent | heap | 3458 MB | (1 row) |
So far this is what we expected. But what if columns are added? How does it impact the database?
We have to address various scenarios here. The simplest one is to add a column without any default values:
1 2 3 |
test=# ALTER TABLE t_sample ADD COLUMN a2 int; ALTER TABLE Time: 11,472 ms |
The important point here is: This operation is really fast because all it does is to run some magic on the system catalog. We do need a full table lock BUT it is a really short lock because PostgreSQL does not actually write this column to disk. The system column knows that one more column should be on disk - as nothing is present the engine can safely assume that the value is indeed NULL.
A similar trick can be applied if we add a constant default value:
1 2 3 |
test=# ALTER TABLE t_sample ADD COLUMN a3 int DEFAULT 10; ALTER TABLE Time: 3,339 ms |
Again the operation is really quick and it does NOT affect the amount of storage we need because this additional column does not make it to disk in the first place:
1 2 3 4 5 6 |
test=# d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+----------+-------+-------+-------------+---------------+---------+------------- public | t_sample | table | hs | permanent | heap | 3458 MB | (1 row) |
The engine knows that more columns are supposed to be around. However, if nothing can be found inside a row, PostgreSQL will know what the value is supposed to be. In this case it can expect the value to be 10. Again, there is no need to persist this entry.
However, the situation starts to change if the value we want to add to the column is not constant anymore.
1 2 3 4 |
test=# ALTER TABLE t_sample ADD COLUMN a4 int DEFAULT random()*1000; ALTER TABLE Time: 65233,954 ms (01:05,234) |
random()
does not yield deterministic values. However, the content of a table is supposed to be static - it must not change on the fly. Therefore we really have to write the data to disk. Mind that the operation takes more than one minute. While this is not much, it also means that the table is locked for the time being. Having hundreds or possibly thousands of connections wait on this operation can seriously harm availability and cause all kinds of problems ranging from reaching connection limits to timeouts. Therefore extensive table locking has to be avoided.
Due to the fact that the additional column had to be written to disk, the size of the table has grown to over 4 GB in size.
1 2 3 4 5 6 |
test=# d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+----------+-------+-------+-------------+---------------+---------+------------- public | t_sample | table | hs | permanent | heap | 4223 MB | (1 row) |
Note that size is not the primary problem here - locking usually is.
Once columns have been added, we also need to see what happens if we want to get rid of them. Just like ADD COLUMN
, the DROP COLUMN
command tries to avoid rewriting the table as much as possible. Therefore DROP COLUMN
simply marks the column as deleted:
1 2 3 4 5 6 7 8 9 |
test=# ALTER TABLE t_sample DROP COLUMN a3; ALTER TABLE Time: 8,148 ms test=# d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+----------+-------+-------+-------------+---------------+---------+------------- public | t_sample | table | hs | permanent | heap | 4223 MB | (1 row) |
The size of the table doesn't change since the physical column still exists. However, we can't see it and the space will be reclaimed later on. ALTER TABLE
… DROP COLUMN
… therefore (usually) doesn't cause major locking issues, unless you're having a hard time getting a short lock due to extremely high concurrency.
Finally …
If you want to learn more about PostgreSQL, see Michał's post on practical examples of data normalization.
Check out our CYBERTEC Migrator to migrate from Oracle to PostgreSQL: meet the CYBERTEC Migrator
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply