Row store vs. column store - a lot has been written about this topic in the context of PostgreSQL and Citus. What does it really mean and what are the implications? Are column stores "always cool" and "always beneficial"? No, there is more to it, and that requires a closer look. To understand the benefits of a column store over a row store or vice versa, it's important to grasp the basic ideas that lead to the database behavior we're going to see in this blog.
Let's use a practical example to understand what is going on. Consider the following query:
SELECT first_name, last_name FROM person WHERE id = 10;
What will happen behind the scenes: PostgreSQL will not just look for three columns (first_name, last_name and id). It will instead communicate with the disk system using 8k blocks. Inside a single block, PostgreSQL stores more than just one row, including visibility information and so on. In a classical OLTP application (bookkeeping, financial transactions, address search, etc.) we usually need the entire row. Just imagine you are searching for an address - usually all fields are displayed, and it is highly beneficial to fetch the entire row from disk in one go. That is where a row store is the best option.
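As a quick aside: if you want to check which block size your own PostgreSQL build uses, you can simply ask the server - 8192 bytes (8k) is the default on a stock build:

postgres=# SHOW block_size;
 block_size
------------
 8192
(1 row)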
Suppose we want to sum up our entire sales for the past 20 years. Do we need the names of the products we sold? Do we need the quantity and the place where we sold them? Actually, we do not. All we need is 1 out of, say, 20 columns. If we want to add up 100 billion rows, it makes sense to read just that one column instead of the entire data set. Reading 20 columns worth of data only to throw 19 of them away leads to a lot of I/O and adds a ton of overhead (such as finding the column we need inside each row and so on). Analytics and BI are the use cases where column stores shine.
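To make this more concrete, here is a sketch of the kind of query we are talking about - the sales table and its columns are hypothetical and only serve as an illustration:

-- hypothetical sales table with ~20 columns; only "amount" is needed
SELECT sum(amount)
FROM   sales
WHERE  sold_at >= '2005-01-01'
AND    sold_at <  '2025-01-01';

A column store only has to read the amount (and sold_at) data, while a row store has to walk through every complete row.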
However, suppose we store each column of a table in its own data file and want to find all columns belonging to a single address. In this case we have to read 20 data files and reassemble the row in the most cumbersome way possible. This is where row stores shine.
As you can see, there is no technique that “fits all”. Depending on the use case, different storage formats are required.
We can easily compare row stores and column stores: create two identical tables in different storage formats. The important difference is simply the storage method (USING columnar):
postgres=# CREATE TABLE t_row_timeseries (
    id      serial,
    tstamp  timestamptz,
    val     float
);
CREATE TABLE
postgres=# CREATE TABLE t_col_timeseries (
    id      serial,
    tstamp  timestamptz,
    val     float
) USING columnar;
CREATE TABLE
Then we turn both tables into distributed Citus tables, sharded by tstamp:

postgres=# SELECT create_distributed_table(
    't_row_timeseries',
    'tstamp'
);
 create_distributed_table
--------------------------

(1 row)

postgres=# SELECT create_distributed_table(
    't_col_timeseries',
    'tstamp'
);
 create_distributed_table
--------------------------

(1 row)
For the purpose of this test, we will load 157 million rows, which equals one row per second for five years: 86400 * 365 * 5 = 157,680,000. In other words, this is already a sizable, yet realistic, amount of data.
postgres=# INSERT INTO t_row_timeseries (tstamp, val)
    SELECT '2020-01-01'::timestamp + (id || ' seconds')::interval,
           id % 16
    FROM   generate_series(1, 86400*365*5) AS id;
INSERT 0 157680000
Time: 104624.127 ms (01:44.624)

postgres=# INSERT INTO t_col_timeseries (tstamp, val)
    SELECT '2020-01-01'::timestamp + (id || ' seconds')::interval,
           id % 16
    FROM   generate_series(1, 86400*365*5) AS id;
INSERT 0 157680000
Time: 105505.957 ms (01:45.506)
What is interesting here is that the time needed to load the data is pretty similar for both tables. Most of the CPU cycles are burned generating the data in the first place, so loading the data itself is not really an issue. We generate and load around 1.5 million rows per second (157,680,000 rows in roughly 105 seconds) on my local Mac Mini (M1) running macOS (which is not the fastest OS for database work at all).
Let's compare the size of the two tables:

postgres=# SELECT table_name, table_size, access_method
           FROM   citus_tables;
    table_name    | table_size | access_method
------------------+------------+---------------
 t_col_timeseries | 1651 MB    | columnar
 t_row_timeseries | 7849 MB    | heap
(2 rows)
The column store is around 5 times smaller than the normal PostgreSQL heap table, which is pretty much what we expected anyway. With this storage technique a lot of trickery, such as compression, can be applied, and it pays off big time. While hard drives have become bigger, cheaper and faster, the effect of this technological progress has been eaten away financially by the cost of cloud storage, and technically by virtualization and network storage. In short: it matters, for many reasons, that our table is smaller and more compact.
Let's see what this means on the read side, starting with a simple count:

postgres=# SELECT count(*) FROM t_row_timeseries;
   count
-----------
 157680000
(1 row)

Time: 3028.839 ms (00:03.029)

postgres=# SELECT count(*) FROM t_col_timeseries;
   count
-----------
 157680000
(1 row)

Time: 979.112 ms
In reality, even more performance improvement is possible, but as always, things depend a bit on the type of query you are running.
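A query that aggregates a single column over a slice of the data is a good candidate to benefit even more, since only that column (plus the filter column) has to be read. Here is a sketch - this particular query was not part of the benchmark above, so timings will vary:

postgres=# SELECT sum(val)
           FROM   t_col_timeseries
           WHERE  tstamp >= '2021-01-01'
           AND    tstamp <  '2022-01-01';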
Running slightly more advanced SQL
However, there are many more options than just a simple count.
postgres=# SELECT val, count(*)
           FROM   t_row_timeseries
           GROUP BY 1
           ORDER BY 1
           LIMIT 3;
 val | count
-----+---------
   0 | 9855000
   1 | 9855000
   2 | 9855000
(3 rows)

Time: 3764.688 ms (00:03.765)

postgres=# SELECT val, count(*)
           FROM   t_col_timeseries
           GROUP BY 1
           ORDER BY 1
           LIMIT 3;
 val | count
-----+---------
   0 | 9855000
   1 | 9855000
   2 | 9855000
(3 rows)

Time: 2519.854 ms (00:02.520)
The difference is not as big as before, because a lot more time is spent on the CPU: I/O accounts for a smaller share of the overall runtime needed to process this information. The simple rule is: the more CPU-intensive the work is, the less relevant storage size becomes on the performance side.
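If you want to see for yourself where the time goes, EXPLAIN (ANALYZE, BUFFERS) is the standard tool. A sketch - the output is omitted here, since plans and instrumentation differ between Citus versions and setups:

postgres=# EXPLAIN (ANALYZE, BUFFERS)
           SELECT val, count(*)
           FROM   t_col_timeseries
           GROUP BY 1
           ORDER BY 1
           LIMIT 3;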
But what happens if we try something a little fancier, such as ROLLUP?

postgres=# SELECT val, count(*)
           FROM   t_row_timeseries
           GROUP BY ROLLUP (val)
           ORDER BY 1;
ERROR:  could not run distributed query with GROUPING SETS, CUBE, or ROLLUP
HINT:  Consider using an equality filter on the distributed table's partition column.
Time: 2.225 ms
Some operations are only possible on local storage and are hard to achieve in a distributed environment. This has to be kept in mind when making design decisions. Don't expect everything to work just like before simply by plugging in more systems - that can backfire, because inevitably some features will be lost.
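If you do need the grand total, one possible workaround - my sketch, not something from the original post - is to compute the per-group counts and the overall total separately and glue them together with UNION ALL. Whether this is planned efficiently depends on your Citus version, so test it on your own setup (output omitted):

postgres=# SELECT val, count(*) FROM t_row_timeseries GROUP BY val
           UNION ALL
           SELECT NULL, count(*) FROM t_row_timeseries
           ORDER BY 1;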
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.