
PostgreSQL: 1 trillion rows in Citus

03.2025

"PostgreSQL scales" - we have all heard this phrase over and over again. However, the question is: What does this actually mean? Does it mean 1 million rows? Or maybe even 1 billion rows? So, on a rainy weekend, I decided to do a little experiment to figure out if it is possible to squeeze 1 trillion rows (= 1000 billion rows) into my local personal computer.

For those of you who don't believe this to be possible - here is proof:

Using Citus columnar storage, this translates to:

But before looking at the final result, we need to understand how we got there in the first place.

Loading one trillion rows into PostgreSQL

Obviously, I don't have a real-world data set on hand that contains this much data, so I decided to generate the data synthetically. However, before we take a look at how this can be done, we have to discuss the basic setup.

Loading this much data into a row store would require an insane amount of space. Here is some math:

Assuming a tuple header of about 24 bytes per row plus 12 bytes of payload, we would end up with roughly 34 TB, which is larger than the maximum size of a single table in PostgreSQL (32 TB, assuming 8k blocks).
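To double-check the order of magnitude, a quick back-of-the-envelope calculation can be done in SQL itself (36 bytes per row is the assumption used above):

    -- ~36 bytes per row (24-byte header + 12 bytes of payload) times
    -- 1 trillion rows lands well above the 32 TB per-table limit
    SELECT pg_size_pretty(36::bigint * 1000 * 1000 * 1000 * 1000);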

This basically leaves us with a couple of choices:

  • PostgreSQL table partitioning (row store)
  • Columnar store (single table)
  • Sharding using Citus

To make sure that I can query the data a bit faster, I decided on sharded Citus columnar storage, consisting of a coordinator and 4 worker nodes (all residing on my local desktop machine):
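The exact wiring is not reproduced here, but on a single machine it boils down to registering the worker instances with the coordinator. A minimal sketch, assuming one PostgreSQL instance per port (the host names and ports are just placeholders):

    -- run on the coordinator database
    SELECT citus_set_coordinator_host('localhost', 5432);
    SELECT citus_add_node('localhost', 5433);
    SELECT citus_add_node('localhost', 5434);
    SELECT citus_add_node('localhost', 5435);
    SELECT citus_add_node('localhost', 5436);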

Once Citus has been installed and wired up (more on Citus can be found here), we can create a simple table and distribute it using one of the columns:
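The original DDL is not shown here, but a minimal version could look like this (the table name t_demo and the value column are placeholders; key_id is the distribution column the rest of the article refers to):

    -- columnar storage plus hash distribution by key_id
    CREATE TABLE t_demo (
        key_id  bigint,
        value   bigint
    ) USING columnar;

    SELECT create_distributed_table('t_demo', 'key_id');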

To start, we can use the generate_series function to load some initial data. In this case, the INSERT statement generates 62.5 million rows in the most basic way:
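A sketch of such an initial load, assuming the t_demo table from above (the exact expressions may differ):

    -- 62.5 million rows in one go; key_id cycles through 1000 distinct values
    INSERT INTO t_demo (key_id, value)
    SELECT id % 1000, id
    FROM generate_series(1, 62500000) AS id;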

After populating some initial data, we can start to double the number of rows in our Citus table as often as we want:
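One way to do this is to re-insert the table into itself. Every execution doubles the row count, so 14 doublings turn 62.5 million rows into 62.5 million * 2^14 = 1.024 trillion rows (again a sketch, assuming the t_demo table):

    -- each run doubles the size of the distributed table
    INSERT INTO t_demo SELECT * FROM t_demo;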

Running a query on 1 trillion rows in PostgreSQL

Voilà, after some time and enough steps to double the size of our distributed PostgreSQL table, we can query the content of the table. In the most simplistic of all cases, we can simply run a count:
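Assuming the table sketched above, the statement is as simple as it gets:

    -- count across all shards; the partial counts are computed on the
    -- workers and summed up on the coordinator
    SELECT count(*) FROM t_demo;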

The query takes roughly 53 minutes to complete. During the SELECT statement, we can see that Citus dispatches the load to a vast number of processes across the shards:

What is important to see here is that we are basically CPU-bound rather than I/O-bound, which is a really important observation. CPU is usually a lot easier and cheaper to scale than I/O, so seeing fully loaded CPUs is actually a good sign.

Grouping large data sets

Often, we want to know how many instances of each group can be found in the data set. Here is how this works:
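A query of that shape could look like this (a sketch against the assumed t_demo table; the ORDER BY and LIMIT are additions for readability):

    -- how many rows exist per key_id?
    SELECT key_id, count(*)
    FROM t_demo
    GROUP BY key_id
    ORDER BY 2 DESC
    LIMIT 10;

Prefixing the statement with EXPLAIN shows the distributed plan discussed below.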

Now, what can we see here? The most important lesson is that Citus nicely dispatches the query to our shards. The plan shows that the shards are scanned (ColumnarScan) and aggregated locally, pretty much the same way a normal, single-node GROUP BY statement would aggregate data. However, there is more: note that we are sharded by key_id, so each incarnation of key_id is bound to a specific shard. Therefore, all Citus has to do is collect those partial results and sort them as desired in the query, which is a fairly simple task at that point.

If we want to understand more deeply how this works, we can connect to one of the database nodes and check which queries are actually running.

Here is an example:
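On a worker node, pg_stat_activity reveals the fragment queries Citus sends over (a sketch; the filter on application_name is an assumption and may need adjusting depending on the Citus version):

    -- run on one of the worker nodes
    SELECT pid, application_name, query
    FROM pg_stat_activity
    WHERE application_name LIKE 'citus%'
          AND state = 'active';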

Citus is really professionally engineered: it sets a proper application_name, and we can easily inspect what is going on inside our cluster using the standard means provided by PostgreSQL.

Breaking things at scale

I am a big fan of Citus and columnar storage - I strongly believe that Citus and PostgreSQL make a sensational team. The more I learn about both over the years, the more I adore these technologies. However, even at this stage, it is important to issue a warning: just because something is good technology does not mean that you can simply throw every type of query at it and expect a sub-second answer.

Here is an example:
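A query of that shape - averaging the gap between consecutive values per key_id - could look like this (a sketch; the exact statement used in the test may differ):

    -- average distance between consecutive values per key_id:
    -- the window function forces a sort of the entire data set
    SELECT key_id, avg(diff)
    FROM (
        SELECT key_id,
               value - lag(value) OVER (PARTITION BY key_id ORDER BY value) AS diff
        FROM t_demo
    ) AS x
    GROUP BY key_id;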

What is this query actually doing? For each incarnation of key_id, it calculates the average distance between consecutive values (in ascending order). In practice, this means that the entire data set has to be sorted, sent through a window function, and then grouped together. One can easily imagine that sorting 1 trillion rows is not a quick operation - even when throwing hardware at the problem (remember: we are still only on my local PC).

The takeaway here is that if you start to run operations on ever-larger data sets, you also have to scale up your thought process accordingly. One has to understand what is possible and what can be done easily. From a certain scale onwards, some operations are simply trickier and cannot be executed carelessly without facing dire consequences.
