Indexes are, by far, the most important performance feature a general purpose relational database provides. Without indexes there are no efficient search operations, no unique constraints, and no primary keys. Achieving good performance on a reasonable amount of data without indexes is therefore impossible.
The main question now is: what happens during index creation, and how can we speed up the process? Creating indexes on billions of rows is quite costly, because data has to be sorted and turned into an efficient search structure. To demonstrate how we can build indexes quickly, I have created a simple set of demo data:
blog=# CREATE TABLE t_data AS
          SELECT id::int4,
                 (random() * 1000000000)::int4 AS i,
                 (random() * 1000000000) AS r,
                 (random() * 1000000000)::numeric AS n
          FROM generate_series(1, 1000000000) AS id;
Time: 1569002.499 ms (26:09.002)
This sample data has a couple of interesting properties: the “id” column is an ascending number, which makes a huge difference during index creation. The second column (“i”) contains a random value between 0 and one billion, stored as an integer. The third column (“r”) holds similar data as “double precision”, and finally the “n” column stores similar data as “numeric”, an arbitrary-precision type that is processed in software and does not use the FPU (the floating point unit of the CPU) internally.
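Just to get a feeling for what the generated rows look like, we can peek at a handful of them. This is purely a sanity check; the actual values will of course differ on every run, since they are random:

blog=# SELECT * FROM t_data LIMIT 3;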
Once these billion rows have been created, we can easily check the size of the table by using the following command:
blog=# SELECT pg_size_pretty(pg_relation_size('t_data'));
 pg_size_pretty
----------------
 56 GB
(1 row)
Once the test data have been created, it makes sense to set the so-called “hint bits” in PostgreSQL, which will allow us to make a fair comparison between various runs. If you are interested in hint bits and their relation to VACUUM, consider checking out our blog post about this topic.
blog=# VACUUM ANALYZE;
VACUUM
Time: 91293.971 ms (01:31.294)
While VACUUM is running, we can check the progress of the operation. If a VACUUM run takes a long time, this can give us some interesting insights into what is going on inside the database:
demo=# SELECT * FROM pg_stat_progress_vacuum;
-[ RECORD 1 ]------+--------------
pid                | 5945
datid              | 1515520
datname            | blog
relid              | 1515537
phase              | scanning heap
heap_blks_total    | 7352960
heap_blks_scanned  | 212599
heap_blks_vacuumed | 0
index_vacuum_count | 0
max_dead_tuples    | 291
num_dead_tuples    | 0
After creating the data set, we can move forward and create simple indexes on our table:
test=# \d t_data
               Table "public.t_data"
 Column |       Type       | Collation | Nullable | Default
--------+------------------+-----------+----------+---------
 id     | integer          |           |          |
 i      | integer          |           |          |
 r      | double precision |           |          |
 n      | numeric          |           |          |
The following listing shows how each column is indexed, and how long it takes to create the index. Note that all those operations are performed with PostgreSQL default settings, which is far from ideal, as we will see later. All tests have been executed on an “AMD Ryzen Threadripper 2950X 16-Core” processor:
blog=# CREATE INDEX ON t_data (id);
CREATE INDEX
Time: 291700.318 ms (04:51.700)
As we can see, index creation takes almost 5 minutes. Remember, we are dealing with 1 billion rows and we have not even started to tune the database. What is interesting to see is what PostgreSQL does while the index is created. We can inspect a system view to give us some clues:
blog=# SELECT * FROM pg_stat_progress_create_index;
-[ RECORD 1 ]------+-------------------------------
pid                | 5945
datid              | 1515520
datname            | blog
relid              | 1515537
index_relid        | 0
command            | CREATE INDEX
phase              | building index: scanning table
lockers_total      | 0
lockers_done       | 0
current_locker_pid | 0
blocks_total       | 7352960
blocks_done        | 703280
tuples_total       | 0
tuples_done        | 0
partitions_total   | 0
partitions_done    | 0
In the first step, PostgreSQL will scan the table and prepare the data for the sort operation which is about to come next:
-[ RECORD 1 ]------+------------------------------------
pid                | 5945
datid              | 1515520
datname            | blog
relid              | 1515537
index_relid        | 0
command            | CREATE INDEX
phase              | building index: sorting live tuples
lockers_total      | 0
lockers_done       | 0
current_locker_pid | 0
blocks_total       | 7352960
blocks_done        | 7352960
tuples_total       | 0
tuples_done        | 0
partitions_total   | 0
partitions_done    | 0
Sorting can be very time consuming, and it is important to tune this process properly to achieve good performance. After the sort process, PostgreSQL will add those sorted tuples into the index as shown in the next listing:
-[ RECORD 1 ]------+---------------------------------------
pid                | 5945
datid              | 1515520
datname            | blog
relid              | 1515537
index_relid        | 0
command            | CREATE INDEX
phase              | building index: loading tuples in tree
lockers_total      | 0
lockers_done       | 0
current_locker_pid | 0
blocks_total       | 0
blocks_done        | 0
tuples_total       | 1000000000
tuples_done        | 263748386
partitions_total   | 0
partitions_done    | 0
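A side note on the sort phase: with the default maintenance_work_mem, a sort of this size cannot possibly be done in memory, so PostgreSQL spills to temporary files on disk. A rough way to see how much temporary file traffic has accumulated is pg_stat_database. Keep in mind that these counters are cumulative and include all queries in the database, so the idea is to compare the values taken before and after the index build:

blog=# SELECT datname,
              temp_files,
              pg_size_pretty(temp_bytes) AS temp_bytes
       FROM   pg_stat_database
       WHERE  datname = 'blog';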
After creating the first index, we can take a look at the second column, “i”. Note that the data type and the amount of data are the same. Still, we can observe that the process takes longer, because the data is fed into the index creation in random order, which can make a huge difference:
blog=# CREATE INDEX ON t_data (i);
CREATE INDEX
Time: 401634.592 ms (06:41.635)
Wow, the index creation takes almost 2 minutes longer, which is really relevant: we are looking at a substantially more expensive sort operation than before.
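One way to quantify this difference in physical ordering is the “correlation” column in pg_stats, which the earlier VACUUM ANALYZE has already populated. A value close to 1 means the column is stored in almost perfectly ascending physical order, while a value close to 0 means the values are scattered randomly across the table:

blog=# SELECT attname, correlation
       FROM   pg_stats
       WHERE  tablename = 't_data';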
However, the physical order of data is not the only thing that can make a huge difference. The same is true for the data type we want to index. Just take a look and see how the runtime will change depending on the data type:
blog=# CREATE INDEX ON t_data (r);
CREATE INDEX
Time: 441994.372 ms (07:21.994)
The “double precision” column is again 40 seconds slower. One has to keep in mind that a “double precision” value is larger than an integer value, which definitely contributes to those runtime differences.
The last thing to take a look at is the “numeric” column, which behaves in a totally different way than anything we have seen before:
blog=# CREATE INDEX ON t_data (n);
CREATE INDEX
Time: 799659.658 ms (13:19.660)
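To illustrate why the data type matters so much, we can compare the on-disk size of a single value of each type. The following query is merely a quick check: integer and double precision have fixed sizes of 4 and 8 bytes, while the size reported for the numeric value depends on how many digits happen to be stored. On top of the larger footprint, numeric values are compared in software rather than directly by the CPU, which is what makes the sort so much more expensive:

blog=# SELECT pg_column_size(i) AS int4_bytes,
              pg_column_size(r) AS float8_bytes,
              pg_column_size(n) AS numeric_bytes
       FROM   t_data
       LIMIT  1;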
What is important to see here is that two things really matter: the physical order of the data that is being indexed, and the data type of the column.
Both aspects are often totally underestimated, because most people only think of “more RAM” and “more CPUs” (as if nothing else mattered at all).
What people often do is to throw hardware and configuration parameters at the problem.
Often, the following parameters are changed:
blog=# ALTER SYSTEM SET max_wal_size TO '100 GB';
ALTER SYSTEM
blog=# ALTER SYSTEM SET max_parallel_maintenance_workers TO 10;
ALTER SYSTEM
blog=# ALTER SYSTEM SET maintenance_work_mem TO '16 GB';
ALTER SYSTEM
blog=# ALTER SYSTEM SET shared_buffers TO '64 GB';
ALTER SYSTEM
Once this is done, we can restart the database. Note that without changing shared_buffers, “SELECT pg_reload_conf()” would be totally sufficient.
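If you are unsure whether a restart is really needed, pg_settings can tell you: the “context” column shows whether a parameter can be applied by a reload (“sighup” or “user”) or only at server start (“postmaster”), and “pending_restart” marks values that have been changed but not applied yet. This is merely a convenience check:

blog=# SELECT name, setting, unit, context, pending_restart
       FROM   pg_settings
       WHERE  name IN ('max_wal_size',
                       'max_parallel_maintenance_workers',
                       'maintenance_work_mem',
                       'shared_buffers');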
Let us discuss those parameters one at a time: max_wal_size reduces the number of checkpoints triggered by the large amount of WAL an index creation produces; max_parallel_maintenance_workers controls how many worker processes may help to build the index in parallel; maintenance_work_mem defines how much memory is available to the sort phase of CREATE INDEX (and is shared by the parallel workers); and shared_buffers determines the size of the PostgreSQL I/O cache.
Let us run the index creation again:
blog=# CREATE INDEX ON t_data (n);
CREATE INDEX
Time: 474907.805 ms (07:54.908)
This is interesting: we managed to reduce the time from around 13 min 19 sec down to just under 8 minutes, roughly a factor of 1.7. Note that this is not even double the speed. However, what does the CPU say?
  PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
 8773 hs    20   0   67.8g   2.2g 196256 R  99.0  1.8  1:05.88 postgres
 8824 hs    20   0   66.6g 968576 170576 R  99.0  0.7  0:18.96 postgres
 8825 hs    20   0   66.6g   1.0g 170796 R  99.0  0.8  0:18.98 postgres
 8826 hs    20   0   67.4g   1.8g 170796 R  99.0  1.4  0:18.97 postgres
 8827 hs    20   0   67.4g   1.8g 170796 R  99.0  1.4  0:18.95 postgres
 8829 hs    20   0   66.6g 990.8m 170796 R  99.0  0.8  0:18.93 postgres
 8823 hs    20   0   67.4g   1.8g 170796 R  98.7  1.4  0:18.95 postgres
 8828 hs    20   0   66.7g   1.0g 170376 R  98.7  0.8  0:18.95 postgres
We caught the CPU while sorting the data. This is relevant, because the sort phase parallelizes nicely, whereas many other stages cannot be done entirely in RAM or cannot run concurrently. That is the reason why, even with plenty of CPU cores, we only managed to speed things up a bit. What definitely limits us here is the throughput of the local SSD, which yields around 600 MB / sec during sorting. What is even more noteworthy is that sorting integer values with the default configuration is actually faster than a fully parallel index creation of the numeric column with much higher parameter settings.
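By the way, the parallel workers can also be observed from within PostgreSQL instead of top. On PostgreSQL 10 and higher, pg_stat_activity exposes a “backend_type” column, so a query along these lines shows the leader and its helpers while the index is being built:

blog=# SELECT pid, backend_type, wait_event_type, wait_event
       FROM   pg_stat_activity
       WHERE  backend_type IN ('client backend', 'parallel worker');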
The fact that the data type indeed matters shows us that tweaking configuration parameters is definitely not the only path to enlightenment. Deciding on the right data type can make a huge difference as well.
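Finally, it can be instructive to compare the sizes of the four indexes we have just created; the numeric index in particular tends to be noticeably larger than its integer counterpart. A simple way to list them is:

blog=# SELECT indexrelid::regclass AS index_name,
              pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
       FROM   pg_index
       WHERE  indrelid = 't_data'::regclass;

Smaller index entries do not only sort faster, they also allow more of the index to stay in the cache later on.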
"to almost 8 seconds" - correct 8 minutes.
Thanks for that, overall interesting. A couple of minor corrections:
"TRYING TO MANY PERFORMANCE IMPROVEMENTS" (TOO, not TO)
"from around 13 min 19 sec down to almost 8 seconds" (you meant minutes not seconds)
Do you have a typo on 8 seconds, it should be 8 minutes?
"from around 13 min 19 sec down to almost 8 seconds" 8 minutes, not seconds.
Hi! nice article, thank you! A small corrective remark: "down to almost 8 seconds" should be "down to almost 8 minutes".
Good read as always!
Suggesting some minor rectifications in the blog:
Thanks for sharing!
"down to almost 8 seconds." minutes ?
Index creation time seems to be reduced 8 mins. Just verifying as you said 8 seconds?
13 min 19 sec down to almost 8 seconds => "8 minutes"
Interesting test!
I think that in "we managed to reduce the time from around 13 min 19 sec down to almost 8 seconds." the last word should be "minutes"
Thanks for the correction; fixed.