Learn about pg_stat_io's debugging power: PostgreSQL 16 blesses users around the world with many features which ensure an even better user experience. One of those features is a system view called pg_stat_io. It offers deep insights into the I/O behavior of your PostgreSQL database server. From PostgreSQL 16 onward, it will make it far easier to debug and assess performance-related problems.
So, let us waste no time and dive into this wonderful feature.
pg_stat_io: Debugging I/O in PostgreSQL

The following listing shows us what the pg_stat_io view looks like:
blog=# \d pg_stat_io
                     View "pg_catalog.pg_stat_io"
     Column     |           Type           | Collation | Nullable | Default
----------------+--------------------------+-----------+----------+---------
 backend_type   | text                     |           |          |
 object         | text                     |           |          |
 context        | text                     |           |          |
 reads          | bigint                   |           |          |
 read_time      | double precision         |           |          |
 writes         | bigint                   |           |          |
 write_time     | double precision         |           |          |
 writebacks     | bigint                   |           |          |
 writeback_time | double precision         |           |          |
 extends        | bigint                   |           |          |
 extend_time    | double precision         |           |          |
 op_bytes       | bigint                   |           |          |
 hits           | bigint                   |           |          |
 evictions      | bigint                   |           |          |
 reuses         | bigint                   |           |          |
 fsyncs         | bigint                   |           |          |
 fsync_time     | double precision         |           |          |
 stats_reset    | timestamp with time zone |           |          |
It contains a few important columns with valuable information:
backend_type: The source of information

In this field, PostgreSQL describes from which type of backend the information is coming. The following backend types exist as of PostgreSQL 16:
blog=# SELECT DISTINCT backend_type FROM pg_stat_io ORDER BY 1;
    backend_type
---------------------
 autovacuum launcher
 autovacuum worker
 background worker
 background writer
 checkpointer
 client backend
 standalone backend
 startup
 walsender
(9 rows)
The deal here is: PostgreSQL is based on a multi-process architecture; it is not multi-threaded, as some other database engines are. Each database connection is a process, and all background operations are isolated in separate processes as well. Those types of processes are what we call a “backend type” in this context. At present, there are 9 different backend types, but it is likely that more will be added in the future.
object: Type of storage

In this field, we find one of two values: relation and temp relation. In PostgreSQL, we distinguish between temporary and permanent relations. Permanent relations are entities such as tables, indexes and so on; in short, things which are made to last. Temporary relations are not meant to exist forever. PostgreSQL will give us information about both types of storage.
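You can quickly verify this on your own system. Here is a minimal sketch; the two values shown are what a stock PostgreSQL 16 installation reports:

blog=# SELECT DISTINCT object FROM pg_stat_io ORDER BY 1;
    object
---------------
 relation
 temp relation
(2 rows)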
context: How did we do I/O?

In PostgreSQL, there are 4 contexts in which I/O can occur: normal, vacuum, bulkread and bulkwrite.

Normal operations are when we interact with shared_buffers, which is the I/O cache of PostgreSQL. So if a block is read from disk and stored in cache, or if a block is written out, this is considered to be “normal” I/O.
In contrast to that, there is “vacuum”. This type of operation is VACUUM- or ANALYZE-related and happens outside the normal I/O path: vacuum does not go through the shared_buffers cache machinery, but uses a small ring buffer of its own.
Finally, there are bulkread and bulkwrite. These are related to large operations which bypass shared_buffers. How can that happen? If you happen to read a 15 TB table using a sequential scan (maybe SELECT count(*) FROM large_table), PostgreSQL will not ask its own I/O cache for each and every 8k block, because the odds of having a cache hit are dire anyway. To learn more about this, consider checking out synchronized sequential scans.
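To get a feeling for how much I/O each context has caused so far, you can simply aggregate over the view. This is just an illustrative sketch using the columns shown in the view definition above:

-- total I/O operations per context (counts, not bytes)
SELECT context,
       sum(reads)  AS total_reads,
       sum(writes) AS total_writes
FROM pg_stat_io
GROUP BY context
ORDER BY context;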
reads and writes: Number of reads and writes

This is the number of I/O operations. Note that this is not expressed in bytes, but in operations. If you want to calculate the amount in bytes, you have to multiply this number by op_bytes (a column described below). That is true for reads and writes alike.
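Here is a sketch of that calculation; pg_size_pretty() merely turns the raw byte count into something human-readable:

-- convert operation counts into bytes using op_bytes
SELECT backend_type,
       context,
       pg_size_pretty(reads  * op_bytes) AS bytes_read,
       pg_size_pretty(writes * op_bytes) AS bytes_written
FROM pg_stat_io
WHERE reads > 0 OR writes > 0;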
read_time and write_time: Measuring I/O time

The amount of data is one indicator pointing to possible I/O issues. However, what is even more relevant is the amount of time PostgreSQL has spent on actually fetching and writing this data. What is important to note here is that the track_io_timing setting has to be turned on to fill these columns.
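Assuming track_io_timing is enabled, a query such as the following (a sketch, not an official recipe) points you at the backends which spend the most time on I/O; times are reported in milliseconds:

-- requires track_io_timing = on
SELECT backend_type, object, context, read_time, write_time
FROM pg_stat_io
ORDER BY read_time + write_time DESC NULLS LAST
LIMIT 5;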
writebacks: Sending data from the kernel to disk

Writing to disk is not as simple as people might think. If you write to a file, it does not mean that the data immediately goes to the underlying storage device (“permanent storage”). Instead, the data is first sent to the filesystem cache managed by the kernel. Writeback means that we force the kernel to actually send data to permanent storage. This is again measured in number of requests, which has to be multiplied by op_bytes to arrive at a number of bytes.
writeback_time: Measuring writeback time

The number of writebacks is one thing. However, again we might be interested in finding out how long it actually took to write data to disk. In case track_io_timing is on, this column will contain a useful value.
extends: Growing PostgreSQL data files

When a transaction writes data, there are two situations which can occur: either there is space inside a table / index / etc. which can be reused, or we have to grow the data files to provide space for the data we want to write. Reusable space often becomes available after a VACUUM run or in case of HOT updates. However, in case there is no suitable space, PostgreSQL grows the data file. Extends are again counted in units of op_bytes (in this case, the block size).
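If you want to know which backend types are responsible for growing your data files, an aggregation like the following can help; again, this is just a sketch based on the view as shown above:

-- how much have the various backend types grown the data files?
SELECT backend_type,
       sum(extends) AS total_extends,
       pg_size_pretty(sum(extends * op_bytes)) AS total_growth
FROM pg_stat_io
WHERE extends IS NOT NULL
GROUP BY backend_type
ORDER BY total_extends DESC;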
extend_time: Time needed to grow files

In case extends are written, we can again measure the time needed for that. As before, time is only measured when track_io_timing is turned on.
In general, I recommend turning on track_io_timing, as you will otherwise not have the information you need to track down performance problems. This is not only true for pg_stat_io but also for many other PostgreSQL statistics.
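Enabling it is straightforward. A minimal sketch; note that ALTER SYSTEM requires superuser privileges, and the parameter can be changed with a simple reload, no restart needed:

-- check the current value
SHOW track_io_timing;

-- turn it on cluster-wide and reload the configuration
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();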
op_bytes: I/O chunking

On a modern system, data is not sent to disk (or read from disk) byte by byte. Instead, I/O happens in chunks. Usually, the size of a single I/O chunk is 8k, but it might be different on some systems.

hits: Counting cache hits

When I/O happens through the PostgreSQL cache (shared_buffers), we can face cache hits or cache misses. This value tells us the number of cache hits, i.e. how often we could avoid asking the operating system for I/O.
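Combined with reads, this allows us to compute a cache hit ratio per backend type and context. Here is a hedged sketch; the nullif() call merely avoids division by zero:

-- cache hit ratio per backend type and context
SELECT backend_type,
       context,
       hits,
       reads,
       round(hits * 100.0 / nullif(hits + reads, 0), 2) AS hit_ratio_pct
FROM pg_stat_io
WHERE object = 'relation'
ORDER BY hits DESC NULLS LAST;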
evictions: Cleaning shared buffers

When blocks are removed from the PostgreSQL cache to make space for new ones, they are counted as evicted. This value gives us an idea of how often cached data had to make way for new blocks.
reuses: Reusing I/O

This variable might be confusing for some people. The documentation says: “The number of times an existing buffer in a size-limited ring buffer outside of shared buffers was reused as part of an I/O operation in the bulkread, bulkwrite, or vacuum contexts.” Now, this explanation might be pretty obscure to most people out there. What does it actually mean? Not all operations use the PostgreSQL I/O machinery. Suppose we sequentially scan a 10 TB table. As stated before, this scan will bypass the shared buffers entirely. Instead, a little ring buffer outside shared buffers is used to keep some blocks around. If one of these buffers is reused, it will be counted in this variable.
fsyncs and fsync_time: Flushing to disk

When data is written to disk, we have to distinguish between two cases: either we want to rely on the fact that the data is on disk, or we don't care much. The question which naturally arises is: hey, this is a database system! Why would anybody not care whether data is written to disk or not? Well, consider sorting on disk: if the data we want to sort is lost, we have to restart the sort process after a reboot anyway. Or: if we die in the middle of a transaction, the data should be gone anyway. What this means is: if we want data to survive no matter what, we sync it to disk using a flush; otherwise, we do not. These columns count the number of flushes and the time we needed to flush.
stats_reset: Reset time

Finally, PostgreSQL tells us when the statistics were last reset, i.e. the point in time since which the data has been accumulated.
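If you want to start measuring from scratch, for example before a benchmark run, the I/O statistics can be reset explicitly:

blog=# SELECT pg_stat_reset_shared('io');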
pg_stat_io: Inspecting the data

After this rather theoretical introduction to the topic, we can take a look at some sample data. Here is some autovacuum data:
blog=# SELECT * FROM pg_stat_io WHERE backend_type = 'autovacuum worker';
-[ RECORD 1 ]--+------------------------------
backend_type   | autovacuum worker
object         | relation
context        | bulkread
reads          | 0
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        |
extend_time    |
op_bytes       | 8192
hits           | 0
evictions      | 0
reuses         | 0
fsyncs         |
fsync_time     |
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 2 ]--+------------------------------
backend_type   | autovacuum worker
object         | relation
context        | normal
reads          | 29
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 24
extend_time    | 0
op_bytes       | 8192
hits           | 56682
evictions      | 0
reuses         |
fsyncs         | 0
fsync_time     | 0
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 3 ]--+------------------------------
backend_type   | autovacuum worker
object         | relation
context        | vacuum
reads          | 70838
read_time      | 0
writes         | 42200
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 0
extend_time    | 0
op_bytes       | 8192
hits           | 3654
evictions      | 0
reuses         | 70799
fsyncs         |
fsync_time     |
stats_reset    | 2023-08-24 11:03:59.786026+02
For each context, you'll see one row. Most of the I/O happened in the “vacuum” context (which is no surprise if you happen to be an autovacuum worker process).
What happens within a normal database connection (= “client backend”)? Here is an example of such a pg_stat_io query:
blog=# SELECT * FROM pg_stat_io WHERE backend_type = 'client backend';
-[ RECORD 1 ]--+------------------------------
backend_type   | client backend
object         | relation
context        | bulkread
reads          | 894
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        |
extend_time    |
op_bytes       | 8192
hits           | 14
evictions      | 0
reuses         | 131
fsyncs         |
fsync_time     |
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 2 ]--+------------------------------
backend_type   | client backend
object         | relation
context        | bulkwrite
reads          | 0
read_time      | 0
writes         | 42200
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 44248
extend_time    | 0
op_bytes       | 8192
hits           | 0
evictions      | 0
reuses         | 42200
fsyncs         |
fsync_time     |
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 3 ]--+------------------------------
backend_type   | client backend
object         | relation
context        | normal
reads          | 180
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 0
extend_time    | 0
op_bytes       | 8192
hits           | 7291
evictions      | 0
reuses         |
fsyncs         | 0
fsync_time     | 0
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 4 ]--+------------------------------
backend_type   | client backend
object         | relation
context        | vacuum
reads          | 2
read_time      | 0
writes         | 0
write_time     | 0
writebacks     | 0
writeback_time | 0
extends        | 0
extend_time    | 0
op_bytes       | 8192
hits           | 179
evictions      | 0
reuses         | 0
fsyncs         |
fsync_time     |
stats_reset    | 2023-08-24 11:03:59.786026+02
-[ RECORD 5 ]--+------------------------------
backend_type   | client backend
object         | temp relation
context        | normal
reads          | 0
read_time      | 0
writes         | 0
write_time     | 0
writebacks     |
writeback_time |
extends        | 0
extend_time    | 0
op_bytes       | 8192
hits           | 0
evictions      | 0
reuses         |
fsyncs         |
fsync_time     |
stats_reset    | 2023-08-24 11:03:59.786026+02
In this case, the I/O happened during the creation of the table; other than that, the table wasn't used. How did we figure that out? Well, the entire I/O shows up in the bulk contexts rather than in the “normal” context, which tells us that the table hasn't been touched since its creation.
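As a starting point for your own debugging sessions, a query like the following (a sketch, not an official recipe) quickly surfaces the biggest I/O consumers:

-- the five biggest I/O consumers, measured in bytes
SELECT backend_type,
       object,
       context,
       pg_size_pretty((coalesce(reads, 0) + coalesce(writes, 0)) * op_bytes) AS total_io
FROM pg_stat_io
ORDER BY (coalesce(reads, 0) + coalesce(writes, 0)) * op_bytes DESC
LIMIT 5;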
Inspecting I/O is important; it gives you valuable information about what is really going on inside your database engine. Another data warehousing technique to improve performance is synchronized sequential scans; find out about them in my blog. Or, learn more basics about how to spot and manage performance problems in this blog about how to quickly detect slow queries.
We try to cover topics which support PostgreSQL users and help people run their systems. If you want to know more about PostgreSQL support and PostgreSQL consulting, simply reach out to us. In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.