© Laurenz Albe 2025
Table of Contents
Saving storage space should not be your first objective in a PostgreSQL database. Very often, the desire to conserve space aggressively comes at the expense of performance. But there is also no reason to needlessly waste space. Therefore, it is a good idea to be familiar with the concepts of data type alignment and padding bytes in PostgreSQL.
When the CPU reads or writes a value from memory, the performance is best if the address of the value is a multiple of the size of the value. For example, a 4-byte integer should start at an address that is a multiple of four. PostgreSQL tries to optimize for performance. Therefore, it makes sure that all values are correctly aligned in memory. Note that alignment is only relevant for data types with a fixed length: PostgreSQL stores variable length data types like text
, varchar
and numeric
without respecting the type alignment.
Whenever PostgreSQL persists data on disk, it organizes these data in pages (also known as buffers or, when on disk, as blocks) of 8kB. To keep things simple and efficient, the layout of a block on disk is exactly the same as the page in memory. As a consequence, PostgreSQL respects the type alignment on disk as well.
PostgreSQL data types can have an alignment of 1, 2, 4 or 8 bytes. You can see the alignment of a data type in the typalign
column of the system catalog pg_type
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
SELECT typalign, string_agg(typname, ', ' ORDER BY length(typname)) AS types FROM pg_type WHERE typtype = 'b' -- base type AND typelem = 0 -- no array AND typlen <> -1 -- fixed length AND typnamespace = 'pg_catalog'::regnamespace -- system type AND typname NOT LIKE 'reg%' -- no object identifier type GROUP BY typalign; typalign │ types ══════════╪════════════════════════════════════════════════════════════════════════════════════════════════════ c │ bool, char, uuid d │ xid8, time, int8, money, pg_lsn, float8, circle, timetz, aclitem, interval, timestamp, timestamptz i │ cid, xid, oid, int4, date, float4, macaddr, macaddr8 s │ tid, int2 (4 rows) |
Here, c
(like char
) stands for an alignment of one byte, s
(like short
) for two bytes, i
(like int
) for four bytes and d
(like double
) for eight bytes. I excluded the object identifier types for brevity, since you typically don't use them in table definitions.
PostgreSQL is a “row store” — it stores a table row as one chunk of data, one column after the other. The row data start at an address that is a multiple of eight. So if the first column is a smallint
(size and alignment 2) and the second column is a timestamp (size and alignment 8), PostgreSQL has to add six “padding bytes” between the first and the second column to properly align the timestamp. These six bytes are just wasted space! We can observe that with the “pageinspect” extension:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
CREATE EXTENSION pageinspect; CREATE TABLE tab ( col1 smallint, col2 timestamp, col3 integer, col4 double precision ); INSERT INTO tab VALUES (1, '2025-01-11 12:55:32.123456', 2, pi()); SELECT t_data FROM heap_page_items(get_raw_page('tab', 0)); t_data ════════════════════════════════════════════════════════════════════ \x0100000000000000401bc67e6cce02000200000000000000182d4454fb210940 (1 row) |
The first 0100
is the smallint
(I am using a little-endian architecture). The following 000000000000
are six padding bytes. 401bc67e6cce0200
is the timestamp
, directly followed by the integer
02000000
. After four more padding bytes, we find the double precision
value 182d4454fb210940
.
All in all, we have wasted ten bytes! If we consider the 24 bytes of the row header, these ten wasted bytes comprise almost 22% of the table row.
The column order in a table is fixed, but mostly irrelevant. If you define the columns of a table in the following order, you can avoid any padding:
uuid
: that data type has an alignment of one byte, but a fixed size of 16 bytes, so no padding bytes will be necessarybigint
, timestamp
, timestamp with time zone
, double precision
etc.)integer
, date
, real
etc.)smallint
)boolean
, text
, varchar
, character
, numeric
, other uuid
columns etc.)Note that character
alias char
, the fixed-length (blank-padded) string data type, is a data type with variable length: first, the actual length limit depends on the type modifier, and second, a character can have more than a single byte.
Following the above rules, the table from the previous example would look like this:
1 2 3 4 5 6 |
CREATE TABLE tab ( col2 timestamp, col4 double precision, col3 integer, col1 smallint ); |
Rearranging the column order as described above has almost no disadvantage. There is only one consideration: in order to access the twentieth column of a table row, PostgreSQL has to skip over the first nineteen columns. This operation, known as tuple deforming, is not for free. It is faster to extract the earlier columns of a table. Also, skipping over a fixed-length column is cheaper than skipping over a column of variable length: for the latter, PostgreSQL has to read the varlena header of the value. So there may be a slight performance hit if you define a frequently accessed integer
column after all the columns with an alignment of eight bytes.
On the other hand, PostgreSQL is likely to invoke the built-in JIT compiler if it has to deform a lot of tuples. Since the rules from the previous section arrange all fixed-length columns first, the offset of those columns is always the same. That allows the executable code generated by the JIT compiler to jump to the desired column in one step.
Also, the space savings themselves can benefit performance: sequential scans of smaller tables are faster. Also, you can cache more user data in shared buffers if you don't cache padding bytes.
All in all, I wouldn't worry about a potential performance hit too much.
By carefully designing the order of a table's columns, you can avoid wasting storage space by avoiding padding bytes. This kind of optimization is unlikely to hurt query performance and may even improve it.
Leave a Reply