Type alignment and padding bytes: how to not waste space in PostgreSQL tables

04.2025 | Tags: development, tuning

Saving storage space should not be your first objective in a PostgreSQL database. Very often, the desire to conserve space aggressively comes at the expense of performance. But there is also no reason to needlessly waste space. Therefore, it is a good idea to be familiar with the concepts of data type alignment and padding bytes in PostgreSQL.

What is data type alignment?

When the CPU reads or writes a value from memory, the performance is best if the address of the value is a multiple of the size of the value. For example, a 4-byte integer should start at an address that is a multiple of four. PostgreSQL tries to optimize for performance. Therefore, it makes sure that all values are correctly aligned in memory. Note that alignment is only relevant for data types with a fixed length: PostgreSQL stores variable length data types like text, varchar and numeric without respecting the type alignment.

Whenever PostgreSQL persists data on disk, it organizes these data in pages (also known as buffers or, when on disk, as blocks) of 8kB. To keep things simple and efficient, the layout of a block on disk is exactly the same as the page in memory. As a consequence, PostgreSQL respects the type alignment on disk as well.

PostgreSQL data types can have an alignment of 1, 2, 4 or 8 bytes. You can see the alignment of a data type in the typalign column of the system catalog pg_type:

SELECT typalign,
       string_agg(typname, ', ' ORDER BY length(typname)) AS types
FROM pg_type
WHERE typtype = 'b'            -- base type
  AND typelem = 0              -- no array
  AND typlen <> -1             -- fixed length
  AND typnamespace = 'pg_catalog'::regnamespace  -- system type
  AND typname NOT LIKE 'reg%'  -- no object identifier type
GROUP BY typalign;

 typalign │                                               types
══════════╪════════════════════════════════════════════════════════════════════════════════════════════════════
 c        │ bool, char, uuid
 d        │ xid8, time, int8, money, pg_lsn, float8, circle, timetz, aclitem, interval, timestamp, timestamptz
 i        │ cid, xid, oid, int4, date, float4, macaddr, macaddr8
 s        │ tid, int2
(4 rows)

Here, c (like char) stands for an alignment of one byte, s (like short) for two bytes, i (like int) for four bytes and d (like double) for eight bytes. I excluded the object identifier types for brevity, since you typically don't use them in table definitions.

What are padding bytes?

PostgreSQL is a “row store” — it stores a table row as one chunk of data, one column after the other. The row data start at an address that is a multiple of eight. So if the first column is a smallint (size and alignment 2) and the second column is a timestamp (size and alignment 8), PostgreSQL has to add six “padding bytes” between the first and the second column to properly align the timestamp. These six bytes are just wasted space! We can observe that with the “pageinspect” extension:

CREATE EXTENSION pageinspect;

CREATE TABLE tab (
   col1 smallint,
   col2 timestamp,
   col3 integer,
   col4 double precision
);

INSERT INTO tab VALUES (1, '2025-01-11 12:55:32.123456', 2, pi());

SELECT t_data FROM heap_page_items(get_raw_page('tab', 0));
                               t_data                               
════════════════════════════════════════════════════════════════════
 \x0100000000000000401bc67e6cce02000200000000000000182d4454fb210940
(1 row)

The first 0100 is the smallint (I am using a little-endian architecture). The following 000000000000 are six padding bytes. 401bc67e6cce0200 is the timestamp, directly followed by the integer 02000000. After four more padding bytes, we find the double precision value 182d4454fb210940.
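
This reading of the bytes can be double-checked outside the database. The following is only a sketch in Python, not part of the original experiment. It assumes the little-endian layout described above, and it assumes that a timestamp is stored as a count of microseconds since 2000-01-01, which is how PostgreSQL represents timestamps internally. The variable names are made up for illustration.

```python
import struct
from datetime import datetime, timedelta

# t_data from the heap_page_items() output above, without the leading \x
t_data = bytes.fromhex(
    "0100000000000000401bc67e6cce0200"
    "0200000000000000182d4454fb210940"
)

# "<" means little-endian without implicit alignment, so the padding
# bytes have to be spelled out explicitly:
# h = smallint, 6x = six padding bytes, q = 8-byte timestamp,
# i = integer, 4x = four padding bytes, d = double precision
col1, col2_raw, col3, col4 = struct.unpack("<h6xqi4xd", t_data)

# assumption: the timestamp is microseconds since 2000-01-01 00:00:00
col2 = datetime(2000, 1, 1) + timedelta(microseconds=col2_raw)

print(col1, col2, col3, col4)
```

This prints `1 2025-01-11 12:55:32.123456 2 3.141592653589793`, which are the values that were inserted. The format string only fits because the ten padding bytes are skipped at the right places.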

All in all, we have wasted ten bytes! Without padding, the row would occupy 46 bytes (22 bytes of data plus the 24 bytes of the row header), so these ten wasted bytes add almost 22% to the size of the table row.
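
The arithmetic behind these numbers can be reproduced with a few lines of Python. This is a simplified model rather than PostgreSQL's actual code: it handles only fixed-length columns, ignores NULL values, and takes the sizes, alignments and the 24-byte header from the text above. The function name `layout` is hypothetical.

```python
def layout(columns):
    """Return (offsets, data_size, padding) for fixed-length columns.

    columns: list of (name, size, alignment) tuples in table order.
    """
    offsets = {}
    offset = 0
    padding = 0
    for name, size, align in columns:
        # round the offset up to the next multiple of the alignment
        aligned = (offset + align - 1) // align * align
        padding += aligned - offset
        offsets[name] = aligned
        offset = aligned + size
    return offsets, offset, padding


tab = [
    ("col1", 2, 2),  # smallint
    ("col2", 8, 8),  # timestamp
    ("col3", 4, 4),  # integer
    ("col4", 8, 8),  # double precision
]

offsets, data_size, padding = layout(tab)
print(offsets)             # {'col1': 0, 'col2': 8, 'col3': 16, 'col4': 24}
print(data_size, padding)  # 32 10

HEADER = 24  # size of the row header
unpadded = HEADER + data_size - padding
print(round(100 * padding / unpadded, 2))              # 21.74
print(round(100 * padding / (HEADER + data_size), 2))  # 17.86
```

The two percentages differ only in what the ten bytes are compared against: the 46 bytes the row would need without padding, or the 56 bytes it actually occupies.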

Saving storage space by avoiding padding bytes

The column order in a table is fixed, but mostly irrelevant. If you define the columns of a table in the following order, you can avoid any padding:

first, define all frequently accessed columns of type uuid: that data type has an alignment of one byte, but a fixed size of 16 bytes, so no padding bytes will be necessary
next, define the columns with a data type with an alignment of eight bytes (bigint, timestamp, timestamp with time zone, double precision etc.)
then, define the columns with a data type with an alignment of four bytes (integer, date, real etc.)
then, define the columns with a data type with an alignment of two bytes (essentially, smallint)
finally, define the columns with a data type with an alignment of one byte or with variable length (boolean, text, varchar, character, numeric, other uuid columns etc.)

Note that character (alias char), the fixed-length (blank-padded) string data type, is a data type with variable length: first, the actual length limit depends on the type modifier, and second, a character can occupy more than a single byte.

Following the above rules, the table from the previous example would look like this:

CREATE TABLE tab (
   col2 timestamp,
   col4 double precision,
   col3 integer,
   col1 smallint
);
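
As a rough check of the ordering rule, the following Python sketch sorts the example columns by descending alignment and counts the padding bytes before and after. It is a simplified model under stated assumptions: it covers only fixed-length columns without NULL values and leaves out the special handling of uuid and variable-length columns. The helper name `padding_bytes` is hypothetical.

```python
def padding_bytes(columns):
    """Return (padding, data_size) for (name, size, alignment) columns."""
    offset = padding = 0
    for _name, size, align in columns:
        gap = -offset % align  # bytes needed to reach the next multiple
        padding += gap
        offset += gap + size
    return padding, offset


original = [("col1", 2, 2), ("col2", 8, 8), ("col3", 4, 4), ("col4", 8, 8)]

# rule of thumb: widest alignment first; sorted() is stable, so columns
# with the same alignment keep their relative order
reordered = sorted(original, key=lambda col: col[2], reverse=True)

print([name for name, _, _ in reordered])  # ['col2', 'col4', 'col3', 'col1']
print(padding_bytes(original))             # (10, 32)
print(padding_bytes(reordered))            # (0, 22)
```

In this model, the reordered table needs 22 bytes of row data instead of 32, and no padding bytes remain between the columns.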

The performance impact of saving space by avoiding padding bytes

Rearranging the column order as described above has almost no disadvantage. There is only one consideration: in order to access the twentieth column of a table row, PostgreSQL has to skip over the first nineteen columns. This operation, known as tuple deforming, is not free, so it is faster to extract the earlier columns of a row. Also, skipping over a fixed-length column is cheaper than skipping over a column of variable length: for the latter, PostgreSQL has to read the varlena header of the value. So there may be a slight performance hit if you define a frequently accessed integer column after all the columns with an alignment of eight bytes.

On the other hand, PostgreSQL is likely to invoke the built-in JIT compiler if it has to deform a lot of tuples. Since the rules from the previous section arrange all fixed-length columns first, the offset of those columns is always the same. That allows the executable code generated by the JIT compiler to jump to the desired column in one step.

The space savings themselves can also benefit performance: sequential scans of smaller tables are faster, and you can cache more user data in shared buffers if you don't cache padding bytes.

All in all, I wouldn't worry about a potential performance hit too much.

Conclusion

By carefully choosing the order of a table's columns, you can avoid padding bytes and the storage space they waste. This kind of optimization is unlikely to hurt query performance and may even improve it.

7 responses to “Type alignment and padding bytes: how to not waste space in PostgreSQL tables”

tony says:

April 25, 2025 at 8:41 am

Hi, Laurenz Albe

> All in all, we have wasted ten bytes! If we consider the 24 bytes of the row header, these ten wasted bytes comprise almost 22% of the table row.

Could you shed some light on how the 22% was determined?

- Laurenz Albe says:
  
  April 25, 2025 at 1:01 pm
  
  The raw data without padding bytes are 22 bytes. Together with the 24-byte header, that is 46 bytes. 10 bytes are 21.74% of that.
  If you calculate it from the other end, you get a different number:
  The row is 32 bytes including the padding bytes; together with the header, that is 56 bytes. Then 10 wasted bytes would be 17.86%.
  Percentages are tricky that way. The key message is that the waste can be substantial.
  
Turban says:

April 28, 2025 at 8:49 am

Usually, integer fields are more important than timestamp fields, especially if the integer field is the primary key. What about placing two integers at the beginning instead of one timestamp field?

"create table mytab1 (c1 int, c2 int, c3 timestamp)" -- 4 bytes, 4 bytes, 8 bytes

may be better than
"create table (c3 timestamp, c1 int, c2 int)" -- 8 bytes, 4 bytes, 4 bytes

especially if c1 is the primary key in both cases.
"alter table mytab1 add primary key (c1)"

As I understand it, the only important thing is that we fill 8 bytes with the same group of data types.

In the second case we have the best of both worlds: no wasted space, and the important attributes at the beginning of the row.

- Laurenz Albe says:
  
  April 29, 2025 at 6:11 am
  
  Sure, once you understand how type alignment works, you can be more creative than my simple rules, and the primary key can be a column that is accessed frequently.
  But I'd use bigint rather than integer for a primary key in most cases, and that has an alignment of 8.
  
tony says:

April 28, 2025 at 10:08 am

Hi, Laurenz Albe

Thank you for the explanation. Got it.

Felipe says:

May 23, 2025 at 2:01 pm

Hi Laurenz,

Experimenting a bit with the model you posted I did the following:

CREATE TABLE tab ( col1 smallint, col2 timestamp, col3 integer, col4 double precision );

CREATE TABLE tab2 ( col2 timestamp, col4 double precision, col3 integer, col1 smallint );

INSERT INTO tab (col1, col2, col3, col4) SELECT (random()*32767)::smallint, NOW() - (random()*365)::int * INTERVAL '1 day' + (random()*86400)::int * INTERVAL '1 second', (random()*1000)::int, random()*100.0 FROM generate_series(1,10000);

INSERT INTO tab2 (col2, col4, col3, col1) SELECT NOW() - (random()*365)::int * INTERVAL '1 day' + (random()*86400)::int * INTERVAL '1 second', random()*100.0, (random()*1000)::int, (random()*32767)::smallint FROM generate_series(1,10000);

SELECT pg_relation_size(quote_ident('public') || '.' || quote_ident('tab')) AS size_bytes; 606208 bytes for tab

SELECT pg_relation_size(quote_ident('public') || '.' || quote_ident('tab2')) AS size_bytes; 524288 bytes for tab2

The real difference between both is 606208 - 524288 = 81920 bytes, however, if we follow the theory, 10 padding bytes * 10000 rows = 100000 bytes, so the theory kind of overestimates the alignment effect, I am just wondering if I am missing out something else?

Thank you very much.

Regards, Felipe

- Laurenz Albe says:
  
  June 10, 2025 at 10:56 am
  
  I answered that question here.
  Sorry that it takes some time to moderate the comments to the blog - there is a lot of spam out there.
  
