Dealing with the PostgreSQL error "found xmin ... from before relfrozenxid ..."

02.2025 | Category: How To | Tags: administration, under the hood

02.2025

Category: How To

One of the worse examples of DBA humor: a woman tells a man with a vacuum cleaner, "Careful! If you vacuum the table, you might advance its relfrozenxid!"
© Laurenz Albe 2025

Table of Contents

I have seen the error from the title reported often enough. If you are not familiar with the internals of PostgreSQL, the message will confuse you: what are xmin and relfrozenxid? Why is it a problem if one is before the other? So I think that it is worth the effort to write an article about the problem and what the error means. I'll also show you how to cope with the situation and fix the error.

What are `xmin` and `relfrozenxid`?

`xmin` and `xmax`

Each PostgreSQL table entry (“tuple”) has the system columns xmin and xmax. They contain the transaction ID of the transactions that created and invalidated (updated or deleted) the version of the row that the tuple represents. Each SQL statement has a snapshot that determines which transaction IDs it can see. If a statement can see the tuple's xmin, and the xmax is invalid (has a value of 0), invisible or belongs to a transaction that has not committed, that version of the row is visible to the statement. (For additional details about xmax, see this article.)

Transaction ID wraparound and freezing

One of the problems with PostgreSQL's multi-versioning architecture is that transaction IDs are generated from a 4-byte unsigned integer counter. Once the counter reaches its maximum value, it will “wrap around” to 3 (the values from 0 to 2 have special meanings). As a consequence, transaction IDs change their meaning as time goes by: a transaction ID that belonged to a committed transaction may now belong to a future transaction or one that is rolled back. That would mean that visible rows could suddenly become invisible, thus causing data corruption. If you want to see that at play, you should read my article on transaction ID wraparound.

To prevent this kind of data corruption from happening, VACUUM (typically triggered by autovacuum) freezes old, visible table rows: it sets a flag on the row that indicates that the reader should ignore xmin and xmax. That flag marks the row as unconditionally visible. Once PostgreSQL has frozen all old, visible rows in a database, the transaction ID can safely wrap around.

The meaning of `relfrozenxid`

Freezing old rows is vitally important for the health of a PostgreSQL database. Consequently, the system keeps track of the progress of freezing. The system catalog pg_class has a column “relfrozenxid”. All table entries that have an xmin or xmax that is equal or older than relfrozenxid are guaranteed to be frozen. PostgreSQL uses that column to trigger “anti-wraparound” autovacuum runs: if the relfrozenxid of a table is more than autovacuum_freeze_max_age transactions in the past, the autovacuum launcher will start an anti-wraparound vacuum worker process. Such a worker insists on visiting all pages of the table that may contain entries that are not yet frozen. After the anti-wraparound autovacuum run is done, PostgreSQL can advance the table's relfrozenxid.

What is the problem with an `xmin` that is older than `relfrozenxid`?

From the above it should be clear that no unfrozen tuple should ever contain an xmin that is older than the table's relfrozenxid. If we find such an unfrozen tuple, it is a case of data corruption. Indeed, the SQLSTATE of the error message we are researching is XX001. All error messages that start with XX indicate data corruption.

Note that the error happens during the execution of VACUUM. The error terminates the operation, so that VACUUM cannot finish processing the table. In particular, it won't advance relfrozenxid. If nobody detects and fixes the problem, the system will eventually come close to data loss. In that event, PostgreSQL stops processing any new transactions. If that happens, your system will face a down time until somebody can fix the problem using single-user mode. You don't want that to happen, so you should monitor the PostgreSQL log file for data corruption errors!

What can cause tuples with an `xmin` older than `relfrozenxid`?

The big question is why people keep getting this error message. Of course it is possible that all these people have hardware problems, which is the most frequent cause of data corruption. But I think that it is more likely that there is a yet undiscovered bug somewhere in PostgreSQL. It seems that if VACUUM is running concurrently with something else, the result can be an unfrozen tuple with an xmin older than relfrozenxid.

At this point, I'd like to ask you for help. If you can find a way to reproduce the error, please report your findings. Also, if you can imagine a situation in which concurrent operations can result in this kind of data corruption, that would be good to know. It will help to improve PostgreSQL. A good bug report is a valuable contribution to the project!

How to trigger the error by modifying `relfrozenxid`

In order to figure out how to deal with the error if it happens to us, we'd like to artificially cause the error to happen. As I wrote above, I cannot think of a way to trigger the problem with normal data modifications. But it is fairly easy to cause the problem to happen if we are willing to manually modify a catalog table. That is an unsupported operation that may break your system, so I'll create a new database that I can drop to get rid of the data corruption:

CREATE DATABASE scratch;
\connect scratch
You are now connected to database "scratch" as user "postgres".
CREATE TABLE boom (id integer);
INSERT INTO boom VALUES (1);
UPDATE pg_class
   SET relfrozenxid = pg_current_xact_id()::xid
   WHERE relname = 'boom';

SELECT * FROM boom;

 id 
════
  1
(1 row)

VACUUM boom;
ERROR:  found xmin 31676653 from before relfrozenxid 31676654
CONTEXT:  while scanning block 0 offset 1 of relation "public.boom"

CREATE DATABASE scratch;

\connect scratch

You are now connected to database "scratch" as user "postgres".

CREATE TABLE boom (id integer);

INSERT INTO boom VALUES (1);

UPDATE pg_class

SET relfrozenxid = pg_current_xact_id()::xid

WHERE relname = 'boom';

SELECT * FROM boom;

════

(1 row)

VACUUM boom;

ERROR: found xmin 31676653 from before relfrozenxid 31676654

CONTEXT: while scanning block 0 offset 1 of relation "public.boom"

Note that a SELECT won't trigger the error message. Only VACUUM checks the data thoroughly enough to report the condition as an error.

How can I fix the error?

There are several ways to address the problem:

Dump and restore the table

Perhaps the simplest and least dangerous method to get rid of the problem is to export the table using pg_dump. Remember — querying the table won't trigger the error. Then you can drop the table and restore the dump:

pg_dump -U postgres -F c -t boom -f dumpfile scratch
psql -U postgres -d scratch -c 'DROP TABLE boom'
pg_restore -U postgres -d scratch dumpfile

pg_dump -U postgres -F c -t boom -f dumpfile scratch

psql -U postgres -d scratch -c 'DROP TABLE boom'

pg_restore -U postgres -d scratch dumpfile

While this method is simple, it has disadvantages:

if the table is large, exporting and importing it can take a long time
if there are foreign keys referening the table, you have to drop and re-create those foreign keys as well

The big advantage of this method, apart from its simplicity, is that export and import is the only way to be certain that you have got rid of all data corruption. So you should use this method whenever possible.

Update `relfrozenxid` in the `pg_class` system catalog

Another option is to manually update the relfrozenxid for the table's pg_class entry.

UPDATE pg_class
   SET relfrozenxid = '31676653'
   WHERE relname = 'boom';

UPDATE pg_class

SET relfrozenxid = '31676653'

WHERE relname = 'boom';

This technique is fast, but it has a disadvantage too: modifying a system catalog is unsupported and dangerous. If you pick a bad value for relfrozenxid, you could end up with worse problems like

ERROR:  could not access status of transaction 43350785
DETAIL:  Could not open file "pg_xact/0029": No such file or directory.

1 2	ERROR: could not access status of transaction 43350785 DETAIL: Could not open file "pg_xact/0029": No such file or directory.

Using pg_surgery on the broken table entry

Perhaps the most elegant way to deal with the error is to use the pg_surgery extension. With that extension, we can explicitly freeze the tuple once we know its physical address (ctid):

CREATE EXTENSION pg_surgery;

-- search only in block 0
SELECT ctid FROM boom
WHERE ctid > '(0,0)'
  AND ctid < '(0,32000)'
  AND xmin = '31676653';

 ctid  
═══════
 (0,1)
(1 row)

SELECT heap_force_freeze('boom', '{(0\,1)}');

 heap_force_freeze 
═══════════════════
 
(1 row)

CREATE EXTENSION pg_surgery;

-- search only in block 0

SELECT ctid FROM boom

WHERE ctid > '(0,0)'

AND ctid < '(0,32000)'

AND xmin = '31676653';

ctid

═══════

(0,1)

(1 row)

SELECT heap_force_freeze('boom', '{(0\,1)}');

heap_force_freeze

═══════════════════

(1 row)

Using pg_surgery is not without risks either: it allows you to freeze or kill arbitrary table entries, which can cause data inconsistencies. The name of the extension should give you a clue: don't use a scalpel unless you know what you are doing!

Update the corrupted row

If you perform an UPDATE on the broken row, PostgreSQL will create a new, correct row version.

UPDATE boom SET id = id
WHERE id = 1;

1 2	UPDATE boom SET id = id WHERE id = 1;

After that, the table can be vacuumed without an error, which will remove the broken data. Be careful: bulk updates that modify a big part of the table will lead to a lot of bloat. So treat only the broken rows.

Conclusion

People report the error “found xmin ... from before relfrozenxid ...” often enough to make me think that there may be a live data corruption bug in PostgreSQL. We have seen what the error means, and I have shown you three ways to deal with the problem. None of these ways is without disadvantages, so choose your approach carefully.

2 responses to “Dealing with the PostgreSQL error "found xmin ... from before relfrozenxid ..."”

Sadeq Dousti says:

February 11, 2025 at 10:01 pm

Thanks for the article!

We had this issue and solved it using the following pseudocode (here, col is any non-indexed column):

update tbl set col = col
where xmin < relfrozenxid;

Of course, we didn't do it in one go, and updated the table in batches.

It was much faster than dump and restore, since only a small fraction of the table was affected.

We also could not change pg_class or install pg_surgery due to permission restrictions.

Reply
- Laurenz Albe says:
  
  February 11, 2025 at 10:40 pm
  
  Thanks for the idea! I took the liberty to add it to the article.
  
  Reply

Dealing with the PostgreSQL error "found xmin ... from before relfrozenxid ..."

What are `xmin` and `relfrozenxid`?

`xmin` and `xmax`

Transaction ID wraparound and freezing

The meaning of `relfrozenxid`

What is the problem with an `xmin` that is older than `relfrozenxid`?

What can cause tuples with an `xmin` older than `relfrozenxid`?

How to trigger the error by modifying `relfrozenxid`

How can I fix the error?

Dump and restore the table

Update `relfrozenxid` in the `pg_class` system catalog

Using pg_surgery on the broken table entry

Update the corrupted row

Conclusion

2 responses to “Dealing with the PostgreSQL error "found xmin ... from before relfrozenxid ..."”

Leave a Reply Cancel reply

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Dealing with the PostgreSQL error "found xmin ... from before relfrozenxid ..."

What are xmin and relfrozenxid?

xmin and xmax

Transaction ID wraparound and freezing

The meaning of relfrozenxid

What is the problem with an xmin that is older than relfrozenxid?

What can cause tuples with an xmin older than relfrozenxid?

How to trigger the error by modifying relfrozenxid

How can I fix the error?

Dump and restore the table

Update relfrozenxid in the pg_class system catalog

Using pg_surgery on the broken table entry

Update the corrupted row

Conclusion

2 responses to “Dealing with the PostgreSQL error "found xmin ... from before relfrozenxid ..."”

Leave a Reply Cancel reply

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

What are `xmin` and `relfrozenxid`?

`xmin` and `xmax`

The meaning of `relfrozenxid`

What is the problem with an `xmin` that is older than `relfrozenxid`?

What can cause tuples with an `xmin` older than `relfrozenxid`?

How to trigger the error by modifying `relfrozenxid`

Update `relfrozenxid` in the `pg_class` system catalog