© Laurenz Albe 2025
Table of Contents
I have seen the error from the title reported often enough. If you are not familiar with the internals of PostgreSQL, the message will confuse you: what are xmin
and relfrozenxid
? Why is it a problem if one is before the other? So I think that it is worth the effort to write an article about the problem and what the error means. I'll also show you how to cope with the situation and fix the error.
xmin
and relfrozenxid
?xmin
and xmax
Each PostgreSQL table entry (“tuple”) has the system columns xmin
and xmax
. They contain the transaction ID of the transactions that created and invalidated (updated or deleted) the version of the row that the tuple represents. Each SQL statement has a snapshot that determines which transaction IDs it can see. If a statement can see the tuple's xmin
, and the xmax
is invalid (has a value of 0), invisible or belongs to a transaction that has not committed, that version of the row is visible to the statement. (For additional details about xmax
, see this article.)
One of the problems with PostgreSQL's multi-versioning architecture is that transaction IDs are generated from a 4-byte unsigned integer counter. Once the counter reaches its maximum value, it will “wrap around” to 3 (the values from 0 to 2 have special meanings). As a consequence, transaction IDs change their meaning as time goes by: a transaction ID that belonged to a committed transaction may now belong to a future transaction or one that is rolled back. That would mean that visible rows could suddenly become invisible, thus causing data corruption. If you want to see that at play, you should read my article on transaction ID wraparound.
To prevent this kind of data corruption from happening, VACUUM
(typically triggered by autovacuum) freezes old, visible table rows: it sets a flag on the row that indicates that the reader should ignore xmin
and xmax
. That flag marks the row as unconditionally visible. Once PostgreSQL has frozen all old, visible rows in a database, the transaction ID can safely wrap around.
relfrozenxid
Freezing old rows is vitally important for the health of a PostgreSQL database. Consequently, the system keeps track of the progress of freezing. The system catalog pg_class
has a column “relfrozenxid
”. All table entries that have an xmin
or xmax
that is equal or older than relfrozenxid
are guaranteed to be frozen. PostgreSQL uses that column to trigger “anti-wraparound” autovacuum runs: if the relfrozenxid
of a table is more than autovacuum_freeze_max_age
transactions in the past, the autovacuum launcher will start an anti-wraparound vacuum worker process. Such a worker insists on visiting all pages of the table that may contain entries that are not yet frozen. After the anti-wraparound autovacuum run is done, PostgreSQL can advance the table's relfrozenxid
.
xmin
that is older than relfrozenxid
?From the above it should be clear that no unfrozen tuple should ever contain an xmin
that is older than the table's relfrozenxid
. If we find such an unfrozen tuple, it is a case of data corruption. Indeed, the SQLSTATE of the error message we are researching is XX001
. All error messages that start with XX
indicate data corruption.
Note that the error happens during the execution of VACUUM
. The error terminates the operation, so that VACUUM
cannot finish processing the table. In particular, it won't advance relfrozenxid
. If nobody detects and fixes the problem, the system will eventually come close to data loss. In that event, PostgreSQL stops processing any new transactions. If that happens, your system will face a down time until somebody can fix the problem using single-user mode. You don't want that to happen, so you should monitor the PostgreSQL log file for data corruption errors!
xmin
older than relfrozenxid
?The big question is why people keep getting this error message. Of course it is possible that all these people have hardware problems, which is the most frequent cause of data corruption. But I think that it is more likely that there is a yet undiscovered bug somewhere in PostgreSQL. It seems that if VACUUM
is running concurrently with something else, the result can be an unfrozen tuple with an xmin
older than relfrozenxid
.
At this point, I'd like to ask you for help. If you can find a way to reproduce the error, please report your findings. Also, if you can imagine a situation in which concurrent operations can result in this kind of data corruption, that would be good to know. It will help to improve PostgreSQL. A good bug report is a valuable contribution to the project!
relfrozenxid
In order to figure out how to deal with the error if it happens to us, we'd like to artificially cause the error to happen. As I wrote above, I cannot think of a way to trigger the problem with normal data modifications. But it is fairly easy to cause the problem to happen if we are willing to manually modify a catalog table. That is an unsupported operation that may break your system, so I'll create a new database that I can drop to get rid of the data corruption:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
CREATE DATABASE scratch; \connect scratch You are now connected to database "scratch" as user "postgres". CREATE TABLE boom (id integer); INSERT INTO boom VALUES (1); UPDATE pg_class SET relfrozenxid = pg_current_xact_id()::xid WHERE relname = 'boom'; SELECT * FROM boom; id ════ 1 (1 row) VACUUM boom; ERROR: found xmin 31676653 from before relfrozenxid 31676654 CONTEXT: while scanning block 0 offset 1 of relation "public.boom" |
Note that a SELECT
won't trigger the error message. Only VACUUM
checks the data thoroughly enough to report the condition as an error.
There are several ways to address the problem:
Perhaps the simplest and least dangerous method to get rid of the problem is to export the table using pg_dump
. Remember — querying the table won't trigger the error. Then you can drop the table and restore the dump:
1 2 3 |
pg_dump -U postgres -F c -t boom -f dumpfile scratch psql -U postgres -d scratch -c 'DROP TABLE boom' pg_restore -U postgres -d scratch dumpfile |
While this method is simple, it has disadvantages:
The big advantage of this method, apart from its simplicity, is that export and import is the only way to be certain that you have got rid of all data corruption. So you should use this method whenever possible.
relfrozenxid
in the pg_class
system catalogAnother option is to manually update the relfrozenxid
for the table's pg_class
entry.
1 2 3 |
UPDATE pg_class SET relfrozenxid = '31676653' WHERE relname = 'boom'; |
This technique is fast, but it has a disadvantage too: modifying a system catalog is unsupported and dangerous. If you pick a bad value for relfrozenxid
, you could end up with worse problems like
1 2 |
ERROR: could not access status of transaction 43350785 DETAIL: Could not open file "pg_xact/0029": No such file or directory. |
Perhaps the most elegant way to deal with the error is to use the pg_surgery extension. With that extension, we can explicitly freeze the tuple once we know its physical address (ctid
):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
CREATE EXTENSION pg_surgery; -- search only in block 0 SELECT ctid FROM boom WHERE ctid > '(0,0)' AND ctid < '(0,32000)' AND xmin = '31676653'; ctid ═══════ (0,1) (1 row) SELECT heap_force_freeze('boom', '{(0\,1)}'); heap_force_freeze ═══════════════════ (1 row) |
Using pg_surgery is not without risks either: it allows you to freeze or kill arbitrary table entries, which can cause data inconsistencies. The name of the extension should give you a clue: don't use a scalpel unless you know what you are doing!
If you perform an UPDATE
on the broken row, PostgreSQL will create a new, correct row version.
1 2 |
UPDATE boom SET id = id WHERE id = 1; |
After that, the table can be vacuumed without an error, which will remove the broken data. Be careful: bulk updates that modify a big part of the table will lead to a lot of bloat. So treat only the broken rows.
People report the error “found xmin ... from before relfrozenxid ...” often enough to make me think that there may be a live data corruption bug in PostgreSQL. We have seen what the error means, and I have shown you three ways to deal with the problem. None of these ways is without disadvantages, so choose your approach carefully.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Thanks for the article!
We had this issue and solved it using the following pseudocode (here, col is any non-indexed column):
update tbl set col = col
where xmin < relfrozenxid;
Of course, we didn't do it in one go, and updated the table in batches.
It was much faster than dump and restore, since only a small fraction of the table was affected.
We also could not change pg_class or install pg_surgery due to permission restrictions.
Thanks for the idea! I took the liberty to add it to the article.