CYBERTEC PostgreSQL Logo

Dealing with the PostgreSQL error "found xmin ... from before relfrozenxid ..."

02.2025 / Category: / Tags: |

One of the worse examples of DBA humor: a woman tells a man with a vacuum cleaner, "Careful! If you vacuum the table, you might advance its relfrozenxid!"
© Laurenz Albe 2025

I have seen the error from the title reported often enough. If you are not familiar with the internals of PostgreSQL, the message will confuse you: what are xmin and relfrozenxid? Why is it a problem if one is before the other? So I think that it is worth the effort to write an article about the problem and what the error means. I'll also show you how to cope with the situation and fix the error.

What are xmin and relfrozenxid?

xmin and xmax

Each PostgreSQL table entry (“tuple”) has the system columns xmin and xmax. They contain the transaction ID of the transactions that created and invalidated (updated or deleted) the version of the row that the tuple represents. Each SQL statement has a snapshot that determines which transaction IDs it can see. If a statement can see the tuple's xmin, and the xmax is invalid (has a value of 0), invisible or belongs to a transaction that has not committed, that version of the row is visible to the statement. (For additional details about xmax, see this article.)

Transaction ID wraparound and freezing

One of the problems with PostgreSQL's multi-versioning architecture is that transaction IDs are generated from a 4-byte unsigned integer counter. Once the counter reaches its maximum value, it will “wrap around” to 3 (the values from 0 to 2 have special meanings). As a consequence, transaction IDs change their meaning as time goes by: a transaction ID that belonged to a committed transaction may now belong to a future transaction or one that is rolled back. That would mean that visible rows could suddenly become invisible, thus causing data corruption. If you want to see that at play, you should read my article on transaction ID wraparound.

To prevent this kind of data corruption from happening, VACUUM (typically triggered by autovacuum) freezes old, visible table rows: it sets a flag on the row that indicates that the reader should ignore xmin and xmax. That flag marks the row as unconditionally visible. Once PostgreSQL has frozen all old, visible rows in a database, the transaction ID can safely wrap around.

The meaning of relfrozenxid

Freezing old rows is vitally important for the health of a PostgreSQL database. Consequently, the system keeps track of the progress of freezing. The system catalog pg_class has a column “relfrozenxid”. All table entries that have an xmin or xmax that is equal or older than relfrozenxid are guaranteed to be frozen. PostgreSQL uses that column to trigger “anti-wraparound” autovacuum runs: if the relfrozenxid of a table is more than autovacuum_freeze_max_age transactions in the past, the autovacuum launcher will start an anti-wraparound vacuum worker process. Such a worker insists on visiting all pages of the table that may contain entries that are not yet frozen. After the anti-wraparound autovacuum run is done, PostgreSQL can advance the table's relfrozenxid.

What is the problem with an xmin that is older than relfrozenxid?

From the above it should be clear that no unfrozen tuple should ever contain an xmin that is older than the table's relfrozenxid. If we find such an unfrozen tuple, it is a case of data corruption. Indeed, the SQLSTATE of the error message we are researching is XX001. All error messages that start with XX indicate data corruption.

Note that the error happens during the execution of VACUUM. The error terminates the operation, so that VACUUM cannot finish processing the table. In particular, it won't advance relfrozenxid. If nobody detects and fixes the problem, the system will eventually come close to data loss. In that event, PostgreSQL stops processing any new transactions. If that happens, your system will face a down time until somebody can fix the problem using single-user mode. You don't want that to happen, so you should monitor the PostgreSQL log file for data corruption errors!

What can cause tuples with an xmin older than relfrozenxid?

The big question is why people keep getting this error message. Of course it is possible that all these people have hardware problems, which is the most frequent cause of data corruption. But I think that it is more likely that there is a yet undiscovered bug somewhere in PostgreSQL. It seems that if VACUUM is running concurrently with something else, the result can be an unfrozen tuple with an xmin older than relfrozenxid.

At this point, I'd like to ask you for help. If you can find a way to reproduce the error, please report your findings. Also, if you can imagine a situation in which concurrent operations can result in this kind of data corruption, that would be good to know. It will help to improve PostgreSQL. A good bug report is a valuable contribution to the project!

How to trigger the error by modifying relfrozenxid

In order to figure out how to deal with the error if it happens to us, we'd like to artificially cause the error to happen. As I wrote above, I cannot think of a way to trigger the problem with normal data modifications. But it is fairly easy to cause the problem to happen if we are willing to manually modify a catalog table. That is an unsupported operation that may break your system, so I'll create a new database that I can drop to get rid of the data corruption:

Note that a SELECT won't trigger the error message. Only VACUUM checks the data thoroughly enough to report the condition as an error.

How can I fix the error?

There are several ways to address the problem:

Dump and restore the table

Perhaps the simplest and least dangerous method to get rid of the problem is to export the table using pg_dump. Remember — querying the table won't trigger the error. Then you can drop the table and restore the dump:

While this method is simple, it has disadvantages:

  • if the table is large, exporting and importing it can take a long time
  • if there are foreign keys referening the table, you have to drop and re-create those foreign keys as well

The big advantage of this method, apart from its simplicity, is that export and import is the only way to be certain that you have got rid of all data corruption. So you should use this method whenever possible.

Update relfrozenxid in the pg_class system catalog

Another option is to manually update the relfrozenxid for the table's pg_class entry.

This technique is fast, but it has a disadvantage too: modifying a system catalog is unsupported and dangerous. If you pick a bad value for relfrozenxid, you could end up with worse problems like

Using pg_surgery on the broken table entry

Perhaps the most elegant way to deal with the error is to use the pg_surgery extension. With that extension, we can explicitly freeze the tuple once we know its physical address (ctid):

Using pg_surgery is not without risks either: it allows you to freeze or kill arbitrary table entries, which can cause data inconsistencies. The name of the extension should give you a clue: don't use a scalpel unless you know what you are doing!

Update the corrupted row

If you perform an UPDATE on the broken row, PostgreSQL will create a new, correct row version.

After that, the table can be vacuumed without an error, which will remove the broken data. Be careful: bulk updates that modify a big part of the table will lead to a lot of bloat. So treat only the broken rows.

Conclusion

People report the error “found xmin ... from before relfrozenxid ...” often enough to make me think that there may be a live data corruption bug in PostgreSQL. We have seen what the error means, and I have shown you three ways to deal with the problem. None of these ways is without disadvantages, so choose your approach carefully.

2 responses to “Dealing with the PostgreSQL error "found xmin ... from before relfrozenxid ..."”

  1. Thanks for the article!

    We had this issue and solved it using the following pseudocode (here, col is any non-indexed column):

    update tbl set col = col
    where xmin < relfrozenxid;

    Of course, we didn't do it in one go, and updated the table in batches.

    It was much faster than dump and restore, since only a small fraction of the table was affected.

    We also could not change pg_class or install pg_surgery due to permission restrictions.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2025
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram