CYBERTEC PostgreSQL Logo

Recovering Deleted Data From PostgreSQL Tables

03.2024 / Category: / Tags: |

PostgreSQL is very good at keeping your data safe so it doesn't disappear by itself. Unfortunately, the same holds in reverse--if data has been deleted, it stays unrecoverable.
In this article, we'll explore options for recovering deleted data from PostgreSQL tables.

Option 1: cautious people use transactions

If you never make any mistakes when working with data, there won't be any need for emergency procedures. But since we are all human, things like this happen:

A good habit to acquire is to always use transactions, and only COMMIT them after checking everything is okay.

With a transaction, the unexpected row count was much less stressful.

Option 2: careful people have backups

If you have working backups, you can recover from any data handling mistakes by fetching everything from backup again.

You could now perform a Point In Time Recovery (PITR) to a timestamp just before the undesirable DELETE command.

Option 3: quick people use pg_dirtyread

Let's say options 1 and 2 failed, and we really have to recover the data from the running PostgreSQL instance. The good news is that DELETE does not actually delete data, it just marks it as invisible for subsequent transactions. This is done to allow concurrent transactions to still read the data. The actual removal of rows occurs only when VACUUM (or autovacuum) cleans up the table. (For those interested in more details about that mechanism, see the MVCC chapter in the PostgreSQL documentation.)

There is no built-in way in PostgreSQL to get at the deleted-but-still-present rows, but there is a PostgreSQL extension that I am maintaining that allows this: pg_dirtyread.

We'll have to install pg_dirtyread from a package (or compile it from source) and create the extension in the database where the accident happened:

The extension provides a function pg_dirtyread('tablename') that reads a table like PostgreSQL itself does, but ignoring any markers for row deletion. SQL insists that we supply a column list with datatype annotations when calling it, so we first check the table definition:

From this we can compile the pg_dirtyread invocation:

Our data is there! We can copy the missing rows to a new table, and inject that into the original table:

Using pg_dirtyread works as long as VACUUM has not done its garbage-collection duties. VACUUM is triggered by the autovacuum launcher every minute on tables that have changed by at least 20%. If your faulty DELETE hit more than that (or it raised the accumulated bloat over that threshold), you have less than 60 seconds to shut down the database and turn off autovacuum before that happens. (Please do leave autovacuum=on by default in your database. While turning it off makes undeleting rows easier, bad things will happen if tables are not periodically cleaned from bloat.)

Option 4: desperate people use full page writes

If pg_dirtyread arrives too late because rows have already been garbage-collected, there is still hope. PostgreSQL keeps track of all changes in the write-ahead log (WAL). While these change records only contain the data from after the changes, the first time each page (the 8kB unit in which PostgreSQL processes data on disk) is touched, an image of the whole page is written to the WAL. These full page writes (FPW) can be collected to extract the deleted rows.

First, we need some low-level info about where to look:

We can the use pg_waldump to decode the WAL:

We can see our 3 DELETEd rows, and the first WAL record is marked as containing a FPW.

With PG16's pg_waldump, we can extract the FPW to a file:

It actually extracted two FPWs, but looking at the LSN, we are only interested in the first one, so I deleted the second one.

Let's feed that back to PostgreSQL by creating a new table, and concatenating the FPW files to form the table contents:

Since some of the rows in the FPWs are already marked as deleted, we still have to use pg_dirtyread:

This method is fragile, it works best when no commands other than the problematic DELETE have touched the table since the last CHECKPOINT. If other rows had been deleted after the last VACUUM, they might reappear as well. If your PostgreSQL version is older than 16, the --save-fullpage switch functionality needs to be backported.

Option 5: smart people have support contracts

Most of the options listed here require a fair amount of knowledge about the PostgreSQL internals. If you want to be on the safe side, consider getting a support contract from CYBERTEC where PostgreSQL experts help you with running PostgreSQL in your company or organization. Notably, the last customer I helped with data recovery was excited:

Thank you to all of you for your fast reaction.

Thank you so much! You are amazing.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram