Table of Contents
We know that PostgreSQL does not update a table row in place. Rather, it writes a new version of the row (the PostgreSQL term for a row version is “tuple”) and leaves the old row version in place to serve concurrent read requests. VACUUM
later removes these “dead tuples”.
If you delete a row and insert a new one, the effect is similar: we have one dead tuple and one new live tuple. This is why many people (me, among others) explain to beginners that “an UPDATE
in PostgreSQL is almost the same as a DELETE
, followed by an INSERT
”.
This article is about that “almost”.
UPDATE
and DELETE
+ INSERT
Let's take this simple test table:
1 2 3 4 5 6 |
CREATE TABLE uptest ( id smallint PRIMARY KEY, val smallint NOT NULL ); INSERT INTO uptest VALUES (1, 42); |
In the two following tests, we will issue statements from two concurrent sessions.
First, the UPDATE
:
1 2 3 4 5 6 7 8 9 10 11 |
Session 1 Session 2 BEGIN; UPDATE uptest SET id = 2 WHERE val = 42; SELECT id FROM uptest WHERE val = 42 FOR UPDATE; -- hangs COMMIT; -- one row is returned |
Let's reset the table before the second test;
1 2 |
TRUNCATE uptest; INSERT INTO uptest VALUES (1, 42); |
Now let's repeat the experiment with DELETE
and INSERT
:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
Session 1 Session 2 BEGIN; DELETE FROM uptest WHERE id = 1; INSERT INTO uptest VALUES (2, 42); SELECT id FROM uptest WHERE val = 42 FOR UPDATE; -- hangs COMMIT; -- no row is returned |
The documentation describes what happens when an SQL statement runs into a lock in a transaction with the default READ COMMITTED
isolation level:
UPDATE
,DELETE
,SELECT FOR UPDATE
, andSELECT FOR SHARE
commands behave the same asSELECT
in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (theWHERE
clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case ofSELECT FOR UPDATE
andSELECT FOR SHARE
, this means it is the updated version of the row that is locked and returned to the client.
The above shows that there is some way for PostgreSQL to find the new version of an updated row. That is why the first experiment returned a result row. In the second experiment, there was no connection between the old, deleted row and the newly inserted one, that's why we get no result in that case.
To figure out how the old and the new version are connected, we have to look deeper.
UPDATE
with “pageinspect”The extension pageinspect
allows us to see all data in a PostgreSQL data page. It requires superuser permissions.
Let's use it to see what's on disk after the first experiment:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
TRUNCATE uptest; INSERT INTO uptest VALUES (1, 42); UPDATE uptest SET id = 2 WHERE val = 42; SELECT lp, t_xmin AS xmin, t_xmax AS xmax, t_ctid, to_hex(t_infomask2) AS infomask2, to_hex(t_infomask) AS infomask, t_attrs FROM heap_page_item_attrs(get_raw_page('uptest', 0), 'uptest'); lp | xmin | xmax | t_ctid | infomask2 | infomask | t_attrs ----+--------+--------+--------+-----------+----------+----------------------- 1 | 385688 | 385689 | (0,2) | 2002 | 100 | {'\x0100','\x2a00'} 2 | 385689 | 0 | (0,2) | 2 | 2800 | {'\x0200','\x2a00'} (2 rows) |
The first entry is the old version of the row, the second the new version.
The lp
is the line pointer number, which stands for the number of the tuple within the data page. Together with the page number, this constitutes the physical address (tuple ID or tid
) of a tuple.
It follows that the t_ctid
stored in the tuple header is usually redundant, since it is implicit in the line pointer. However, it becomes relevant after an UPDATE
: then t_ctid
contains the tuple identifier of the updated version of the row.
This is the “missing link” between the old row version and the updated one!
DELETE
+ DELETE
with “pageinspect”Let's compare this to DELETE
+ INSERT
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
TRUNCATE uptest; INSERT INTO uptest VALUES (1, 42); BEGIN; DELETE FROM uptest WHERE id = 1; INSERT INTO uptest VALUES (2, 42); COMMIT; SELECT lp, t_xmin AS xmin, t_xmax AS xmax, t_ctid, to_hex(t_infomask2) AS infomask2, to_hex(t_infomask) AS infomask, t_attrs FROM heap_page_item_attrs(get_raw_page('uptest', 0), 'uptest'); lp | xmin | xmax | t_ctid | infomask2 | infomask | t_attrs ----+--------+--------+--------+-----------+----------+----------------------- 1 | 385691 | 385692 | (0,1) | 2002 | 100 | {'\x0100','\x2a00'} 2 | 385692 | 0 | (0,2) | 2 | 800 | {'\x0200','\x2a00'} (2 rows) |
Here the t_ctid
column from the old, deleted tuple is unchanged and there is no link to the new tuple. The second tuple is not found by the SELECT ... FOR UPDATE
, since it is “invisible” to the “snapshot” used for scanning the table.
infomask
and infomask2
There are also some relevant differences in the attributes infomask
and infomask2
. You can see the meaning of these flags in the PostgreSQL source file src/include/access/htup_details.h
.
infomask2
: 2 is the number of columns, and HEAP_KEYS_UPDATED
(0x2000) means that the tuple is deleted or updatedinfomask
: HEAP_XMIN_COMMITTED
(0x0100) means that the tuple was valid before it was removed (a hint bit)infomask
: both cases have HEAP_XMAX_INVALID
(0x0800) set (they have not been deleted), but the UPDATE
case also has HEAP_UPDATED
(0x2000), which shows that this is the result of an UPDATE
To understand the difference between UPDATE
and DELETE
+INSERT
, we had a closer look at the tuple headers. We saw infomask
, infomask2
and t_ctid
, where the latter provides the link between the old and the new version of a row.
PostgreSQL's row header occupies 23 bytes, which is more storage overhead than in other databases, but is required for PostgreSQL's special multiversioning and tuple visibility implementation.
UPDATE
can be challenging in PostgreSQL: if you want to read more about its problems and how to deal with them, read my article on HOT update.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply