Today somebody asked me how to remove duplicates which accidentally made it into a table. The problem is: A normal DELETE won't do, because you would delete both values - not just the one which is in there twice.
To solve the problem, you have to use a "secret" column called "ctid". The "ctid" identifies a row inside a table. Here is an example:
1 2 3 4 5 |
test=# CREATE TABLE t_test (idint4); CREATE TABLE test=# INSERT INTO t_test VALUES (1) (2) (3); idint4 -> t_test (id int4); |
As you can see two values show up twice. To find out how we can remove the duplicate value we can query the "ctid":
1 2 3 4 5 6 7 8 |
test=# SELECT ctid, * FROM t_test; ctid | id ______+___ (0,1) | 1 (0,2) | 2 (0,3) | 2 (0,4) | 3 (4 rows) |
We can make use of the fact that the ctid is not the same for our values. The subselect will check for the lowest ctid for a given value and delete it:
1 2 3 4 5 6 7 8 9 10 11 |
test=# DELETE FROM t_test WHERE ctid IN (SELECT min(ctid) FROM t_test GROUP BY id HAVING count(*) > 1 ) RETURNING * ; id ___ 2 (1 row) |
This query works nicely if we can rely on the fact that we only got values which don't show up more often than twice. If we want to do things in a generic way, we can use a simple windowing function to make things work:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
test=# SELECT ctid FROM (SELECT ctid, id, count(*) OVER (PARTITION BY id ORDER BY ctid) FROM t_test ) AS x WHERE count > 1; ctid ------ (0,3) (0,4) (2 rows) test=# DELETE FROM t_test WHERE ctid IN (SELECT ctid) FROM (SELECT ctid, id, count(*) OVER (PARTITION BY id ORDER BY ctid) FROM t_test ) AS x WHERE count > 1 ) RETURNING * ; id ---- 2 2 (2 rows) |
Now we can check for the result:
1 2 3 4 5 6 7 |
test=# SELECT ctid, * FROM t_test; ctid | id ______+___ (0,1) | 1 (0,2) | 2 (0,5) | 3 (3 rows) |
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Just a quick note here. Window Functions appeared first in PostgreSQL 8.4.
Finally a cristal-clear explanation.... thank you so much!