CYBERTEC PostgreSQL Logo

Removing duplicates in PostgreSQL

12.2012 / Category: / Tags:

Today somebody asked me how to remove duplicates which accidentally made it into a table. The problem is: A normal DELETE won't do, because you would delete both values - not just the one which is in there twice.

The magic word is "ctid"

To solve the problem, you have to use a "secret" column called "ctid". The "ctid" identifies a row inside a table. Here is an example:

 As you can see two values show up twice. To find out how we can remove the duplicate value we can query the "ctid":

We can make use of the fact that the ctid is not the same for our values. The subselect will check for the lowest ctid for a given value and delete it:

This query works nicely if we can rely on the fact that we only got values which don't show up more often than twice. If we want to do things in a generic way, we can use a simple windowing function to make things work:

Now we can check for the result:


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

2 responses to “Removing duplicates in PostgreSQL”

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram