In my role as a PostgreSQL consultant and trainer there are a couple of issues, which pop up on a regular basis. One of those issues has to do with set operations. Many people seem to misunderstand the way UNION and UNION ALL work.
UPDATE Sept. 2023: See this post on UNION ALL
for more recent information.
The basic assumption of most users is that UNION just appends data – nothing can be further from the truth. Here is why:
1 2 3 4 5 |
test=# SELECT 1 UNION SELECT 1; ?column? ---------- 1 (1 row) |
Most people would expect that two rows will be returned from this query. In fact, it is just one row. The reason for that is simple. UNION does NOT just append data – it also removes duplicates. As we have two identical numbers in our example, only one row will remain.
We can use the explain command to see how this operation is performed internally:
1 2 3 4 5 6 7 8 9 10 |
test=# explain SELECT 1 UNION SELECT 1; QUERY PLAN ------------------------------------------------------------ Unique (cost=0.05..0.06 rows=2 width=0) -> Sort (cost=0.05..0.06 rows=2 width=0) Sort Key: (1) -> Append (cost=0.00..0.04 rows=2 width=0) -> Result (cost=0.00..0.01 rows=1 width=0) -> Result (cost=0.00..0.01 rows=1 width=0) (6 rows) |
PostgreSQL will sort the data and remove duplicates later on. The way PostgreSQL handles this query also explains, why many users report performance problems when using UNION. PostgreSQL has to sort all the data to remove duplicate – this is clearly more expensive than just adding data to some result.
What people in many cases really want is UNION ALL.
Let us take a look at an example:
1 2 3 4 5 6 |
test=# SELECT 1 UNION ALL SELECT 1; ?column? ---------- 1 1 (2 rows) |
In this case data is really just appended and therefore we get two lines of data.
The plan will reveal that things are really the way we want them to be:
1 2 3 4 5 6 7 8 |
test=# explain SELECT 1 UNION ALL SELECT 1; QUERY PLAN ------------------------------------------------------ Result (cost=0.00..0.04 rows=2 width=4) -> Append (cost=0.00..0.04 rows=2 width=4) -> Result (cost=0.00..0.01 rows=1 width=0) -> Result (cost=0.00..0.01 rows=1 width=0) (4 rows) |
PostgreSQL will take both result sets and just add them up.
So be careful when it comes to set operations and think twice if you really want to filter duplicates or not.
----------
We welcome your feedback on our brand new Facebook page: www.fb.com/cybertec.postgresql
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information