Window function: Why first_value and last_value are not bugs

04.2018 | Category: | Tags: development , function

04.2018

Category:

SQL is a must, if you want to be a Data Analyst or a Data Scientist. However, every once in a while people wonder why a result is the way it is. While on the road in Berlin (Germany) the other day, I found a fairly interesting window function scenario which is pretty counter-intuitive to most people, and which might be worth sharing.

Table of Contents

Window function: What PostgreSQL does and why

PostgreSQL has provided window functions and analytics for quite some time now and this vital feature has been widely adopted by users, who are using PostgreSQL or SQL in general for more than just trivial queries. A modern database is just so much more than a simple data store and window functions are therefore certainly something to look into.

So, what is this post all about? Here is an example:

test=# SELECT *, 
 first_value(x) OVER (ORDER BY x) 
 FROM generate_series(1, 5) AS x;
 x | first_value 
---+-------------
 1 | 1
 2 | 1
 3 | 1
 4 | 1
 5 | 1
(5 rows)

1

2

3

4

5

6

7

8

9

10

11

test=# SELECT *,

first_value(x) OVER (ORDER BY x)

FROM generate_series(1, 5) AS x;

x | first_value

---+-------------

1 | 1

2 | 1

3 | 1

4 | 1

5 | 1

(5 rows)

What we want, is the first value in our data set. The ORDER BY clause will ensure that data is fed to first_value in the right order. The result is therefore not surprising.

The same applies if we add `DESC` to our `ORDER BY`. The result is totally obvious:

test=# SELECT *, 
         first_value(x) OVER (ORDER BY x),
         first_value(x) OVER (ORDER BY x DESC)
   FROM generate_series(1, 5) AS x;
 x | first_value | first_value 
---+-------------+-------------
 5 |           1 |           5
 4 |           1 |           5
 3 |           1 |           5
 2 |           1 |           5
 1 |           1 |           5
(5 rows)

1

2

3

4

5

6

7

8

9

10

11

12

test=# SELECT *,

first_value(x) OVER (ORDER BY x),

first_value(x) OVER (ORDER BY x DESC)

FROM generate_series(1, 5) AS x;

x | first_value | first_value

---+-------------+-------------

5 | 1 | 5

4 | 1 | 5

3 | 1 | 5

2 | 1 | 5

1 | 1 | 5

(5 rows)

Window function - last_value: Unexpected results

However, what if we use last_value. Here is an example:

test=# SELECT *, 
         last_value(x) OVER (ORDER BY x),
         last_value(x) OVER (ORDER BY x DESC)
       FROM generate_series(1, 5) AS x;
 x | last_value | last_value 
---+------------+------------
 5 |          5 |          5
 4 |          4 |          4
 3 |          3 |          3
 2 |          2 |          2
 1 |          1 |          1
(5 rows)

1

2

3

4

5

6

7

8

9

10

11

12

test=# SELECT *,

last_value(x) OVER (ORDER BY x),

last_value(x) OVER (ORDER BY x DESC)

FROM generate_series(1, 5) AS x;

x | last_value | last_value

---+------------+------------

5 | 5 | 5

4 | 4 | 4

3 | 3 | 3

2 | 2 | 2

1 | 1 | 1

(5 rows)

What you can see here is that both columns will return the SAME data – regardless of the different sort order provided by the ORDER BY clause. That comes as a surprise to most people. Actually most people would accept one column to contain only “5” and the other column to contain only “1”.

Why is that not the case? Here is the answer:

test=# SELECT *, 
         array_agg(x) OVER (ORDER BY x),
         array_agg(x) OVER (ORDER BY x DESC)
       FROM generate_series(1, 5) AS x;
 x | array_agg   | array_agg 
---+-------------+-------------
 5 | {1,2,3,4,5} | {5}
 4 | {1,2,3,4}   | {5,4}
 3 | {1,2,3}     | {5,4,3}
 2 | {1,2}       | {5,4,3,2}
 1 | {1}         | {5,4,3,2,1}
(5 rows)

1

2

3

4

5

6

7

8

9

10

11

12

test=# SELECT *,

array_agg(x) OVER (ORDER BY x),

array_agg(x) OVER (ORDER BY x DESC)

FROM generate_series(1, 5) AS x;

x | array_agg | array_agg

---+-------------+-------------

5 | {1,2,3,4,5} | {5}

4 | {1,2,3,4} | {5,4}

3 | {1,2,3} | {5,4,3}

2 | {1,2} | {5,4,3,2}

1 | {1} | {5,4,3,2,1}

(5 rows)

Let us take a look at which values last_value will actually see: array_agg will simply put them all into an array so that we can expect things in detail. As you can see, the last value in the array is identical in both cases, which means that both columns will produce exactly the identical output.

4 responses to “Window function: Why first_value and last_value are not bugs”

Dmitry Dorofeev says:

April 2, 2018 at 10:22 am

Funny enough:

SELECT *,
last_value(x) OVER (ORDER BY x DESC),
last_value(x) OVER (ORDER BY x)
FROM generate_series(1, 5) AS x;

would result with:

x | last_value | last_value
--- ------------ ------------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
(5 rows)

Reply
Colin 't Hart says:

April 2, 2018 at 10:58 am

You should mention the frame_clause, and how it defaults to RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The reason that this is the default is to facilitate running totals and the like, which are very common uses for windowing functions.

Making the default RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING would have been far less useful and made more common cases require far more verbose SQL.

Reply
- Chad Capra says:
  
  October 10, 2018 at 9:48 pm
  
  Hans, this is a great article and what you explain is "spot on," which is great! However, Colin's response is EXACTLY the "reason" behind why postgres behaves this way. I wish the default was "UNBOUND PRECEDING AND UNBOUNDED FOLLOWING" because having windows functions "choose" default filtering is actually more confusing, IMO.
  
  The postgres team made a decision based on what their experience has been instead of sticking to the facts that a window function is like a "join" to table itself and in other "JOINs," the table/view is not automatically filtered. I cannot think of the exact scenario, but I feel like this would be similar to filtering NULLs in a LEFT JOIN (because it is a "common" thing to do).
  
  No disrespect to the postgres team as they have made an amazing product that I will continue to use for a long time, but this decision to implement a pattern that by default excludes rows from a table/view is certainly confusing because other patterns do not seem to go down this path.
  
  In any case, this article lead me to the explanation of my exact problem (i.e. attempting to find the latest appointment date for a client using windows functions).
  
  For that, I applaud you both!!
  
  Reply
taitmann says:

February 12, 2019 at 3:30 pm

This ia good tutorial. Thanks. But i have a different situation as follows;
I want to select and get Aquila with 20 as the last value under the marks colum and omit the null rows.
How do i archive this? Please help.

My table looks like this.
.....................................................................
Student --------------Mark
.....................................................................

1. John ----------------25

2. Peter ---------------30

3. Jack -------------- Null

4. Aquila --------------20

5. Judy ----------------null

6. Kessy --------------null

Reply

Window function: Why first_value and last_value are not bugs

Window function: What PostgreSQL does and why

So, what is this post all about? Here is an example:

The same applies if we add `DESC` to our `ORDER BY`. The result is totally obvious:

Window function - last_value: Unexpected results

Why is that not the case? Here is the answer:

4 responses to “Window function: Why first_value and last_value are not bugs”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Window function: Why first_value and last_value are not bugs

Window function: What PostgreSQL does and why

So, what is this post all about? Here is an example:

The same applies if we add DESC to our ORDER BY. The result is totally obvious:

Window function - last_value: Unexpected results

Why is that not the case? Here is the answer:

4 responses to “Window function: Why first_value and last_value are not bugs”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

The same applies if we add `DESC` to our `ORDER BY`. The result is totally obvious: