While doing PostgreSQL consulting for a German client this week, I stumbled over an interesting issue that might be worth sharing with some folks out on the Internet: it is all about grouping.
Suppose you are measuring the same thing a couple of times on different sensors every, say, 15 minutes. Maybe some temperature, some air pressure, or whatever. The data might look like what is shown in the next listing:
CREATE TABLE t_data (t time, val int);

COPY t_data FROM stdin;
14:00	12
14:01	22
14:01	43
14:14	32
14:15	33
14:16	27
14:30	19
\.
The human eye can instantly spot that 14:00 and 14:01 could be candidates for grouping (maybe the differences are just related to latency or some slightly inconsistent timing). The same applies to 14:14 to 14:16. You might want to have this data in the same group during aggregation.
The question now is: How can that be achieved with PostgreSQL?
The first thing to do is to check out the differences from one timestamp to the next:
SELECT *, lag(t, 1) OVER (ORDER BY t) FROM t_data;
The lag function offers a nice way to solve this kind of problem:
    t     | val |   lag
----------+-----+----------
 14:00:00 |  12 |
 14:01:00 |  22 | 14:00:00
 14:01:00 |  43 | 14:01:00
 14:14:00 |  32 | 14:01:00
 14:15:00 |  33 | 14:14:00
 14:16:00 |  27 | 14:15:00
 14:30:00 |  19 | 14:16:00
(7 rows)
Now that we have used lag to "move" the time to the next row, there is a simple trick which can be applied:
-- the sequence providing the group numbers has to exist first
CREATE SEQUENCE seq_a;

SELECT *,
       CASE WHEN t - lag < '10 minutes'
            THEN currval('seq_a')
            ELSE nextval('seq_a')
       END AS g
FROM  ( SELECT *, lag(t, 1) OVER (ORDER BY t)
        FROM   t_data
      ) AS x;
Moving the lag into a subselect allows us to work on top of that result and create those groups. The trick is: if the difference from one row to the next is large, start a new group by fetching the next value from the sequence - otherwise stay within the current group.
This leaves us with a simple result set:
    t     | val |   lag    | g
----------+-----+----------+---
 14:00:00 |  12 |          | 1
 14:01:00 |  22 | 14:00:00 | 1
 14:01:00 |  43 | 14:01:00 | 1
 14:14:00 |  32 | 14:01:00 | 2
 14:15:00 |  33 | 14:14:00 | 2
 14:16:00 |  27 | 14:15:00 | 2
 14:30:00 |  19 | 14:16:00 | 3
(7 rows)
From now on, life is easy. We can take this output and aggregate on it: "GROUP BY g" will give us one group for each value of "g".
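A minimal sketch of that final step could look like this (the aggregates chosen - min, max, avg - are just illustrative, not from the original article):

-- sketch of the aggregation step, assuming the sequence seq_a from
-- above exists; the chosen aggregates are illustrative examples
SELECT   g,
         min(t)   AS group_start,
         max(t)   AS group_end,
         avg(val) AS avg_val
FROM   ( SELECT *,
                CASE WHEN t - lag < '10 minutes'
                     THEN currval('seq_a')
                     ELSE nextval('seq_a')
                END AS g
         FROM  ( SELECT *, lag(t, 1) OVER (ORDER BY t)
                 FROM   t_data
               ) AS x
       ) AS y
GROUP BY g
ORDER BY g;

Each run of this query advances the sequence, so the absolute group numbers differ between runs, but the rows always end up in the same groups.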
Here's a way to do that without sequences:
select x.*, sum(edge) over (order by t) as group_num
from (select *, case when (t - lag(t,1) over (order by t)) >= '10 minutes' then 1 else 0 end as edge
from t_data) x
order by t;
t | val | edge | group_num
----------+-----+------+-----------
14:00:00 | 12 | 0 | 0
14:01:00 | 22 | 0 | 0
14:01:00 | 43 | 0 | 0
14:14:00 | 32 | 1 | 1
14:15:00 | 33 | 0 | 1
14:16:00 | 27 | 0 | 1
14:30:00 | 19 | 1 | 2
(7 rows)
This is a cool one as well. We used the sequence because you can run the job continuously and keep the numbering going.
I like your query :). I've got to remember that one.
Ah, didn't realize you wanted the group_num to be unique across all queries, not just within this query.
In this business case, yes. But your query is pretty cool as well :). Maybe I just communicated badly.
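To make the difference concrete, here is a minimal sketch (the second batch of rows is made up for illustration): because a sequence remembers its position across queries and sessions, re-running the sequence-based query on newly arrived rows continues the numbering, while the sum(edge) variant starts at 0 again on every run.

-- hypothetical later batch of measurements (values are made up)
COPY t_data FROM stdin;
15:00	21
15:02	25
\.

-- sequence-based grouping on just the new rows: nextval('seq_a')
-- continues where the previous run stopped, so the group numbers
-- do not collide with the ones assigned earlier
SELECT *,
       CASE WHEN t - lag < '10 minutes'
            THEN currval('seq_a')
            ELSE nextval('seq_a')
       END AS g
FROM  ( SELECT *, lag(t, 1) OVER (ORDER BY t)
        FROM   t_data
        WHERE  t >= '15:00'
      ) AS x;

-- the sum(edge) approach, run on the same slice, would number its
-- groups starting at 0 again, so group_num is only unique within
-- one query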