问题
有如下访问网站的数据,包括用户id和访问时间两个字段。如果某个用户的连续的访问记录时间间隔小于60秒,则属于同一个会话,现在需要计算每个用户有多少个会话。
比如A用户在第1秒,60秒,200秒,230秒有三次访问记录,则该用户有2个会话,其中第一个会话是第1秒和第60秒的记录,第二个会话是第200秒和230秒的记录。
| user_id ts |
| 1001 16920000000 |
| 1001 16920000050 |
| 1002 16920000065 |
| 1002 16920000080 |
| 1001 16920000150 |
| 1002 16920000160 |
解答
user_id | ts | 判断与上一行差值是否小于60 | 开窗累加当做会话编号 |
---|
A | 1 | 0 | 0 |
A | 60 | 0 | 0 |
A | 200 | 1 | 1 |
A | 230 | 0 | 1 |
| with tmp as ( |
| select 1001 as user_id,16920000000 as ts |
| union all |
| select 1001 as user_id,16920000050 as ts |
| union all |
| select 1002 as user_id,16920000065 as ts |
| union all |
| select 1002 as user_id,16920000080 as ts |
| union all |
| select 1001 as user_id,16920000150 as ts |
| union all |
| select 1002 as user_id,16920000160 as ts |
| ) |
| |
| select |
| user_id |
| ,count(distinct user_group) as user_group_cnt |
| from |
| ( |
| select |
| user_id |
| ,ts |
| |
| ,sum(flag) over(partition by user_id order by ts) as user_group |
| from |
| ( |
| select |
| user_id |
| ,ts |
| |
| ,if(ts-last_ts<60,0,1) as flag |
| from |
| ( |
| select |
| user_id |
| ,ts |
| |
| ,lag(ts,1,ts) over(partition by user_id order by ts) as last_ts |
| from tmp |
| )t1 |
| )t1 |
| )t1 |
| group by user_id; |
Comments NOTHING