【SQL】恶意取消订单用户

10 次阅读 预计阅读时间: 4 分钟


n

问题

nnnn

下面是某电商网站的订单数据,包括order_id,user_id,order_status和operate_time四个字段,我们需要找出所有恶意购买的用户。恶意购买的用户定义是:同一个用户,在任意半小时内(含),取消订单次数>=2次的就被视为恶意买家。比如该样例数据中c用户就是恶意买家。

nnnn
order_id   user_id   order_status    operate_timen1101         a         已支付        2023-01-01 10:00:00n1102         a         已取消        2023-01-01 10:10:00n1103         a         待支付        2023-01-01 10:20:00n1104         b         已取消        2023-01-01 10:30:00n1105         a         待确认        2023-01-01 10:50:00n1106         a         已取消        2023-01-01 11:00:00n1107         b         已取消        2023-01-01 11:40:00n1108         b         已取消        2023-01-01 11:50:00n1109         b         已支付        2023-01-01 12:00:00n1110         b         已取消        2023-01-01 12:11:00n1111         c         已取消        2023-01-01 12:20:00n1112         c         已取消        2023-01-01 12:30:00n1113         c         已取消        2023-01-01 12:55:00n1114         c         已取消        2023-01-01 13:00:00
nnnn

解答

nnnn

典型的滑动窗口的场景。Hive中也是有滑动窗口的功能的(按数据范围开窗,range between and )。针对这个例子,窗口大小就是半小时,然后按每条数据进行滑动,在窗口内判断该条数据对应的用户是否是恶意用户。

nnnn
with temp  as (nselect  1101 as order_id  ,'a'  as user_id  ,"已支付" as order_status,  "2023-01-01 10:00:00"  as operate_timenunion all nselect  1102 as order_id  ,'a'  as user_id  ,"已取消" as order_status,  "2023-01-01 10:10:00"  as operate_timenunion all nselect  1103 as order_id  ,'a'  as user_id  ,"待支付" as order_status,  "2023-01-01 10:20:00"  as operate_timenunion all nselect  1104 as order_id  ,'b'  as user_id  ,"已取消" as order_status,  "2023-01-01 10:30:00"  as operate_timenunion all nselect  1105 as order_id  ,'a'  as user_id  ,"待确认" as order_status,  "2023-01-01 10:50:00"  as operate_timenunion all nselect  1106 as order_id  ,'a'  as user_id  ,"已取消" as order_status,  "2023-01-01 11:00:00"  as operate_timenunion all nselect  1107 as order_id  ,'b'  as user_id  ,"已取消" as order_status,  "2023-01-01 11:40:00"  as operate_timenunion all nselect  1108 as order_id  ,'b'  as user_id  ,"已取消" as order_status,  "2023-01-01 11:50:00"  as operate_timenunion all nselect  1109 as order_id  ,'b'  as user_id  ,"已支付" as order_status,  "2023-01-01 12:00:00"  as operate_timenunion all nselect  1110 as order_id  ,'b'  as user_id  ,"已取消" as order_status,  "2023-01-01 12:11:00"  as operate_timenunion all nselect  1111 as order_id  ,'c'  as user_id  ,"已取消" as order_status,  "2023-01-01 12:20:00"  as operate_timenunion all nselect  1112 as order_id  ,'c'  as user_id  ,"已取消" as order_status,  "2023-01-01 12:30:00"  as operate_timenunion all nselect  1113 as order_id  ,'c'  as user_id  ,"已取消" as order_status,  "2023-01-01 12:55:00"  as operate_timenunion all nselect  1114 as order_id  ,'c'  as user_id  ,"已取消" as order_status,  "2023-01-01 13:00:00"  as operate_timen)n---这里设定tmp表中放的就是上面提供的样例数据nselectndistinct user_idnfromn(n    selectn    order_idn    ,user_idn    ,order_statusn    --通过range between以当前行为锚点,圈定数据范围为operate_time为近30分钟内,然后算该范围内的取消订单数n    ,count(case when order_status='已取消' then order_id end) over(partition by user_id order by operate_time range between 1800 preceding and current row) as cancel_order_cntn    fromn    (n        selectn        order_idn        ,user_idn        ,order_statusn        ,unix_timestamp(operate_time) as operate_time  ---由于range...between只能整数比较,这里先转换为秒n        from tempn    )t1n)t1nwhere cancel_order_cnt>=2n;n----结果nuser_idnbnc
n
最后更新于 2024-03-29