Business Requirement Optimization
1. How do you compute the number of registrations for each day within a given time range, where each day's figure is the cumulative count of all earlier registrations plus that day's new registrations?
The demo data is shown below for the time range 2019-01-01 through 2019-01-03; the first table is the raw data and the second is the per-day aggregation.
| c_time     | uid   |
| ---------- | ----- |
| 2019-01-01 | 10001 |
| 2019-01-02 | 10002 |
| 2019-01-02 | 10003 |
| 2019-01-03 | 10004 |
| 2019-01-03 | 10005 |
| 2019-01-03 | 10006 |

| c_time     | sum |
| ---------- | --- |
| 2019-01-01 | 1   |
| 2019-01-02 | 2   |
| 2019-01-03 | 3   |
Simple SQL Implementation
Implementation 1: one conditional sum per cutoff date. This returns a single row whose columns "1", "2", and "3" hold the cumulative counts, so the result still has to be unpivoted into one row per date.

```sql
SELECT
  SUM(IF(c_time = '2019-01-01', `sum`, 0))                        AS `1`,
  SUM(IF(c_time BETWEEN '2019-01-01' AND '2019-01-02', `sum`, 0)) AS `2`,
  SUM(IF(c_time BETWEEN '2019-01-01' AND '2019-01-03', `sum`, 0)) AS `3`
FROM t;
```

Implementation 2: compute the running total directly with a window function, partitioned by year and ordered by date.

```sql
SELECT
  c_time,
  SUM(`sum`) OVER (PARTITION BY SUBSTR(c_time, 1, 4) ORDER BY c_time) AS `sum`
FROM t;
```
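If the per-day aggregation is available as the temporary view `t` from the sketch above (an assumption, since the original post does not show how `t` is registered in Spark), Implementation 2 can also be run as-is through Spark SQL:

```scala
// Run Implementation 2 against the temporary view `t`; the window function
// yields the running total per day, which is 1, 3, 6 for the demo data.
val cumulative = spark.sql("""
  SELECT c_time,
         SUM(`sum`) OVER (PARTITION BY SUBSTR(c_time, 1, 4) ORDER BY c_time) AS `sum`
  FROM t
  ORDER BY c_time
""")
cumulative.show()
```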
Simple Spark SQL Implementation
```scala
import org.apache.spark.sql.functions.{count, explode, sum}
import spark.implicits._

val beginDate = "2019-01-01"
val endDate   = "2019-01-03"

val result = tDF
  // keep only registrations inside the (inclusive) date range
  .filter(s"c_time >= '$beginDate' and c_time <= '$endDate'")
  .selectExpr("uid", "c_time")
  // count registrations per day
  .groupBy("c_time")
  .agg(count("uid").as("sum"))
  // dateList is a registered UDF that expands c_time into every date from
  // c_time through endDate, so each day's count is carried forward
  .selectExpr("sum", "dateList(c_time) date_list")
  .withColumn("c_time", explode($"date_list"))
  // summing the carried-forward counts per date gives the running total
  .groupBy("c_time")
  .agg(sum("sum").as("sum"))
```
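The pipeline above relies on a `dateList` UDF that is not defined in the snippet. Based on how it is used (each day's count must be attributed to that day and every later day in the range), a minimal sketch of how such a UDF could be registered is shown below. The implementation and its closure over `endDate` are assumptions, not the original author's code, and the UDF must be registered before `result` is built.

```scala
import java.time.LocalDate

// Hypothetical dateList UDF: given a start date (yyyy-MM-dd), return every
// date from that day through endDate inclusive. Exploding this list lets a
// day's registration count contribute to its own day and to all later days.
spark.udf.register("dateList", (start: String) => {
  val from = LocalDate.parse(start)
  val to   = LocalDate.parse(endDate) // endDate from the snippet above
  Iterator.iterate(from)(_.plusDays(1))
    .takeWhile(!_.isAfter(to))
    .map(_.toString)
    .toSeq
})
```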