HIve数仓新零售项目DWS层的构建(Grouping sets)模型

2023-11-02 17:10

本文主要是介绍HIve数仓新零售项目DWS层的构建(Grouping sets)模型,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

HIve数仓新零售项目

注:大家觉得博客好的话,别忘了点赞收藏呀,本人每周都会更新关于人工智能和大数据相关的内容,内容多为原创,Python Java Scala SQL 代码,CV NLP 推荐系统等,Spark Flink Kafka Hbase Hive Flume等等~写的都是纯干货,各种顶会的论文解读,一起进步。
今天继续和大家分享一下HIve数仓新零售项目
#博学谷IT学习技术支持


文章目录

  • HIve数仓新零售项目
  • 前言
  • 一、Grouping sets 模型介绍
  • 二、DWS层功能与职责
  • 三、销售主题统计宽表
    • 1.构建目标表
    • 2.Presto Grouping sets语法实现
  • 总结


前言

在这里插入图片描述
在这里插入图片描述
这是一个线下真实HIve数仓的一个搭建项目,还是比较复杂的,主要和大家一起分享一下整个HIve数仓的思路。
整个项目分为:
1.ODS层
2.DWD层
3.DWB层
4.DWS层
5.DM层
6.RPT层
每一层都有每一层的知识点。我会和大家分享从数据源MySQL开始,如何搭建整个完整的项目。


一、Grouping sets 模型介绍

一种高效的替代多个UNION ALL语法的模型,个人比较喜欢,非常灵活,速度快。
以下是一个demo案例。
需求:
分别按照(month)、(day)、月和天(month,day)统计来访用户userid个数,并获取三者的结果集(一起插入到目标宽表中)。

create table test.t_user(month string, day string, userid string
) 
row format delimited fields terminated by ',';--数据样例
2015-03,2015-03-10,user1
2015-03,2015-03-10,user5
2015-03,2015-03-12,user7
2015-04,2015-04-12,user3
2015-04,2015-04-13,user2
2015-04,2015-04-13,user4
2015-04,2015-04-16,user4
2015-03,2015-03-10,user2
2015-03,2015-03-10,user3
2015-04,2015-04-12,user5
2015-04,2015-04-13,user6
2015-04,2015-04-15,user3
2015-04,2015-04-15,user2
2015-04,2015-04-16,user1
  • UNION ALL 写法 ,比较丑而且速度慢,效率低
--3个分组统计而已,简单。统计完再使用union all合并结果集。
--注意union all合并结果集需要各个查询返回字段个数、类型一致,因此需要合理的使用null来填充返回结果。
select month,null,count(userid)
from test.t_user
group by monthunion allselect null,day,count(userid)
from test.t_user
group by dayunion allselect month,day,count(userid)
from test.t_user
group by month,day;

在这里插入图片描述

  • grouping sets模型写法
  • 根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL。
-- Hive的写法
select month,day,count(userid) 
from test.t_user group by month,day 
grouping sets (month,day,(month,day));-- presto的写法
select
month,
day,
count(*) as cnt
from  test.t_user
group by
grouping sets (month,day,(month,day))
  • grouping介绍与使用
  • 功能:使用grouping操作来判断当前数据是按照哪个字段来分组的
  • 对于给定的分组,如果分组中包含相应的列,则将位设置为0,否则将其设置为1
select month,day,count(userid),grouping(month)      as m,grouping(day)        as d,grouping(month, day) as m_d
from test.t_user
group bygrouping sets (month, day, (month, day));

二、DWS层功能与职责

DWS层: 基于主题统计分析, 此层一般是用于最细粒度的统计操作

  • 维度组合:

    日期
    日期+城市
    日期+城市+商圈
    日期+城市+商圈+店铺
    日期+品牌
    日期+大类
    日期+大类+中类
    日期+大类+中列+小类

  • 指标:
    销售收入、平台收入、配送成交额、小程序成交额、安卓APP成交额、苹果APP成交额、PC商城成交额、订单量、参 评单量、差评单量、配送单量、退款单量、小程序订单量、安卓APP订单量、苹果APP订单量、PC商城订单量。

三、销售主题统计宽表

最终要求通过group_type来判断指标来自哪个维度的聚合
在这里插入图片描述

1.构建目标表

drop database if exists yp_dws
create database if not exists yp_dws;-- 销售主题日统计宽表
DROP TABLE IF EXISTS yp_dws.dws_sale_daycount;
CREATE TABLE yp_dws.dws_sale_daycount(city_id string COMMENT '城市id',city_name string COMMENT '城市name',trade_area_id string COMMENT '商圈id',trade_area_name string COMMENT '商圈名称',store_id string COMMENT '店铺的id',store_name string COMMENT '店铺名称',brand_id string COMMENT '品牌id',brand_name string COMMENT '品牌名称',max_class_id string COMMENT '商品大类id',max_class_name string COMMENT '大类名称',mid_class_id string COMMENT '中类id',mid_class_name string COMMENT '中类名称',min_class_id string COMMENT '小类id',min_class_name string COMMENT '小类名称',-- 经验字段: 用于标记每一条数据是按照哪个维度计算出来的group_type string COMMENT '分组类型:store,trade_area,city,brand,min_class,mid_class,max_class,all',--   =======日统计=======--   销售收入sale_amt DECIMAL(38,2) COMMENT '销售收入',--   平台收入plat_amt DECIMAL(38,2) COMMENT '平台收入',-- 配送成交额deliver_sale_amt DECIMAL(38,2) COMMENT '配送成交额',-- 小程序成交额mini_app_sale_amt DECIMAL(38,2) COMMENT '小程序成交额',-- 安卓APP成交额android_sale_amt DECIMAL(38,2) COMMENT '安卓APP成交额',--  苹果APP成交额ios_sale_amt DECIMAL(38,2) COMMENT '苹果APP成交额',-- PC商城成交额pcweb_sale_amt DECIMAL(38,2) COMMENT 'PC商城成交额',-- 成交单量order_cnt BIGINT COMMENT '成交单量',-- 参评单量eva_order_cnt BIGINT COMMENT '参评单量comment=>cmt',-- 差评单量bad_eva_order_cnt BIGINT COMMENT '差评单量negtive-comment=>ncmt',-- 配送成交单量deliver_order_cnt BIGINT COMMENT '配送单量',-- 退款单量refund_order_cnt BIGINT COMMENT '退款单量',-- 小程序成交单量miniapp_order_cnt BIGINT COMMENT '小程序成交单量',-- 安卓APP订单量android_order_cnt BIGINT COMMENT '安卓APP订单量',-- 苹果APP订单量ios_order_cnt BIGINT COMMENT '苹果APP订单量',-- PC商城成交单量pcweb_order_cnt BIGINT COMMENT 'PC商城成交单量'
)
COMMENT '销售主题日统计宽表'
PARTITIONED BY(dt STRING)
ROW format delimited fields terminated BY '\t'
stored AS orc tblproperties ('orc.compress' = 'SNAPPY');

2.Presto Grouping sets语法实现

insert into yp_dws.dws_sale_daycount
with t0 as (select-- 列裁剪-- 维度字段od.dt,city_id,city_name,trade_area_id,trade_area_name,store_name,brand_id,brand_name,max_class_name,max_class_id,mid_class_name,mid_class_id,min_class_name,min_class_id,-- 指标字段order_id,order_amount,total_price,plat_fee,delivery_fee,order_from,evaluation_id,geval_scores,delievery_id,refund_id,od.store_id,row_number() over (partition by order_id,goods_id ) as rk1, -- 过滤脏数据row_number() over (partition by order_id ) as rk2from yp_dwb.dwb_order_detail odleft join  yp_dwb.dwb_shop_detail  sd on od.store_id = sd.idleft join  yp_dwb.dwb_goods_detail gd on od.goods_id = gd.id)
selectcity_id,city_name,trade_area_id,trade_area_name,store_id,store_name,brand_id,brand_name,max_class_id,max_class_name,mid_class_id,mid_class_name,min_class_id,min_class_name,case when grouping(store_id) = 0      -- ifthen 'store'  -- 日期 + 城市 + 商圈 + 店铺when grouping(trade_area_id) = 0  -- else ifthen 'trade_area'  --日期 + 城市 + 商圈when grouping(city_id) = 0   -- else ifthen 'city '      --日期 + 城市when grouping(brand_id) = 0  -- else ifthen 'brand'  -- 日期 = 品牌when grouping(min_class_id) = 0  -- else ifthen 'min_class'  -- 日期 + 大类 + 中类 + 小类when grouping(mid_class_id) = 0   -- else ifthen 'mid_class'  -- 日期 + 大类 + 中类when grouping(max_class_id) = 0then 'max_clas'  -- 日期 + 大类else'all'  -- 日期end as group_type,-- 总销售额case when grouping(store_id) = 0then sum(if(store_id is not null,total_price,0))when grouping(trade_area_id) = 0then sum(if(trade_area_id is not null,total_price,0))when  grouping(city_id) = 0then sum(if(city_id is not null,total_price,0))when grouping(brand_id) = 0then sum(if(brand_id is not null,total_price,0))when grouping(min_class_id) = 0then sum(if(min_class_id is not null,total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null,total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null,total_price,0))   -- 聚合定制when grouping(max_class_id) = 0then sum(if(max_class_id is not null,total_price,0))elsesum(if(dt is not null,total_price,0))  -- 日期end as sale_amt,-- 平台收入case when grouping(store_id) = 0then sum(if(store_id is not null,plat_fee,0))when grouping(trade_area_id) = 0then sum(if(trade_area_id is not null,plat_fee,0))when  grouping(city_id) = 0then sum(if(city_id is not null,plat_fee,0))when grouping(brand_id) = 0then sum(if(brand_id is not null,plat_fee,0))when grouping(min_class_id) = 0then sum(if(min_class_id is not null,plat_fee,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null,plat_fee,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null,plat_fee,0))   -- 聚合定制when grouping(max_class_id) = 0then sum(if(max_class_id is not null,plat_fee,0))elsesum(if(dt is not null,plat_fee,0))  -- 日期end as plat_amt,-- 配送成交额case when grouping(store_id) = 0then sum(if(store_id is not null and delievery_id is not null,total_price,0))when grouping(trade_area_id) = 0then sum(if(trade_area_id is not null  and delievery_id is not null,total_price,0))when  grouping(city_id) = 0then sum(if(city_id is not null  and delievery_id is not null,total_price,0))when grouping(brand_id) = 0then sum(if(brand_id is not null  and delievery_id is not null,total_price,0))when grouping(min_class_id) = 0then sum(if(min_class_id is not null  and delievery_id is not null,total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and delievery_id is not null,total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and delievery_id is not null,total_price,0))   -- 聚合定制when grouping(max_class_id) = 0then sum(if(max_class_id is not null  and delievery_id is not null,total_price,0))elsesum(if(dt is not null  and delievery_id is not null,total_price,0))  -- 日期end as deliver_sale_amt,-- 小程序成交额case when grouping(store_id) = 0then sum(if(store_id is not null and order_from = 'miniapp',total_price,0))when grouping(trade_area_id) = 0then sum(if(trade_area_id is not null and order_from = 'miniapp',total_price,0))when  grouping(city_id) = 0then sum(if(city_id is not null  and order_from = 'miniapp',total_price,0))when grouping(brand_id) = 0then sum(if(brand_id is not null  and order_from = 'miniapp',total_price,0))when grouping(min_class_id) = 0then sum(if(min_class_id is not null  and order_from = 'miniapp',total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and order_from = 'miniapp',total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and order_from = 'miniapp',total_price,0))   -- 聚合定制when grouping(max_class_id) = 0then sum(if(max_class_id is not null  and order_from = 'miniapp',total_price,0))elsesum(if(dt is not null  and order_from = 'miniapp',total_price,0))  -- 日期end as mini_app_sale_amt,-- android成交额case when grouping(store_id) = 0then sum(if(store_id is not null and order_from = 'android',total_price,0))when grouping(trade_area_id) = 0then sum(if(trade_area_id is not null and order_from = 'android',total_price,0))when  grouping(city_id) = 0then sum(if(city_id is not null  and order_from = 'android',total_price,0))when grouping(brand_id) = 0then sum(if(brand_id is not null  and order_from = 'android',total_price,0))when grouping(min_class_id) = 0then sum(if(min_class_id is not null  and order_from = 'android',total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and order_from = 'android',total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and order_from = 'android',total_price,0))   -- 聚合定制when grouping(max_class_id) = 0then sum(if(max_class_id is not null  and order_from = 'android',total_price,0))elsesum(if(dt is not null  and order_from = 'android',total_price,0))  -- 日期end as android_sale_amt,-- ios成交额case when grouping(store_id) = 0then sum(if(store_id is not null and order_from = 'ios',total_price,0))when grouping(trade_area_id) = 0then sum(if(trade_area_id is not null and order_from = 'ios',total_price,0))when  grouping(city_id) = 0then sum(if(city_id is not null  and order_from = 'ios',total_price,0))when grouping(brand_id) = 0then sum(if(brand_id is not null  and order_from = 'ios',total_price,0))when grouping(min_class_id) = 0then sum(if(min_class_id is not null  and order_from = 'ios',total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and order_from = 'ios',total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and order_from = 'ios',total_price,0))   -- 聚合定制when grouping(max_class_id) = 0then sum(if(max_class_id is not null  and order_from = 'ios',total_price,0))elsesum(if(dt is not null  and order_from = 'ios',total_price,0))  -- 日期end as ios_sale_amt,-- pcweb成交额case when grouping(store_id) = 0then sum(if(store_id is not null and order_from = 'pcweb',total_price,0))when grouping(trade_area_id) = 0then sum(if(trade_area_id is not null and order_from = 'pcweb',total_price,0))when  grouping(city_id) = 0then sum(if(city_id is not null  and order_from = 'pcweb',total_price,0))when grouping(brand_id) = 0then sum(if(brand_id is not null  and order_from = 'pcweb',total_price,0))when grouping(min_class_id) = 0then sum(if(min_class_id is not null  and order_from = 'pcweb',total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and order_from = 'pcweb',total_price,0))when grouping(mid_class_id) = 0then sum(if(mid_class_id is not null  and order_from = 'pcweb',total_price,0))   -- 聚合定制when grouping(max_class_id) = 0then sum(if(max_class_id is not null  and order_from = 'pcweb',total_price,0))elsesum(if(dt is not null  and order_from = 'pcweb',total_price,0))  -- 日期end as pcweb_sale_amt,-- 成交单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2 = 1,order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not null and rk2 = 1,order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1,order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1,order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1,order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1,order_id,null))elsecount(if(dt is not null and rk2=1,order_id,null))  -- 日期end as order_cnt,-- 参评单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2=1 and evaluation_id is not null and evaluation_id is not null,order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not null and rk2=1 and evaluation_id is not null,order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1 and evaluation_id is not null,order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1 and evaluation_id is not null,order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1 and evaluation_id is not null,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and evaluation_id is not null,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and evaluation_id is not null,order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1 and evaluation_id is not null,order_id,null))elsecount(if(dt is not null and rk2=1 and evaluation_id is not null,order_id,null))  -- 日期end as eva_order_cnt,-- 差评单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))elsecount(if(dt is not null and rk2=1 and evaluation_id is not null and geval_scores <= 6,order_id,null))  -- 日期end as bad_eva_order_cnt,-- 配送单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2=1 and delievery_id is not null,order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not null and rk2=1 and delievery_id is not null,order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1 and delievery_id is not null,order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1 and delievery_id is not null,order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1 and delievery_id is not null,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and delievery_id is not null,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and delievery_id is not null,order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1 and delievery_id is not null,order_id,null))elsecount(if(dt is not null and rk2=1 and delievery_id is not null,order_id,null))  -- 日期end as deliver_order_cnt,-- 退款单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2=1 and refund_id is not null,order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not null and rk2=1 and refund_id is not null,order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1 and refund_id is not null,order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1 and refund_id is not null,order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1 and refund_id is not null,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and refund_id is not null,order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and refund_id is not null,order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1 and refund_id is not null,order_id,null))elsecount(if(dt is not null and rk2=1 and refund_id is not null,order_id,null))  -- 日期end as refund_order_cnt,-- 小程序成交单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2=1 and order_from = 'miniapp',order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not null and rk2=1 and order_from = 'miniapp',order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1  and order_from = 'miniapp',order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1  and order_from = 'miniapp',order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1  and order_from = 'miniapp',order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1  and order_from = 'miniapp',order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1  and order_from = 'miniapp',order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1  and order_from = 'miniapp',order_id,null))elsecount(if(dt is not null  and rk2=1 and order_from = 'miniapp',order_id,null))  -- 日期end as miniapp_order_cnt,-- android成交单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2=1 and order_from = 'android',order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not  null and rk2=1 and order_from = 'android',order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1 and order_from = 'android',order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1 and order_from = 'android',order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1  and order_from = 'android',order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and order_from = 'android',order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1 and order_from = 'android',order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1  and order_from = 'android',order_id,null))elsecount(if(dt is not null and rk2=1 and order_from = 'android',order_id,null))  -- 日期end as android_order_cnt,-- ios成交单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2=1  and order_from = 'ios',order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not null and rk2=1  and order_from = 'ios',order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1   and order_from = 'ios',order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1   and order_from = 'ios',order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1   and order_from = 'ios',order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1   and order_from = 'ios',order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1   and order_from = 'ios',order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1   and order_from = 'ios',order_id,null))elsecount(if(dt is not null and rk2=1   and order_from = 'ios',order_id,null))  -- 日期end as ios_order_cnt,-- pcweb成交单量case when grouping(store_id) = 0then count(if(store_id is not null and rk2=1  and order_from = 'pcweb',order_id,null))when grouping(trade_area_id) = 0then count(if(trade_area_id is not null and rk2=1  and order_from = 'pcweb',order_id,null))when  grouping(city_id) = 0then count(if(city_id is not null and rk2=1   and order_from = 'pcweb',order_id,null))when grouping(brand_id) = 0then count(if(brand_id is not null and rk2=1   and order_from = 'pcweb',order_id,null))when grouping(min_class_id) = 0then count(if(min_class_id is not null and rk2=1   and order_from = 'pcweb',order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1   and order_from = 'pcweb',order_id,null))when grouping(mid_class_id) = 0then count(if(mid_class_id is not null and rk2=1   and order_from = 'pcweb',order_id,null))   -- 聚合定制when grouping(max_class_id) = 0then count(if(max_class_id is not null and rk2=1   and order_from = 'pcweb',order_id,null))elsecount(if(dt is not null  and order_from = 'pcweb',order_id,null))  -- 日期end as pcweb_order_cnt,dt
from t0
where rk1 = 1
group by
grouping sets (dt,(dt,city_id,city_name),(dt,city_id,city_name,trade_area_id,trade_area_name),(dt,city_id,city_name,trade_area_id,trade_area_name,store_id,store_name),(dt,brand_id,brand_name),(dt,max_class_id,max_class_name),(dt,max_class_id,max_class_name,mid_class_id,mid_class_name),(dt,max_class_id,max_class_name,mid_class_id,mid_class_name,min_class_id,min_class_name)
);

这里主要是运用了grouping和grouping sets的语法,如果不了解可以百度一下。


总结

这里介绍了HIve数仓新零售项目DWS层的构建(Grouping sets)模型,Grouping sets模型适合于多维度,多指标的稀疏宽表的构建,可以把不同的维度放在同一张宽表中,方便以后查询。同时在建立聚合字段的时候,可以根据每个维度进行定制聚合的操作。比较灵活。
如果对grouping和grouping sets的语法有疑问,可以留言讨论。

这篇关于HIve数仓新零售项目DWS层的构建(Grouping sets)模型的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/332062

相关文章

一文详解如何在idea中快速搭建一个Spring Boot项目

《一文详解如何在idea中快速搭建一个SpringBoot项目》IntelliJIDEA作为Java开发者的‌首选IDE‌,深度集成SpringBoot支持,可一键生成项目骨架、智能配置依赖,这篇文... 目录前言1、创建项目名称2、勾选需要的依赖3、在setting中检查maven4、编写数据源5、开启热

SpringBoot项目配置logback-spring.xml屏蔽特定路径的日志

《SpringBoot项目配置logback-spring.xml屏蔽特定路径的日志》在SpringBoot项目中,使用logback-spring.xml配置屏蔽特定路径的日志有两种常用方式,文中的... 目录方案一:基础配置(直接关闭目标路径日志)方案二:结合 Spring Profile 按环境屏蔽关

基于Python构建一个高效词汇表

《基于Python构建一个高效词汇表》在自然语言处理(NLP)领域,构建高效的词汇表是文本预处理的关键步骤,本文将解析一个使用Python实现的n-gram词频统计工具,感兴趣的可以了解下... 目录一、项目背景与目标1.1 技术需求1.2 核心技术栈二、核心代码解析2.1 数据处理函数2.2 数据处理流程

MySQL版本问题导致项目无法启动问题的解决方案

《MySQL版本问题导致项目无法启动问题的解决方案》本文记录了一次因MySQL版本不一致导致项目启动失败的经历,详细解析了连接错误的原因,并提供了两种解决方案:调整连接字符串禁用SSL或统一MySQL... 目录本地项目启动报错报错原因:解决方案第一个:第二种:容器启动mysql的坑两种修改时区的方法:本地

Python FastMCP构建MCP服务端与客户端的详细步骤

《PythonFastMCP构建MCP服务端与客户端的详细步骤》MCP(Multi-ClientProtocol)是一种用于构建可扩展服务的通信协议框架,本文将使用FastMCP搭建一个支持St... 目录简介环境准备服务端实现(server.py)客户端实现(client.py)运行效果扩展方向常见问题结

详解如何使用Python构建从数据到文档的自动化工作流

《详解如何使用Python构建从数据到文档的自动化工作流》这篇文章将通过真实工作场景拆解,为大家展示如何用Python构建自动化工作流,让工具代替人力完成这些数字苦力活,感兴趣的小伙伴可以跟随小编一起... 目录一、Excel处理:从数据搬运工到智能分析师二、PDF处理:文档工厂的智能生产线三、邮件自动化:

springboot项目中使用JOSN解析库的方法

《springboot项目中使用JOSN解析库的方法》JSON,全程是JavaScriptObjectNotation,是一种轻量级的数据交换格式,本文给大家介绍springboot项目中使用JOSN... 目录一、jsON解析简介二、Spring Boot项目中使用JSON解析1、pom.XML文件引入依

使用vscode搭建pywebview集成vue项目实践

《使用vscode搭建pywebview集成vue项目实践》:本文主要介绍使用vscode搭建pywebview集成vue项目实践,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地... 目录环境准备项目源码下载项目说明调试与生成可执行文件核心代码说明总结本节我们使用pythonpywebv

详解如何使用Python从零开始构建文本统计模型

《详解如何使用Python从零开始构建文本统计模型》在自然语言处理领域,词汇表构建是文本预处理的关键环节,本文通过Python代码实践,演示如何从原始文本中提取多尺度特征,并通过动态调整机制构建更精确... 目录一、项目背景与核心思想二、核心代码解析1. 数据加载与预处理2. 多尺度字符统计3. 统计结果可

Maven项目中集成数据库文档生成工具的操作步骤

《Maven项目中集成数据库文档生成工具的操作步骤》在Maven项目中,可以通过集成数据库文档生成工具来自动生成数据库文档,本文为大家整理了使用screw-maven-plugin(推荐)的完... 目录1. 添加插件配置到 pom.XML2. 配置数据库信息3. 执行生成命令4. 高级配置选项5. 注意事