5 大数据实战-hive实战分析
来源:互联网 发布:伍迪艾伦养女知乎 编辑:程序博客网 时间:2024/04/28 23:48
1 内部表
Show databses;Use hive_data;
- 1.1 创建内部表
CREATE TABLE SOGOUQ2(DT STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' ;
1.2 加载数据
Load local data:
LOAD DATA LOCAL INPATH '/data/software/sougou/SogouQ2.txt' INTO TABLE SOGOUQ2;Load hdfs data:LOAD DATA INPATH 'hdfs://shulaibao2:9010/home/hadoop/upload/test/sougou/SogouQ1.txt' INTO TABLE SOGOUQ2;
- 1.3 查看hdfs数据
hadoop fs -ls /user/hive/warehouse/hive_data.db
- 1.4 操作数据库
select count(*) from SOGOUQ2;select count(*) from SOGOUQ2 where WEBSITE like '%baidu%';
2 外部表
- 2.1 创建hdfs数据存储目录
hadoop fs -mkdir -p /home/hadoop/upload/hive/sogouq1hdfs dfs -ls /home/hadoop/upload/hive/sogouq1
- 2.2 创建表
CREATE EXTERNAL TABLE SOGOUQ1(DT STRING,WEBSESSION STRING,WORD STRING,S_SEQ INT,C_SEQ INT,WEBSITE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/home/hadoop/upload/hive/sogouq1';Show tables;
- 2.3 加载数据
Hadoop fs -copy /home/hadoop/upload/test/sougou/SogouQ1.txt /home/hadoop/upload/hive/sogouq1hdfs关联:copy、mv本地系统数据关联:copyFromLocal
- 2.4 操作数据库
select count(*) from SOGOUQ1;
总结:【注】在删除表的时候,内部表将删除表的元数据和数据文件;而删除外部表的时候,仅仅删除外部表的元数据,不删除数据文件
3 交易数据统计实战
- 3.1 数据准备
*数据下载:
https://pan.baidu.com/s/1o7HpDEy#list/path=%2F58.%E5%8D%9A%E5%AE%A2%E8%B5%84%E6%96%99%2F20150901Spark%E5%85%A5%E9%97%A8%E5%AE%9E%E6%88%98%E7%B3%BB%E5%88%97%2Fdata%2Fclass5%2Fsaledata&parentPath=%2F58.%E5%8D%9A%E5%AE%A2%E8%B5%84%E6%96%99%2F20150901Spark%E5%85%A5%E9%97%A8%E5%AE%9E%E6%88%98%E7%B3%BB%E5%88%97*
tbDate:日期、年月、年、月、日、周几、第几周、季度、旬、半月;tbStock:订单号、交易位置、交易日期;tbStockDetail:订单号、行号、货品、数量、金额:
- 3.2 创建表
CREATE TABLE tbDate(dateID string,theyearmonth string,theyear string,themonth string,thedate string,theweek string,theweeks string,thequot string,thetenday string,thehalfmonth string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;CREATE TABLE tbStock(ordernumber STRING,locationid string,dateID string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;CREATE TABLE tbStockDetail(ordernumber STRING,rownum int,itemid string,qty int,price int ,amount int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;
- 3.3 加载数据
LOAD DATA LOCAL INPATH '/data/software/testdata/tbDate.txt' INTO TABLE tbDate;LOAD DATA LOCAL INPATH '/data/software/testdata/tbStock.txt' INTO TABLE tbStock;LOAD DATA LOCAL INPATH '/data/software/testdata/tbStockDetail.txt' INTO TABLE tbStockDetail;
3.4 数据统计分析
3.4.1按年统计销售额,年份升序
select c.theyear, sum(b.amount) from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumberleft join tbDate c on a.dateid=c.dateid group by c.theyear order by c.theyear;
- 3.4.2按交易日期-订单号分组统计销售额
select a.dateid,a.ordernumber,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber group by a.dateid,a.ordernumber;
- 3.4.3统计年度销售额最大的交易日期-订单号
select c.theyear,max(d.sumofamount) from tbDate c inner join (select a.dateid,a.ordernumber,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber group by a.dateid,a.ordernumber) d on c.dateid=d.dateid group by c.theyear sort by c.theyear;
- 3.4.4统计季度销售额前10位
select c.theyear,c.thequot,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumberleft join tbDate c on a.dateid=c.dateid group by c.theyear,c.thequot order by sumofamount desc limit 10;
- 3.4.5销售金额在100000以上的单据
select a.ordernumber,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumber group by a.ordernumber having sumofamount>100000;
- 3.4.6按交易日统销售额
select c.theyear,b.itemid,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumberleft join tbDate c on a.dateid=c.dateid group by c.theyear,b.itemid;
- 3.4.7统计每个年度销售额最大的交易日
select d.theyear,max(d.sumofamount) as maxofamount from (select c.theyear,b.itemid,sum(b.amount) as sumofamount from tbStock a left join tbStockDetail b on a.ordernumber=b.ordernumberleft join tbDate c on a.dateid=c.dateid group by c.theyear,b.itemid;) d group by d.theyear ;
- 3.4.8统计年度最畅销的商品
select distinct e.theyear,e.itemid,f.maxofamount from (select c.theyear,b.itemid,sum(b.amount) as sumofamount from tbStock a,tbStockDetail b,tbDate c where a.ordernumber=b.ordernumber and a.dateid=c.dateid group by c.theyear,b.itemid) e , (select d.theyear,max(d.sumofamount) as maxofamount from (select c.theyear,b.itemid,sum(b.amount) as sumofamount from tbStock a,tbStockDetail b,tbDate c where a.ordernumber=b.ordernumber and a.dateid=c.dateid group by c.theyear,b.itemid) d group by d.theyear) f where e.theyear=f.theyear and e.sumofamount=f.maxofamount order by e.theyear;
阅读全文
0 0
- 5 大数据实战-hive实战分析
- 大数据时代--Hive实战之Hive命令
- 大数据分析挖掘方法实战
- 大数据实战:用户流量分析系统
- 大数据实战:用户流量分析系统
- hive数据分析实战及执行效率执行策略分析
- 4大数据实战系列-hive安装配置优化
- Hive实战之日志分析
- Hive实战之日志分析
- Hive数据挖掘实战教程
- 《实战大数据》
- 大数据实战
- 实战大数据读书笔记
- 大数据实时处理实战
- 大数据实时处理实战
- 【hive实战】使用hive分析 hadoop 日志
- 【hive实战】使用hive分析 hadoop 日志
- 航班数据分析实战
- bzoj 1878: [SDOI2009]HH的项链(莫队算法)
- maven项目启动报错dbType:not support null,url null
- 求“1+3+5+7+…+99”的值。
- python tornado下logging.handlers.HTTPHandler多打印一个None的解决方案
- Jenkins学习(更新中)
- 5 大数据实战-hive实战分析
- Java后台框架篇--Spring配置数据源的三种方式(dbcp,c3p0,jndi)
- PHP用了yii2实现的网站
- 微积分
- Linux 进程学习(四)------ sigaction 函数
- 15算法课程 35. Search Insert Position
- spring boot项目实战:异常处理
- android蓝牙4.0BLE及2.0 2.1 apk 串口助手带16个自定义按键和自定义指令 字符接收 十六进制或字符发送
- C语言指针详解