Hive作业
来源:互联网 发布:东德知乎 编辑:程序博客网 时间:2024/06/11 14:27
最近学习Hive。实现了其中几个作业,如下。Hql语句没怎么调整格式,见谅。
Hive练习二中的题目
(1). 2017 年4 月1 日各个商品品牌的交易笔数,按照销售交易从多到少排序
1 select 2 brand,3 count(*) as totalCount 4 from 5 record 6 join brand_dimension on record.bid = brand_dimension.bid 7 where record.trancation_date= '2017-04-01' 8 group by brand_dimension.brand9 order by totalCount desc;
输出
+------------+-------------+--+| brand | totalcount |+------------+-------------+--+| SAMSUNG | 2 || WULIANGYE | 1 || PUMA | 1 || OPPO | 1 || DELL | 1 |+------------+-------------+--+
(2). 不同性别消费的商品类别情况(不同性别消费不同商品类别的总价)
1 select 2 gender, 3 category, 4 sum(price) as totalPrice 5 from record 6 join user_dimension on record.uid = user_dimension.uid 7 join brand_dimension on record.bid = brand_dimension.bid 8 group by gender, 9 category10 order by 11 gender,12 category,13 totalPrice;
输出
+---------+------------+-------------+--+| gender | category | totalprice |+---------+------------+-------------+--+| M | computer | 252 || M | food | 429 || M | sports | 120 || M | telephone | 1669 |+---------+------------+-------------+--+
Hive练习三中的题目
(1). 谁不是经理
1 select 2 name 3 from 4 employees 5 where 6 size(subordinates)<=0;
输出
+-------------------+---------------+--+| name | subordinates |+-------------------+---------------+--+| Todd Jones | [] || Bill King | [] || Stacy Accountant | [] |+-------------------+---------------+--+3 rows selected (0.092 seconds)
(2). 谁住在邮编比60500 大的地区
1 select name,address.zip from employees where address.zip> 60500;
输出
+-------------------+--------+--+| name | zip |+-------------------+--------+--+| John Doe | 60600 || Mary Smith | 60601 || Todd Jones | 60700 || Stacy Accountant | 60563 |+-------------------+--------+--+4 rows selected (0.123 seconds)
(3). 联邦税超过0.15 的雇员
1 select 2 name, 3 deductions['Federal Taxes'] 4 from 5 employees 6 where deductions['Federal Taxes'] > 0.15+1e-5;
输出
+---------------+---------------+--+| name | federaltaxes |+---------------+---------------+--+| John Doe | 0.2 || Mary Smith | 0.2 || Boss Man | 0.3 || Fred Finance | 0.3 |+---------------+---------------+--+4 rows selected (0.126 seconds)
(4). 谁住在Drive 或Par 街道
1 select 2 name,3 address 4 from 5 employees 6 where address.street rlike '^.*(Drive|Par).*$';
输出
---------------+------------------------------------------------------------------------------+--+| name | address |+---------------+------------------------------------------------------------------------------+--+| Boss Man | {"street":"1 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500} || Fred Finance | {"street":"2 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500} |+---------------+------------------------------------------------------------------------------+--+2 rows selected (0.218 seconds)
Hive练习十中的题目
1. 建表
创建并 使用 database tmall
1 create database tmall;2 use tmall;
建表 product,格式 text
1 create table if not exists product( 2 item_id string, 3 pic_url string, 4 category string, 5 brand_id string, 6 seller_id string 7 ) 8 row format delimited 9 fields terminated by ','10 lines terminated by '\n'11 stored as textfile;
建表 review,格式 text
1 create table if not exists review( 2 item_id string, 3 user_id string, 4 feedback int, 5 feedback_time timestamp, 6 feedback_pic_url string 7 ) 8 row format delimited 9 fields terminated by ','10 lines terminated by '\n'11 stored as textfile;
建表 log_orc,格式 orc
1 # log as textfile 2 create table if not exists log( 3 item_id string, 4 user_id string, 5 action string, 6 action_time timestamp 7 ) 8 row format delimited 9 fields terminated by ','10 lines terminated by '\n'11 stored as textfile;12 13 # log as orc14 create table if not exists log_orc15 like log16 stored as orc;
2. 载入数据
1 #product 2 load data local inpath '/home/zkpk/test/tmall/tmall_product.csv' overwrite into table product; 3 #验证:select * from product limit 100; 4 #review 5 load data local inpath '/home/zkpk/test/tmall/tmall_review.csv' overwrite into table review; 6 #验证:select * from review limit 100; 7 #log 8 load data local inpath '/home/zkpk/test/tmall/tmall_log.csv' overwrite into table log; 9 #验证:select * from log limit 100;10 insert into table log_orc select * from log;
3. 载入数据
- 热度前十名的商家(商品被浏览的最多)
1 select 2 p.seller_id, 3 count(*) as click_count 4 from log_orc l 5 inner join product p on l.item_id = p.item_id 6 where l.action='click' 7 group by p.seller_id 8 order by click_count desc 9 limit 10;
输出
+--------------+--------------+--+| p.seller_id | click_count |+--------------+--------------+--+| s403 | 143 || s190 | 116 || s284 | 86 || s161 | 78 || s227 | 59 || s61 | 59 || s29 | 57 || s82 | 45 || s464 | 42 || s261 | 42 |+--------------+--------------+--+
- 好评率(feedback=5为好评)低于60%的商品
1 select 2 item_id, 3 count(case when feedback =5 then 1 else null end) as best, 4 count(1) as total, 5 count(case when feedback =5 then 1 else null end)/ count(1) as best_rate 6 from review 7 group by item_id 8 having count(case when feedback =5 then 1 else null end) < count(1) *0.6 ;
输出
+----------+-------+--------+----------------------+--+| item_id | best | total | best_rate |+----------+-------+--------+----------------------+--+| 221 | 0 | 1 | 0.0 || 256 | 0 | 4 | 0.0 || 287 | 1 | 3 | 0.3333333333333333 || 288 | 2 | 6 | 0.3333333333333333 || 378 | 0 | 1 | 0.0 || 379 | 1 | 6 | 0.16666666666666666 || 397 | 0 | 2 | 0.0 || 398 | 0 | 1 | 0.0 || 423 | 2 | 7 | 0.2857142857142857 || 449 | 1 | 5 | 0.2 || 45 | 0 | 6 | 0.0 || 450 | 0 | 1 | 0.0 || 451 | 0 | 3 | 0.0 || 453 | 1 | 2 | 0.5 || 505 | 1 | 2 | 0.5 |+----------+-------+--------+----------------------+--+
- 找出潜在购买用户(收藏了商品,但是没有购买)
1 select distinct 2 m.item_id, 3 m.user_id 4 from log_orc m 5 left outer join log_orc s 6 on (m.item_id = s.item_id 7 and m.user_id = s.user_id 8 and s.action='alipay') 9 where m.action='collect' 10 and s.item_id is null;
输出
+------------+------------+--+| m.item_id | m.user_id |+------------+------------+--+| 152 | 3286 || 24 | 3389 || 242 | 39 || 422 | 3423 || 468 | 2727 |+------------+------------+--+
阅读全文
0 0
- Hive作业
- hive 作业优化总结
- Hive作业优化
- hive体系结构和hive作业形式
- HIVE作业管理解决方案分析
- Hive作业优化总结(来自一号店)
- Hive作业优化总结(来自一号店)
- Hive 优化-限制大作业的提交
- 【练习作业】HBase与Hive操作
- Hive RCFile合并作业产生重复数据问题
- Hadoop运维:hive作业跑挂的原因总结
- hive中如何确定一个mapreduce作业的reduce数量
- 作业电影评分系统 HIVE实战 正则表达式(限于string)解决了HIVE 源文件多个分隔符的问题
- hive中设置查询不启动mapreduce作业(默认不启动)hive-site.xml关键配置
- Hive
- HIVE
- Hive
- hive
- Hadoop生态圈各组件的启动及关闭脚本
- 苏宁校招把这所学校师生惹毛了!
- 微博CEO自曝三大新功能!网友:充会员吗?
- 使用 MapReduce 实现分组排名
- 想和你们说个事:
- Hive作业
- Java EE核心模式学习理解和记录
- iOS 动画基础总结篇
- Android常用的依赖以及权限
- git文件夹下项目更改ip地址小结
- 【makefile】一起写makefile(六)--函数的使用
- ssh离线查询错误:Error accessing field [private java.lang.String 包名.BaseDict.dictId] by reflection
- 解决nestedScrollview 嵌套 recyclerview出现的异常
- TortoiseSVN客户端重新设置用户名和密码