Hive作业

来源：互联网发布：东德知乎编辑：程序博客网时间：2024/06/11 14:27

最近学习Hive。实现了其中几个作业，如下。Hql语句没怎么调整格式，见谅。

Hive练习二中的题目

(1). 2017 年4 月1 日各个商品品牌的交易笔数，按照销售交易从多到少排序

1 select 2 brand,3 count(*) as totalCount 4 from 5 record 6 join brand_dimension on record.bid = brand_dimension.bid 7 where record.trancation_date= '2017-04-01' 8 group by brand_dimension.brand9 order by totalCount desc;

输出

+------------+-------------+--+|   brand    | totalcount  |+------------+-------------+--+| SAMSUNG    | 2           || WULIANGYE  | 1           || PUMA       | 1           || OPPO       | 1           || DELL       | 1           |+------------+-------------+--+

(2). 不同性别消费的商品类别情况(不同性别消费不同商品类别的总价)

 1 select  2 gender, 3 category,  4 sum(price) as totalPrice 5 from record  6 join user_dimension on record.uid = user_dimension.uid 7 join brand_dimension on record.bid = brand_dimension.bid 8 group by gender,  9 category10 order by 11 gender,12 category,13 totalPrice;

输出

+---------+------------+-------------+--+| gender  |  category  | totalprice  |+---------+------------+-------------+--+| M       | computer   | 252         || M       | food       | 429         || M       | sports     | 120         || M       | telephone  | 1669        |+---------+------------+-------------+--+

Hive练习三中的题目

(1). 谁不是经理

1 select 2 name 3 from 4 employees 5 where 6 size(subordinates)<=0;

输出

+-------------------+---------------+--+|       name        | subordinates  |+-------------------+---------------+--+| Todd Jones        | []            || Bill King         | []            || Stacy Accountant  | []            |+-------------------+---------------+--+3 rows selected (0.092 seconds)

(2). 谁住在邮编比60500 大的地区

1 select name,address.zip from employees where address.zip> 60500;

输出

+-------------------+--------+--+|       name        |  zip   |+-------------------+--------+--+| John Doe          | 60600  || Mary Smith        | 60601  || Todd Jones        | 60700  || Stacy Accountant  | 60563  |+-------------------+--------+--+4 rows selected (0.123 seconds)

(3). 联邦税超过0.15 的雇员

1 select 2 name, 3 deductions['Federal Taxes'] 4 from 5 employees 6 where deductions['Federal Taxes'] > 0.15+1e-5;

输出

+---------------+---------------+--+|     name      | federaltaxes  |+---------------+---------------+--+| John Doe      | 0.2           || Mary Smith    | 0.2           || Boss Man      | 0.3           || Fred Finance  | 0.3           |+---------------+---------------+--+4 rows selected (0.126 seconds)

(4). 谁住在Drive 或Par 街道

1 select 2 name,3 address 4 from 5 employees 6 where address.street rlike '^.*(Drive|Par).*$';

输出

---------------+------------------------------------------------------------------------------+--+|     name      |                                   address                                    |+---------------+------------------------------------------------------------------------------+--+| Boss Man      | {"street":"1 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}  || Fred Finance  | {"street":"2 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}  |+---------------+------------------------------------------------------------------------------+--+2 rows selected (0.218 seconds)

Hive练习十中的题目

1. 建表

创建并使用 database tmall

1 create database tmall;2 use tmall;

建表 product，格式 text

 1 create table if not exists product( 2 item_id string, 3 pic_url string, 4 category string, 5 brand_id string, 6 seller_id string 7 ) 8 row format delimited 9 fields terminated by ','10 lines terminated by '\n'11 stored as textfile;

建表 review，格式 text

 1 create table if not exists review( 2 item_id string, 3 user_id string, 4 feedback int, 5 feedback_time timestamp, 6 feedback_pic_url string 7 ) 8 row format delimited 9 fields terminated by ','10 lines terminated by '\n'11 stored as textfile;

建表 log_orc，格式 orc

 1 # log as textfile 2 create table if not exists log( 3 item_id string, 4 user_id string, 5 action string, 6 action_time timestamp 7 ) 8 row format delimited 9 fields terminated by ','10 lines terminated by '\n'11 stored as textfile;12 13 # log as orc14 create table if not exists log_orc15 like log16 stored as orc;

2. 载入数据

 1 #product 2 load data local inpath '/home/zkpk/test/tmall/tmall_product.csv' overwrite into table product; 3 #验证：select * from product limit 100; 4 #review 5 load data local inpath '/home/zkpk/test/tmall/tmall_review.csv' overwrite into table review; 6 #验证：select * from review limit 100; 7 #log 8 load data local inpath '/home/zkpk/test/tmall/tmall_log.csv' overwrite into table log; 9 #验证：select * from log limit 100;10 insert into table log_orc select * from log;

3. 载入数据

热度前十名的商家（商品被浏览的最多）

1 select 2 p.seller_id, 3 count(*) as click_count 4 from log_orc l 5 inner join product p on l.item_id = p.item_id 6 where l.action='click' 7 group by p.seller_id 8 order by click_count desc 9 limit 10;

输出

+--------------+--------------+--+| p.seller_id | click_count |+--------------+--------------+--+| s403 | 143 || s190 | 116 || s284 | 86 || s161 | 78 || s227 | 59 || s61 | 59 || s29 | 57 || s82 | 45 || s464 | 42 || s261 | 42 |+--------------+--------------+--+

好评率(feedback=5为好评)低于60%的商品

1 select 2 item_id, 3 count(case when feedback =5 then 1 else null end) as best, 4 count(1) as total, 5 count(case when feedback =5 then 1 else null end)/ count(1) as best_rate 6 from review 7 group by item_id 8 having count(case when feedback =5 then 1 else null end) < count(1) *0.6 ;

输出

+----------+-------+--------+----------------------+--+| item_id  | best  | total  |      best_rate       |+----------+-------+--------+----------------------+--+| 221      | 0     | 1      | 0.0                  || 256      | 0     | 4      | 0.0                  || 287      | 1     | 3      | 0.3333333333333333   || 288      | 2     | 6      | 0.3333333333333333   || 378      | 0     | 1      | 0.0                  || 379      | 1     | 6      | 0.16666666666666666  || 397      | 0     | 2      | 0.0                  || 398      | 0     | 1      | 0.0                  || 423      | 2     | 7      | 0.2857142857142857   || 449      | 1     | 5      | 0.2                  || 45       | 0     | 6      | 0.0                  || 450      | 0     | 1      | 0.0                  || 451      | 0     | 3      | 0.0                  || 453      | 1     | 2      | 0.5                  || 505      | 1     | 2      | 0.5                  |+----------+-------+--------+----------------------+--+

找出潜在购买用户（收藏了商品，但是没有购买）

 1 select distinct  2 m.item_id,  3 m.user_id  4 from log_orc m  5 left outer join log_orc s  6 on (m.item_id = s.item_id  7 and m.user_id = s.user_id  8 and s.action='alipay')  9 where m.action='collect' 10 and s.item_id is null;

输出

+------------+------------+--+| m.item_id  | m.user_id  |+------------+------------+--+| 152        | 3286       || 24         | 3389       || 242        | 39         || 422        | 3423       || 468        | 2727       |+------------+------------+--+

阅读全文

0 0