Hive实战之Youtube数据集

来源:互联网 发布:淘宝浏览量有什么用 编辑:程序博客网 时间:2024/04/30 09:01

大家可以看SparkSQL实战之Youtube数据集来进行比较

1 数据来源

本次实战的数据来自于”YouTube视频统计与社交网络”的数据集,是西蒙弗雷泽大学计算机学院在2008年所爬取的数据
数据集地址

1. 1 Youtube视频表格式如下:

列名 注释 视频ID 一个11位字符串,是唯一的 上传 一个字符串的视频上传者的用户名 年龄 视频上传日期和2007年2月15日之间的整数天(YouTube的设立) 类别 由上传者选择的视频类别的字符串 长度 视频长度的整数v 观看数 一整数的视图 率 一个浮点数的视频速率 评分 整数的评分 评论数 一整数的评论 相关视频ID 最多20个字符串的相关视频ID

数据之间采用”\t”作为分隔符

具体数据如下:

video ID uploader age category length views rate ratings comments related IDs ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 … ifnlnji-Y4s

数据量大小为1G,条数为500万+

1.2 用户表

列名 uploader videos friends 类型 string int int 解释 上传者 上传视频数 朋友数

2 实战演练准备

2.1 环境搭建

使用环境为
hive-1.1.0-cdh5.4.5
hadoop-2.6.0-cdh5.4.5

演示形式为使用hive shell

2.2 数据清洗

我们一起来看看数据

video ID uploader age category length views rate ratings comments related IDs ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 … ifnlnji-Y4s

主要的问题在于category和relatedIDs处理,由于Hive是支持array格式的,所以我们想到的是使用array来存储category和relatedIDs,但是我们发现category的分割符是”&”而realatedIDs的分隔符是”\t”,我们在创建表格的时候能够指定array的分隔符,但是只能指定一个,所以再将数据导入到Hive表格之前我们需要对数据进行一定转换和清洗

并且数据中肯定会存在一些不完整数据和一些奇怪的格式,所以数据的清洗是必要的,我在这里所使用的数据清洗方式是使用Spark进行清洗,也可以使用自定义UDF函数来进行清洗

数据清洗注意点
1)我们可以看到每行数据以”\t”作为分隔符,每行有十列数据,最后一列关联ID可以为空,那么我们对数据进行split之后数组的大小要大于8
2)数据中存在"uNiKXDA8eyQ KRQE 1035 News & Politics 107"这样格式的数据,所以在处理category时需要注意 News & Politics中间的&

处理后的数据如下:

video ID uploader age category length views rate ratings comments related IDs PkGUU_ggO3k theresident 704 Entertainment 262 11235 3.85 247 280 PkGUU_ggO3k&EYC5bWF0ss8&…shU2hfHKmU0&p0lq5-8IDqY RX24KLBhwMI lemonette 697 People&Blogs 512 24149 4.22 315 474 t60tW0WevkE&WZgoejVDZlo&…s8xf4QX1UvA&2cKd9ERh5-8

下面的实战都是基于数据清洗后的数据进行的

2.3 创建表格和数据导入

2.3.1 youtube表格的创建和导入

1)youtube1的创建,文件格式为textfile
create table youtube1(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)
row format delimited
fields terminated by “\t”
collection items terminated by “&”
stored as textfile;

2)youtube2的创建,文件格式为orc
create table youtube2(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)
row format delimited
fields terminated by “\t”
collection items terminated by “&”
stored as orc;

3)youtube3的创建,文件格式为orc,进行桶分区
create table youtube3(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)
clustered by (uploader) into 8 buckets
row format delimited
fields terminated by “\t”
collection items terminated by “&”
stored as orc;

数据导入:
1)load data inpath “path” into table youtube1;
2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从youtube1导入到youtube2和youtube3:
insert into table youtube2 select * from youtube1;
insert into table youtube3 select * from youtube1;

2.3.2 user表格的创建和导入

1)user_tmp的创建,文件格式textfile,24buckets
create table user_tmp(uploader string,videos int,friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by “\t”
stored as textfile;

2)user的创建,文件格式orc,24buckets
create table user(uploader string,videos int,friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by “\t”
stored as orc;

user表的数据导入也是同理
数据导入:
1)load data inpath “path” into table user_tmp;
2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从user_tmp导入到user:
insert into table user select * from user_tmp;

3 实战需求

1)统计出观看数最多的10个视频
2)统计出视频类别热度的前10个类型
3)统计出视频观看数最高的50个视频的所属类别
4)统计出观看数最多的前N个视频所关联的视频的所属类别排行
5)筛选出每个类别中热度最高的前10个视频
6)筛选出每个类别中评分最高的前10个视频
7)找出用户中上传视频最多的10个用户的所有视频
8)筛选出每个类别中观看数Top10

4 实战演练

4.1 统计出观看数最多的10个视频

select * from youtube3 order by views desc limit 10;

结果如下:

hive> select * from youtube3 order by views desc limit 10;Query ID = hadoop_20170710155353_4bc057f0-bbd5-4bfe-a66c-ec0e17cb3ca9Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1Starting Job = job_1499153664137_0101, Tracking URL = http://master:8088/proxy/application_1499153664137_0101/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0101Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 12017-07-10 15:53:55,297 Stage-1 map = 0%,  reduce = 0%...2017-07-10 15:56:06,210 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 106.93 secMapReduce Total cumulative CPU time: 1 minutes 46 seconds 930 msecEnded Job = job_1499153664137_0101MapReduce Jobs Launched: Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 106.93 sec   HDFS Read: 574632526 HDFS Write: 2916 SUCCESSTotal MapReduce CPU Time Spent: 1 minutes 46 seconds 930 msecOKdMH0bHeiRNg judsonlaipply   415 ["Comedy"]  360 79897120    4.65    287260  131356  ["HSoVKUVOnfQ","pv5zWaTEVkI","v1U8f7TmYkA","ZCT3MIUTc3o","mDiBOF8XI44","P9LmHXXWiJs","fo_QVq2lGMs","EtGQgSY9Nn4","5P6UU6m3cqk","61heClTFc5w","h3gdSHGcUU4","OPmYbP0F4Zw","gsOaQGF7kiQ","H2gw9VE16mo","rGkIUlYEQT8","innfyQZHPpo","cu8tUy14zQo","cQ25-glGRzI","WqDbn2iLwwY","jvz0bvYmnto"]cQ25-glGRzI RCARecords  742 ["Music"]   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]12Z3J1uzd0Q kaejane 404 ["Film","Animation"]    615 65341925    3.03    9189    5508    ["innfyQZHPpo","-_CSo1gOd48","1Al4crLW9EM","nhSZs-aAZbo","Af9aPlG2WFw","5P6UU6m3cqk","uBHsOTYO6AI","lW19DnWz6vg","vr3x_RRJdd4","5GE82tqcYYQ","D2kJZOfq7zk","lsO6D1rwrKc","dMH0bHeiRNg","kNLXjXxj3J8","vQbYvjmfbr4","G-HonZGBWus","MuOvqeABHvQ","-2caf6KlSyw","cQ25-glGRzI","pv5zWaTEVkI"]LpAI8TzQDes IMVUinc 848 ["Entertainment"]   68  65078772    1.38    5678    2499    ["c2eP-UADAi8","5P6UU6m3cqk","cQ25-glGRzI","FYbqcgd97NQ","12Z3J1uzd0Q","VS4wFOWFUt8","RB-wUgnyGv0","sBQLq2VmZcA","w2xUzv6iZWo","oOF3T9Zvizc","244qR7SvvX0","pv5zWaTEVkI","dMH0bHeiRNg","b3u65f4CRLk"]7AVHXe-ol-s internmarket    603 ["Music"]   264 60349673    3.16    1033    594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]244qR7SvvX0 donotasyoudo    960 ["Entertainment"]   6   57790943    1.45    66412   14913   ["v3ARyAb_1Bs","nhSZs-aAZbo","2pNTrYd-4FQ","dMH0bHeiRNg","kHmvkRoEowc","yh0xYO3dYx8","5GE82tqcYYQ","ktUSIJEiOug","6PrDw6T3-rM","-xEzGIuY7kw","innfyQZHPpo","pv5zWaTEVkI","E7QOUJfRs3Y","9oRQJK61MWs"]ePyRrb2-fzs 1988basti   826 ["Music"]   204 45984219    4.87    51039   27065   ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]xsRWpK4pf90 universalmusicgroup 902 ["Music"]   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]ktUSIJEiOug aliciakeys  953 ["Music"]   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]b3u65f4CRLk srcrecords  729 ["Music"]   256 43511791    4.76    55148   27818   ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]Time taken: 172.566 seconds, Fetched: 10 row(s)

4.2 统计出视频类别热度的前10个类型

select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;

结果:

hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;Query ID = hadoop_20170710155757_614ec9dd-ffa0-465d-8d62-c47b5ad585f0Total jobs = 2Launching Job 1 out of 2Number of reduce tasks not specified. Defaulting to jobconf value of: 2Starting Job = job_1499153664137_0102, Tracking URL = http://master:8088/proxy/application_1499153664137_0102/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0102Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 22017-07-10 15:58:23,797 Stage-1 map = 0%,  reduce = 0%...2017-07-10 15:59:16,341 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 19.51 secMapReduce Total cumulative CPU time: 19 seconds 510 msecEnded Job = job_1499153664137_0102Launching Job 2 out of 2Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1499153664137_0103, Tracking URL = http://master:8088/proxy/application_1499153664137_0103/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0103Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 12017-07-10 15:59:51,645 Stage-2 map = 0%,  reduce = 0%2017-07-10 16:00:18,373 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.97 sec2017-07-10 16:00:48,410 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 2.57 secMapReduce Total cumulative CPU time: 2 seconds 570 msecEnded Job = job_1499153664137_0103MapReduce Jobs Launched: Stage-Stage-1: Map: 4  Reduce: 2   Cumulative CPU: 19.51 sec   HDFS Read: 47327614 HDFS Write: 900 SUCCESSStage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 2.57 sec   HDFS Read: 5596 HDFS Write: 148 SUCCESSTotal MapReduce CPU Time Spent: 22 seconds 80 msecOKEntertainment   1304724Music   1274825Comedy  449652Blogs   447581People  447581Film    442109Animation   442109Sports  390619Politics    186753News    186753Time taken: 185.399 seconds, Fetched: 10 row(s)

4.3 统计出视频观看数最高的50个视频的所属类别

select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;

结果:

hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;Query ID = hadoop_20170710160909_c6cbbe29-4df3-4c0b-ad70-bd34857acc80Total jobs = 3Launching Job 1 out of 3Number of reduce tasks determined at compile time: 1Starting Job = job_1499153664137_0104, Tracking URL = http://master:8088/proxy/application_1499153664137_0104/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0104Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 12017-07-10 16:09:48,197 Stage-1 map = 0%,  reduce = 0%2017-07-10 16:10:17,734 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 3.09 sec...2017-07-10 16:10:59,048 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 47.74 secMapReduce Total cumulative CPU time: 47 seconds 740 msecEnded Job = job_1499153664137_0104Launching Job 2 out of 3Number of reduce tasks not specified. Defaulting to jobconf value of: 2In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1499153664137_0105, Tracking URL = http://master:8088/proxy/application_1499153664137_0105/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0105Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 22017-07-10 16:11:35,860 Stage-2 map = 0%,  reduce = 0%2017-07-10 16:12:04,633 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.07 sec2017-07-10 16:12:23,187 Stage-2 map = 100%,  reduce = 50%, Cumulative CPU 2.61 sec2017-07-10 16:12:32,480 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 3.93 secMapReduce Total cumulative CPU time: 3 seconds 930 msecEnded Job = job_1499153664137_0105Launching Job 3 out of 3Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1499153664137_0106, Tracking URL = http://master:8088/proxy/application_1499153664137_0106/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0106Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 12017-07-10 16:13:09,244 Stage-3 map = 0%,  reduce = 0%2017-07-10 16:13:37,263 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.18 sec2017-07-10 16:14:05,048 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 3.5 secMapReduce Total cumulative CPU time: 3 seconds 500 msecEnded Job = job_1499153664137_0106MapReduce Jobs Launched: Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 47.74 sec   HDFS Read: 57839263 HDFS Write: 228 SUCCESSStage-Stage-2: Map: 1  Reduce: 2   Cumulative CPU: 3.93 sec   HDFS Read: 6065 HDFS Write: 324 SUCCESSStage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 3.5 sec   HDFS Read: 6600 HDFS Write: 53 SUCCESSTotal MapReduce CPU Time Spent: 55 seconds 170 msecOKMusic   12Comedy  3Entertainment   3Film    2Animation   2Time taken: 297.466 seconds, Fetched: 5 row(s)

4.4 统计出观看数最多的前50个视频所关联的视频的所属类别排行

思路:

  1. 首先筛选出前50个视频所关联的视频,
    select * from youtube3 order by views desc limit 50
  2. 再将结果和youtube3进行join
    select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
  3. 然后去重
    select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
  4. 最后得出所有视频的类别排行
    select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;

结果:

hive> select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;Query ID = hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cbTotal jobs = 7Launching Job 1 out of 7Number of reduce tasks determined at compile time: 1Starting Job = job_1499153664137_0111, Tracking URL = http://master:8088/proxy/application_1499153664137_0111/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0111Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 12017-07-10 17:08:48,662 Stage-1 map = 0%,  reduce = 0%2017-07-10 17:09:25,175 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 20.83 sec...2017-07-10 17:10:15,276 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 81.01 secMapReduce Total cumulative CPU time: 1 minutes 21 seconds 10 msecEnded Job = job_1499153664137_0111Stage-10 is filtered out by condition resolver.Stage-11 is selected by condition resolver.Stage-2 is filtered out by condition resolver.17/07/10 17:10:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicableExecution log at: /tmp/hadoop/hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb.log2017-07-10 05:10:23 Starting to launch local task to process map join;  maximum memory = 5189795842017-07-10 05:10:34 Dump the side-table for tag: 0 with group count: 743 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable2017-07-10 05:10:34 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable (22611 bytes)2017-07-10 05:10:34 End of local task; Time Taken: 11.392 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 3 out of 7Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1499153664137_0112, Tracking URL = http://master:8088/proxy/application_1499153664137_0112/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0112Hadoop job information for Stage-8: number of mappers: 4; number of reducers: 02017-07-10 17:11:13,493 Stage-8 map = 0%,  reduce = 0%2017-07-10 17:11:57,229 Stage-8 map = 100%,  reduce = 0%, Cumulative CPU 15.79 secMapReduce Total cumulative CPU time: 15 seconds 790 msecEnded Job = job_1499153664137_0112Launching Job 4 out of 7Number of reduce tasks not specified. Defaulting to jobconf value of: 2Starting Job = job_1499153664137_0113, Tracking URL = http://master:8088/proxy/application_1499153664137_0113/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0113Hadoop job information for Stage-3: number of mappers: 3; number of reducers: 22017-07-10 17:12:31,982 Stage-3 map = 0%,  reduce = 0%2017-07-10 17:13:17,467 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.53 secMapReduce Total cumulative CPU time: 6 seconds 530 msecEnded Job = job_1499153664137_0113Launching Job 5 out of 7Number of reduce tasks not specified. Defaulting to jobconf value of: 2Starting Job = job_1499153664137_0114, Tracking URL = http://master:8088/proxy/application_1499153664137_0114/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0114Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 22017-07-10 17:13:55,123 Stage-4 map = 0%,  reduce = 0%2017-07-10 17:14:22,876 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 0.91 sec2017-07-10 17:14:23,905 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 1.99 sec2017-07-10 17:14:40,601 Stage-4 map = 100%,  reduce = 50%, Cumulative CPU 3.35 sec2017-07-10 17:14:52,033 Stage-4 map = 100%,  reduce = 100%, Cumulative CPU 4.96 secMapReduce Total cumulative CPU time: 4 seconds 960 msecEnded Job = job_1499153664137_0114Launching Job 6 out of 7Number of reduce tasks determined at compile time: 1Starting Job = job_1499153664137_0115, Tracking URL = http://master:8088/proxy/application_1499153664137_0115/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0115Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 12017-07-10 17:15:27,647 Stage-5 map = 0%,  reduce = 0%2017-07-10 17:15:54,560 Stage-5 map = 50%,  reduce = 0%, Cumulative CPU 0.91 sec2017-07-10 17:15:55,587 Stage-5 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec2017-07-10 17:16:23,357 Stage-5 map = 100%,  reduce = 100%, Cumulative CPU 3.59 secMapReduce Total cumulative CPU time: 3 seconds 590 msecEnded Job = job_1499153664137_0115MapReduce Jobs Launched: Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 81.01 sec   HDFS Read: 463074822 HDFS Write: 26949 SUCCESSStage-Stage-8: Map: 4   Cumulative CPU: 15.79 sec   HDFS Read: 47322670 HDFS Write: 30829 SUCCESSStage-Stage-3: Map: 3  Reduce: 2   Cumulative CPU: 6.53 sec   HDFS Read: 43694 HDFS Write: 1126 SUCCESSStage-Stage-4: Map: 2  Reduce: 2   Cumulative CPU: 4.96 sec   HDFS Read: 8888 HDFS Write: 725 SUCCESSStage-Stage-5: Map: 2  Reduce: 1   Cumulative CPU: 3.59 sec   HDFS Read: 7001 HDFS Write: 207 SUCCESSTotal MapReduce CPU Time Spent: 1 minutes 51 seconds 880 msecOKMusic   399Entertainment   103Comedy  94People  36Blogs   36Film    19Animation   19Pets    12Animals 12UNA 11Style   6Politics    6News    6Howto   6Sports  4Vehicles    3Autos   3Travel  2Events  2Technology  1Science 1Time taken: 496.822 seconds, Fetched: 21 row(s)

4.5 筛选出某个类别(如music)中热度最高的前10个视频

思路:

  • 创建一个表格,存储每个视频对应一个标签的信息
    create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array)
    row format delimited
    fields terminated by “\t”
    collection items terminated by “&”
    stored as orc;
  • 将转换后的数据进行插入
    insert into table youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
  • 根据观看数和类别进行查询
    select * from youtube_category where categoryId=”Music” order by views desc limit 10;

结果如下:

hive> create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array<string>)    > row format delimited    > fields terminated by "\t"    > collection items terminated by "&"    > stored as orc;OKTime taken: 0.256 secondshive> insert into youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;Query ID = hadoop_20170711091010_7c76d7bd-5af0-43f9-812f-e30c612ee60bTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1499153664137_0116, Tracking URL = http://master:8088/proxy/application_1499153664137_0116/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0116Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 02017-07-11 09:11:18,741 Stage-1 map = 0%,  reduce = 0%...2017-07-11 09:19:05,239 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 402.71 secMapReduce Total cumulative CPU time: 6 minutes 42 seconds 710 msecEnded Job = job_1499153664137_0116Stage-4 is selected by condition resolver.Stage-3 is filtered out by condition resolver.Stage-5 is filtered out by condition resolver.Moving data to: hdfs://bydcluster1/user/hive/warehouse/youtube_category/.hive-staging_hive_2017-07-11_09-10-38_970_8191962088714912782-1/-ext-10000Loading data to table default.youtube_categoryTable default.youtube_category stats: [numFiles=4, numRows=6742209, totalSize=608748677, rawDataSize=10187373874]MapReduce Jobs Launched: Stage-Stage-1: Map: 4   Cumulative CPU: 404.25 sec   HDFS Read: 574634998 HDFS Write: 608749047 SUCCESSTotal MapReduce CPU Time Spent: 6 minutes 44 seconds 250 msecOKTime taken: 511.914 secondshive> select * from youtube_category where categoryId="music" order by views desc limit 10;Query ID = hadoop_20170711091919_a48aa98b-9cfa-4f58-a106-9da85484f0ddTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1499153664137_0117, Tracking URL = http://master:8088/proxy/application_1499153664137_0117/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0117Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 12017-07-11 09:20:37,607 Stage-1 map = 0%,  reduce = 0%...2017-07-11 09:21:32,382 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 36.15 secMapReduce Total cumulative CPU time: 36 seconds 150 msecEnded Job = job_1499153664137_0117MapReduce Jobs Launched: Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 36.15 sec   HDFS Read: 607706126 HDFS Write: 0 SUCCESSTotal MapReduce CPU Time Spent: 36 seconds 150 msecOKTime taken: 96.836 secondshive> select * from youtube_category where catagoryId="Music" order by views desc limit 10;FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryId': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)hive> select * from youtube_category where catagoryid="Music" order by views desc limit 10;FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryid': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)hive> select * from youtube_category where categoryid="Music" order by views desc limit 10;Query ID = hadoop_20170711092525_53f32ccf-7d04-4615-b446-9009cf16dc7fTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1499153664137_0118, Tracking URL = http://master:8088/proxy/application_1499153664137_0118/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0118Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 12017-07-11 09:26:08,727 Stage-1 map = 0%,  reduce = 0%...2017-07-11 09:27:17,297 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 57.89 secMapReduce Total cumulative CPU time: 57 seconds 890 msecEnded Job = job_1499153664137_0118MapReduce Jobs Launched: Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 58.14 sec   HDFS Read: 607706243 HDFS Write: 3043 SUCCESSTotal MapReduce CPU Time Spent: 58 seconds 140 msecOKcQ25-glGRzI RCARecords  742 Music   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]7AVHXe-ol-s internmarket    603 Music   264 60349673    3.16    1033    594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]ePyRrb2-fzs 1988basti   826 Music   204 45984219    4.87    51039   27065   ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]xsRWpK4pf90 universalmusicgroup 902 Music   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]ktUSIJEiOug aliciakeys  953 Music   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]b3u65f4CRLk srcrecords  729 Music   256 43511791    4.76    55148   27818   ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]iWg3IMN_rhU TimbalandMusic  853 Music   216 43323757    4.8 58761   36142   ["SIM4DCn7AlE","GojTUmjxVHU","ePyRrb2-fzs","2__Qdd11rfA","P_TKpULdDvo","Dn6kGzRSjhU","1iLDIj0pDHk","xsRWpK4pf90","XewBty95JVg","y4jiyjDQGLY","h8oBykb_Pqs","xNp7OxgFJJM","b3u65f4CRLk","S-783rzHIS4","cZd1Js0QaOI","kZGEgVxyPHU","43o0vwAmFM8","BMMUWvavORI","v_-1peCW6Ok","9gkjyM7ZOUc"]innfyQZHPpo chai0322    468 Music   210 41564032    2.61    13564   6126    ["nhSZs-aAZbo","hh0nVc0NYTE","12Z3J1uzd0Q","V1s9queYhF8","1Al4crLW9EM","61e1h4vALS0","8h98jb9Lk74","Z-HjmP7BCVc","o_pIQIV_NuU","82BDV1gYjdM","Af9aPlG2WFw","cC27PTFP4V8","-_CSo1gOd48","00c08ijIlHo","cIKxlWuTviE","dMH0bHeiRNg","iWg3IMN_rhU","v3ARyAb_1Bs","5P6UU6m3cqk","QjA5faZF1A8"]Lt6o8NlrbHg seankingston    867 Music   257 41171303    4.74    101352  67579   ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]QjA5faZF1A8 guitar90    308 Music   320 40294882    4.83    329108  169563  ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]Time taken: 111.631 seconds, Fetched: 10 row(s)

4.6 筛选出每个类别中评分最高的前10个视频

select * from youtube_category where categoryId=”Music” order by ratings desc limit 10;

结果如下:

hive> select * from youtube_category where categoryId="Music" order by ratings desc limit 10;Query ID = hadoop_20170711093838_6ab5573a-964f-486e-b0dc-77e5853fcfb5Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1499153664137_0119, Tracking URL = http://master:8088/proxy/application_1499153664137_0119/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0119Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 12017-07-11 09:39:27,903 Stage-1 map = 0%,  reduce = 0%...2017-07-11 09:40:36,298 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 57.41 secMapReduce Total cumulative CPU time: 57 seconds 410 msecEnded Job = job_1499153664137_0119MapReduce Jobs Launched: Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 57.41 sec   HDFS Read: 607706243 HDFS Write: 2957 SUCCESSTotal MapReduce CPU Time Spent: 57 seconds 410 msecOKQjA5faZF1A8 guitar90    308 Music   320 40294882    4.83    329108  169563  ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]cQ25-glGRzI RCARecords  742 Music   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]pv5zWaTEVkI OkGo    531 Music   184 32022043    4.83    121516  39974   ["dMH0bHeiRNg","x49WZRyXGe0","RbdbVhBGETQ","yRmqZRPgK1w","DjCL0_0Il7w","gq7r3F1SoX0","k66epna2Sss","1LNbzqoOPu4","vr3x_RRJdd4","ERV-wh4VwZI","xsRWpK4pf90","iAQZ_uui1SY","5P6UU6m3cqk","bav63MWNUKg"]ktUSIJEiOug aliciakeys  953 Music   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]Lt6o8NlrbHg seankingston    867 Music   257 41171303    4.74    101352  67579   ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]xsRWpK4pf90 universalmusicgroup 902 Music   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]K2cYWfq--Nw FrEckleStudios  841 Music   224 17005186    4.82    90991   50788   ["SyIC3Munnyw","cZd1Js0QaOI","lLYD_-A_X5E","alqM0IYeH54","wQVEPFzkhaM","oGECJP3phyY","nPLOiBM8hLk","VpBDqtUEWcM","MJPdVVOmbz4","bPZJYQXQsm8","nPBmXEO3yUU","3jzSh_MLNcY","_EXeBLvmllg","Cva_sGN_0VA","Sr2JneittqQ","SYpYmkcadRA","71eBjNbdXDg","DgBgnoEY4iM","bl6RJyZdBSU","FAK_jtOf70g"]-xEzGIuY7kw alyankovic  580 Music   171 24095019    4.84    90091   45517   ["HYokLWfqbaU","N26KWq7MmSc","JCAt9WcCFbM","Rt1_6uz_sVU","Nh9mVsBKwYs","p9Zt8mn14hY","8n7ncJEFuSw","FT060JGp9sQ","E6Zc9NyYH-k","ixyTNd-Ln38","zIllRdSzSug","GsfVw9xxoNY","XkDeJgGrtdU","fqz1ojIQTBk","5GE82tqcYYQ","ODdGhOOUOpI","v3ARyAb_1Bs","XbVtbc_XzrI","U1ULxKM75rY","Jw00EUh0GT4"]EwTZ2xpQwpA TayZonday   796 Music   292 16841569    4.23    83514   129200  ["2x2W12A8Qow","P6dUCOS1bM0","NattlyH0IeM","nTQOpibv_OA","9mSKBgvHdoE","hjD6iigdB-g","caIBKOztlAo","xUz2YMmiq0k","aWY3eYOX3U0","mD5_GUovjiM","1oFS-q8BIps","deXAEN70CDY","0pElTyjfxe0","eyDuGwlrFRs","m6SjPfc_xNA","qYGvGWY1FDs","N0amCfgnwY8","JPu4uErBFks","pgSA-ErKd8c","ZZgGGlOGyUg"]xWHf_vYZzQ8 universalmusicgroup 771 Music   262 25607299    4.79    83419   70529   ["jvz0bvYmnto","zElEs8yw7fw","9FcBnaLjxY4","95wgKdSJGDo","ueOgZPBXY4A","k3O5uy-MBBk","O3sGTaQ9s9c","aT434G38OBg","4PdDPrwIwhI","Y-VifE8EK8w","9wpxno6qUd0","B17SYROT3GI","wJtUuxmm-B0","LBDnkJ5h1ho","CfzpBjKpSPE","QF2AmC2xyXM","eoa6Gx4HxTc","mvPvcV44rCc"]Time taken: 109.053 seconds, Fetched: 10 row(s)

4.7 找出用户中上传视频最多的10个用户的所有视频

思路:

  • 筛选出用户表中上传视频数最多的前十位
    select * from user order by videos desc limit 10;
  • 将筛选结果跟youtube3进行join得出结果
    select b.,a.videos,a.friends from (select from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;

结果如下:

hive> select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;Query ID = hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebeTotal jobs = 5Launching Job 1 out of 5Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1499153664137_0125, Tracking URL = http://master:8088/proxy/application_1499153664137_0125/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0125Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12017-07-11 10:17:30,594 Stage-1 map = 0%,  reduce = 0%2017-07-11 10:18:03,439 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.03 sec2017-07-11 10:18:25,025 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.28 secMapReduce Total cumulative CPU time: 10 seconds 280 msecEnded Job = job_1499153664137_0125Stage-8 is filtered out by condition resolver.Stage-9 is selected by condition resolver.Stage-2 is filtered out by condition resolver.17/07/11 10:18:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicableExecution log at: /tmp/hadoop/hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe.log2017-07-11 10:18:33 Starting to launch local task to process map join;  maximum memory = 5189795842017-07-11 10:18:44 Dump the side-table for tag: 0 with group count: 10 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable2017-07-11 10:18:44 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable (611 bytes)2017-07-11 10:18:44 End of local task; Time Taken: 11.244 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 3 out of 5Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1499153664137_0126, Tracking URL = http://master:8088/proxy/application_1499153664137_0126/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0126Hadoop job information for Stage-6: number of mappers: 4; number of reducers: 02017-07-11 10:19:25,285 Stage-6 map = 0%,  reduce = 0%...2017-07-11 10:20:26,162 Stage-6 map = 100%,  reduce = 0%, Cumulative CPU 44.09 secMapReduce Total cumulative CPU time: 44 seconds 90 msecEnded Job = job_1499153664137_0126Launching Job 4 out of 5Number of reduce tasks determined at compile time: 1Starting Job = job_1499153664137_0127, Tracking URL = http://master:8088/proxy/application_1499153664137_0127/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0127Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 12017-07-11 10:21:01,746 Stage-3 map = 0%,  reduce = 0%2017-07-11 10:21:30,462 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.27 sec2017-07-11 10:21:48,968 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.36 secMapReduce Total cumulative CPU time: 6 seconds 360 msecEnded Job = job_1499153664137_0127MapReduce Jobs Launched: Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 10.28 sec   HDFS Read: 9796823 HDFS Write: 434 SUCCESSStage-Stage-6: Map: 4   Cumulative CPU: 44.09 sec   HDFS Read: 574641054 HDFS Write: 2133846 SUCCESSStage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 6.36 sec   HDFS Read: 2144607 HDFS Write: 6335 SUCCESSTotal MapReduce CPU Time Spent: 1 minutes 0 seconds 730 msecOKlUOe76YPY7M expertvillage   801 ["Howto","Style"]   119 1014983 3.36    596 689 ["5cEsa-rswrg","HOoQPbcF8Rk","ht3DMDx1spo","VFnf17cp3Eg","IP35rm_31RQ","hXuiuwZPX0M","s8pH5jgSuP8","KIEkwm_6DD4","ovFSahIPaVQ","rQtwcSDh3Hk","COeT3WR7SFc","ZNLBKRpWt7w","qncPBZ9DRRk","tOBqg3yDlyE","YgfWUXsbr7w","qBB3AsgjN1M","XX8ouWx6xm8","BtF9dYRuiGU","NRanI5qR81I","evfl0NGLjVE"]   86228   56592Aj2wx9PYxI expertvillage   815 ["Howto","Style"]   170 918553  3.82    1596    1533    ["hb5QaCfm7bg","YVDPSZe21KI","C1elpMjZ4wE","8gD1RG-tnNk","d-IDrRSVkCQ","IAOr6SYGAyw","3mbx03mP5eg","o-pN8qAiZhQ","K_qw03-3gFg","JgsO3A_vbtk","ffKr98Ium-M","WKYcBKa8_7Q","dgeX8CZmqvo","jgUCs72RDHs","PaNhROW-wEo","WhRCVm-1r2k","5I5O8P-r5Rk","reJSIZ3ugsE","-51iHvqP0Rs","9wItsn3r_kc"]   86228   5659VBGHer3yFyc libertaddigitaltv   998 ["News","Politics"] 29  828616  4.68    1337    10763   ["HL9p8I3lOFA","Fb3HZ7K1gTk","AARX6XOA5h0","t3DPDKbRxio","VAmz8MNZdlk","WrC6_uBe4eA","d9X8DYOA5DE","pZuZv8UK7YA","NwswzwoA4pA","g1J2gvjp4fs","3SMhrQZdABc","h6M42Il4kN8","izM_JwEkWPs","_FM_IoPIFq4","uMZCCPtl_Do","5cZFinVFubQ","om0iHwO4d6o","lUZxlXkbaxM","iSJ3qKpC-3o","tRdq9_Si0Ws"]   6874    107jnqD8wvyE expertvillage   776 ["Howto","Style"]   217 574365  4.55    1073    1526    ["3wNpOp50uGI","-taU9d26wT4","_kly-fVUi1I","kZtRmnly_sU","LKe50AJvhz8","0GAUnuuBkW4","eM6Bpz6-dio","i9xf62PKC5M","mwFfk7igYC0","ilLTA4p4MMw","qyoLuTjguJA","aC-KOYQsIvU","0V6LWQZjYRk","_uZkvzYEXp0","auQbi_fkdGE","qgiUSEpg8Xc","4I79yc3uKfE","7Lz-XGjynN8","taHOwsdQXWI","2Aj2wx9PYxI"]   86228   5659NDwZcLGekE8 expertvillage   801 ["Howto","Style"]   94  491582  4.35    553 588 ["CEmYX76UJy0","0sF-PoXR9ZU","qDfoNYpozxk","T4P3yp6mFfY","rzcj_Wq6BIE","A2LTVhqHAdo","4gyH0mJPqY8","m3PsQCMz3sY","wckYj_PAhgM","YkvwdM9Xstc","Ph2UAjGfeB4","vT_uXPfSAVE","89mAouUu8YM","0t5pPZ4AXX0","sLNFN75CYbw","QQGHXEPfMcQ","p6gcz4hkOmw","qk6R-X-YmUw","NR_9VlHFOu8","0gA_3BAxtVM"]   86228   5659CEmYX76UJy0 expertvillage   801 ["Howto","Style"]   90  422284  4.48    448 459 ["NDwZcLGekE8","jRAKgd50Jh0","0sF-PoXR9ZU","IjjJWXgqWL4","m3PsQCMz3sY","8Di58l1dfPA","DpEVqy954OY","dmYMeImNy1s","qk6R-X-YmUw","wckYj_PAhgM","qZSD3JLPURk","YkvwdM9Xstc","fdSFh_z0HJY","n5LDyWyHJTo","Dmm5O1DPQYc","A2LTVhqHAdo","I2mM0r76b7o","iE16Z9Gh3TQ","rzcj_Wq6BIE","8szPfWiK-ag"]   86228   5659E6VWi7IaroA expertvillage   946 ["Howto","Style"]   228 338889  4.46    300 251 ["fLy3M2jfe4w","cC1FBmbYgMk","xEJhInSxsQg","RxMLoqzKZ8s","Z9a--4RQ9O8","7neOXKLrgzk","JS-lS9ngMtY","aKIFfgPLY8I","EsqPNhtkWIo","NJUPNtZN2rc","vV7pajY0zOk","GTtZapaRDbg","-0UVDD6ZwYE","B9jNaUP59Vo","2QDGPe50E4Y","lJDZO_pX4aw","wuD2CemlvP0","ZTx2ZFI1spg","CzZghYdupiQ","5ueEKSJ8J-U"]   86228   56590sF-PoXR9ZU expertvillage   801 ["Sports"]  77  326365  4.21    512 663 ["24FJ58Q22D4","m3PsQCMz3sY","qk6R-X-YmUw","CEmYX76UJy0","NDwZcLGekE8","QiIunH47qew","I2mM0r76b7o","c_qSPdLPReM","5eYBaQ5Cg-c","fdSFh_z0HJY","8szPfWiK-ag","Dmm5O1DPQYc","A2LTVhqHAdo","n-Cvzk84X8o","T4P3yp6mFfY","1vo3CCBIcaY","fnGFR1AFCJc","n5LDyWyHJTo","YkvwdM9Xstc","emNfLmLTQb0"]    86228  5659ohdiRHLSw5Y expertvillage   447 ["Howto","Style"]   78  321521  3.79    139 102 ["12_nJamoyTk","dXLhjYgMZ68","ukzFwRoFj4k","sOUgrJo2kIg","HhO39nCDfMg","3iVP0tzwhVc","MFNA0PqLynY","ZlcMzLhiBjg","7F8ajh_DDYs","6vaPIz6S6sk","-L_uhGXOtF0","-libzR5AV58","CSdSH7XKTtQ","KZX5jXPAWIk","Vkj5fbrQpNs","YCgnFIk5Acg","hjPDmVf6KJw","BZnhMl85dq4","iuqVpMdb1NM","GO2_3q6euug"]   86228   56590sF-PoXR9ZU expertvillage   801 ["Howto","Style"]   77  294737  4.21    438 599 ["24FJ58Q22D4","m3PsQCMz3sY","CEmYX76UJy0","NDwZcLGekE8","A2LTVhqHAdo","qk6R-X-YmUw","5eYBaQ5Cg-c","fdSFh_z0HJY","jGdXcitOUzY","n-Cvzk84X8o","wckYj_PAhgM","I2mM0r76b7o","QMZaxtjhZ5k","Dmm5O1DPQYc","vT_uXPfSAVE","8szPfWiK-ag","QiIunH47qew","YkvwdM9Xstc","DpEVqy954OY","c_qSPdLPReM"]   86228   56593Xc-kxSznZ0 expertvillage   430 ["Howto","Style"]   107 292329  3.33    565 964 ["9YAKfkVvvEc","wEkcPjBaHjs","zMl3ixv1kHw","Q4ZUPEgbzu0","-jIEGZwLPvo","YxdxGLBCCRA","Jp0LaJ_ftT0","PgCRWvwdFHM","yZL2MOFk6I0","qDJk3-ofbk0","e4nHAAuDiPE","cuRk9bTbU6A","z53cPMciVio","MREw0dIWaQ0","237AQAvXtw8","a3dnOupmt7Y","3-va5g_QVss","mFdaiY8YhEY","D6li4tYmKf0","bEVHBHKqBfQ"]   86228   5659VxYkdycN3WE expertvillage   801 ["Howto","Style"]   77  257702  4.19    153 137 ["ZpCE7mnYJK0","NYDdCGtbr8E","904S9W9AZg8","Rl5KEODnCfI","7-i2gl9288I","TC1XlhRwhsU","rD_4nCKf5Ns","pFKZZ120fyU","oN4Esih54V8","N1Ov6cCUXkc","rDM97S4jPiw","IK2uLNND3i0","OZXbM6kRuuA","5pvyXkFTa18","bExUJAnCLZw","VoREkZvkyI8","BWhfpzAItx4","EEybT3IeyzA","QteH0KOJx0g","dbXHvd_SGkk"]   86228   5659-libzR5AV58 expertvillage   447 ["Howto","Style"]   391 249682  3.34    73  62  ["-L_uhGXOtF0","4LvTYBbgMjE","-wO07skEauo","CSdSH7XKTtQ","kzQDoXKDgTM","e1kzLj-FZ6k","Guhgb3pWRAQ","KIbyySDSGEc","DGUBzs-GNxE","KZX5jXPAWIk","ohdiRHLSw5Y","6vaPIz6S6sk","5MrAuq_F2dU","gOv-yPqZ3LE","BZnhMl85dq4","oCDdDCqygOg","12_nJamoyTk","7F8ajh_DDYs","HhO39nCDfMg","X3-6IT-7S80"]   86228   5659HL9p8I3lOFA libertaddigitaltv   998 ["News","Politics"] 17  235313  4.61    114 581 ["VBGHer3yFyc","Fb3HZ7K1gTk","Z3Pe8ff37gk","t3DPDKbRxio","pZuZv8UK7YA","_FM_IoPIFq4","1P_EQjd7lq4","yEzNx2g7ors","_neKPvLdG8Y","7i_WDprxACU","zCM0nsym0cE","om0iHwO4d6o","h6M42Il4kN8","Bjk1McjrWtg","VAmz8MNZdlk","-t2hwljZmA8","gH6PHFrA6g8","ysAwzxqbWD8","U5lX_EIFOOA","4zu_X_3qHLA"]   6874    1oE5ZhBHy6Rs expertvillage   776 ["Howto","Style"]   143 234551  4.64    358 280 ["rCWRUtqJgzI","476vNb6thyM","eg3eo8x_9rw","grK43Poye1U","HGncDLMNPRI","iylaKSDfLSc","JFqiDcvRW2Y","XrbXGRLIDTc","NmxoR9lsu-c","JW8FJefw82k","n0Mi550FDng","QXsW44Wh6tY","6oJDYT5MlEA","IJy05ssELhg","TvT-M7tTQHE","GYNVEWCO_X0","5pJ_o19GVRA","q_Nn_CnvD9s","uqpQNkFxVAU","KKP8vWeEc30"]   86228   5659wOnP_oAXUMA expertvillage   429 ["Howto","Style"]   120 228842  3.56    236 601 ["BlDWdfTAx8o","WF1kRLJ_4B8","6bu9csQC45c","xTAYAl4g7HE","871VTEF7qIY","UheCchftswc","TKg_cdwq9l4","ADQ4Bjq42wE","cSJCDcAKShA","SYklbxHP2tI","fktuPPNkQpQ","L6267-JZu3E","_IwJ3Aj2XrQ","hf-4ppNmH-U","aO8evzfTR8E","E-kNUEv0YgA","Hsr1-xcXFL8","VCiLZMjo_PQ","rMEgKgZOJi8","YeYU_OE5oIU"]   86228   5659MVPCc4eODys expertvillage   448 ["Howto","Style"]   69  225879  4.29    171 315 ["iPPgmvcsJNw","xc8sysT31mQ","svt_fnKE_80","kBjUDCyDCuI","ysa50-plo48","yutDlQL9Ct0","gCra4qOrjFw","CN1QJDGinwE","dFJjaj7pXsA","LvJzFdcYSag","4I79yc3uKfE","AcryZ1o4RQE","WyyB0M6wAV8","A0sOVKbYR20","8XCyd0XEejc","5BmVKKpu_CU","8_QNzZ9WPRo","pNiX_l-HEGM","CQJSZs-euZU","ZwHHYzK8UCg"]   86228   56597neOXKLrgzk expertvillage   946 ["Howto","Style"]   111 215943  4.07    215 314 ["xEJhInSxsQg","PSR5kinMX3Y","E6VWi7IaroA","5ovGW0CzQp4","DeMcNhz0Vz8","CzZghYdupiQ","72EYsQEw35k","tScm-eZInBE","Hz4lnt7d88s","dgLUbXSqwSs","G3P4VFrB_zM","pU0O2-o67C0","pDGP8Q3Zzb8","hQhdK8l6CuY","VK9VflDsunI","LWgzG8I9bWY","csL7WNTWK9U","LE93Q5k5-fI","Uirspi_t3xM","wGLNJK4zcdE"]   86228   5659AaO1aLwk53Y expertvillage   443 ["Howto","Style"]   156 204450  4.14    210 290 ["Q8YYZGvKM-8","Bn59zha-uAQ","BjnkKuI-YAY","aQeXHXkL6ow","Qtp2ibwd2Ms","nstbrkjk3OM","Odj4ATUPh9w","v2mm2-9JQ-I","328YVJVtMKU","B-jzkyKxDLo","yB-yGNxNEZg","b0msCCnzmNA","sfh5AJhRto8","SEILiSkGzrY","jm4uxgVDU6o","svOBjuvvy-k","y_1nj8fBQOE","zb8ArkvxOxo","VgWayeJdV0w","Yor1dHH6orE"]   86228   5659m3PsQCMz3sY expertvillage   801 ["Howto","Style"]   99  203912  4.38    177 207 ["CEmYX76UJy0","0sF-PoXR9ZU","NDwZcLGekE8","qk6R-X-YmUw","DpEVqy954OY","Ph2UAjGfeB4","fdSFh_z0HJY","YkvwdM9Xstc","8szPfWiK-ag","Dmm5O1DPQYc","T4P3yp6mFfY","I2mM0r76b7o","7GtVwKZBf9U","mugebyUqK6o","qDfoNYpozxk","XA7dRqkp8YA","a28QcMXjHyI","n5LDyWyHJTo","ovoNU_CbmfA","Q2j6MlACIoc"]   86228   5659Time taken: 300.414 seconds, Fetched: 20 row(s)

4.8 筛选出每个类别中观看数Top10

select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;

hive> select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;Query ID = hadoop_20170713170101_76ffdd80-e72c-4c7f-b9ad-9b8c3f7e17abTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 3Starting Job = job_1499153664137_0143, Tracking URL = http://master:8088/proxy/application_1499153664137_0143/Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0143Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 32017-07-13 17:01:52,690 Stage-1 map = 0%,  reduce = 0%2017-07-13 17:02:24,374 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 10.49 sec...2017-07-13 17:03:08,803 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 89.07 secMapReduce Total cumulative CPU time: 1 minutes 29 seconds 70 msecEnded Job = job_1499153664137_0143MapReduce Jobs Launched: Stage-Stage-1: Map: 3  Reduce: 3   Cumulative CPU: 89.07 sec   HDFS Read: 73070026 HDFS Write: 7487 SUCCESSTotal MapReduce CPU Time Spent: 1 minutes 29 seconds 70 msecOKqruSOZq-wJg Activism    2673823 1tFxk7glmMbo Activism    1063928 2dYN278GB2Kg Activism    831836  3n-Akqik3hME Activism    778987  4LtqRAYWjz2Q Activism    768422  52v-whrgAbf4 Activism    742172  6ja4d02s0WKQ Activism    733014  7IpAOYskH1s8 Activism    714580  8ieuVTOJyLBs Activism    640994  96CXBgbP_ZKs Activism    562163  1012Z3J1uzd0Q Animation   65341925    1bFytHZHFXhA Animation   38937813    2sdUUx5FdySs Animation   16151661    3vQbYvjmfbr4 Animation   12114231    4rNKefRRvV7g Animation   11884311    56HIavxnUHls Animation   11735507    6gm5DHKI8o5o Animation   9381674 7316BF17k5d8 Animation   9372064 8ZuQMn6Z0k_w Animation   9195588 96B26asyGKDo Animation   8915176 10dMH0bHeiRNg Comedy  79897120    15P6UU6m3cqk Comedy  42525795    2Tx1XIm6q4r4 Comedy  38378910    3Q5im0Ssyyus Comedy  21170471    4k66epna2Sss Comedy  17944367    5AYxu_MQSTTY Comedy  15665340    6pYak2F1hUYA Comedy  15634357    7_OBlgSz8sSM Comedy  14567743    8wCF3ywukQYA Comedy  14227802    9sHzdsFiBbFc Comedy  13606292    10Mz9BlgiV45I Education   2807232 1VnGb6UQvwJs Education   2441432 2j6lADdhmAec Education   2306689 3pYfTQpsErsM Education   2066429 47GNE6AhL0qI Education   2037257 5NiGZf15W9xc Education   1867126 6-FeAK-q5Cok Education   1741504 7N5P6o2YaqVI Education   1656512 8Dl-agXA63nw Education   1529566 9JX3VmDgiFnY Education   1373008 10cQ25-glGRzI Music   77674728    17AVHXe-ol-s Music   60349673    2ePyRrb2-fzs Music   45984219    3xsRWpK4pf90 Music   44614530    4ktUSIJEiOug Music   43583367    5b3u65f4CRLk Music   43511791    6iWg3IMN_rhU Music   43323757    7innfyQZHPpo Music   41564032    8Lt6o8NlrbHg Music   41171303    9QjA5faZF1A8 Music   40294882    10qruSOZq-wJg Nonprofits  2673823 1tFxk7glmMbo Nonprofits  1063928 2dYN278GB2Kg Nonprofits  831836  3n-Akqik3hME Nonprofits  778987  4LtqRAYWjz2Q Nonprofits  768422  52v-whrgAbf4 Nonprofits  742172  6ja4d02s0WKQ Nonprofits  733014  7IpAOYskH1s8 Nonprofits  714580  8ieuVTOJyLBs Nonprofits  640994  96CXBgbP_ZKs Nonprofits  562163  10SkELRp4wKPs Politics    12730681    1AFFQrUyi8-s Politics    11953598    2YgW7or1TuFk Politics    8516433 3JgiGrXpOhYg Politics    8211043 4up5jmbSjWkw Politics    7869023 5hr23tpWX8lM Politics    7209885 6Kje7NUNebL8 Politics    6707868 7I4u3449L5VI Politics    6675798 8jjXyqcx-mYY Politics    6462051 9a9Vde3FHMmc Politics    6028642 108h98jb9Lk74 UNA 33880568    1AR5yq0aMxI8 UNA 18409445    2DH7FVB15EPU UNA 15487752    3Qvfx6UiAAG0 UNA 13977592    4n-FdoZdfpmE UNA 12241555    5MddPeH1DAvY UNA 12211192    6sI8Sus_KRpY UNA 10460047    714oqm9ywLIw UNA 10344366    8LytRWuhn4BM UNA 9899270 9iSByZARPJSU UNA 9316897 10Oro28yg7W74 Gaming  727015  10to1HDxtkM4 Gaming  529346  2M-w-gLVfi30 Gaming  396471  3i6aH2F1WjsY Gaming  382350  4EgbUSsblCSQ Gaming  266718  57SMnWr9MqwQ Gaming  210147  6NQMBIRipp5A Gaming  207783  7vi1lVqJSbsM Gaming  186709  8aId2hK4pI2M Gaming  186054  9tWPWcyLdrko Gaming  181979  10sLGLum5SyKQ Howto   31121122    1eMhGpzyFdhE Howto   16320501    2KPOOWvP_dd8 Howto   11131527    391wuBqlny50 Howto   8579613 4mr5ghuaTK14 Howto   8361812 5GfPJeDssBOM Howto   6101232 66gmP4nk0EOE Howto   4970382 7mM-30cmM33s Howto   4936417 8STQ3nhXuuEM Howto   4655542 9XZGgeGHU1Bs Howto   4424698 10LU8DDYz68kM Pets    27721690    1epUk3T2Kfno Pets    10352882    2z3U0udLH974 Pets    9461084 3kkT7A3jegBc Pets    9269896 4TZ860P4iTaM Pets    9009434 57tRWRSfcDuQ Pets    8538635 6Qit3ALTelOo Pets    7939352 7Zi9GOvR3Ynw Pets    7351184 8Kxa0mnDj0bs Pets    7289545 9PadauuWF94w Pets    6271287 10W1czBcnX1Ww Science 3234852 1D99NHb6B03s Science 3176792 2tk_F2Y-F2kE Science 3121903 3nhyH7lQ6D2k Science 2879861 4JCbKv9yiLiQ Science 2672391 5U5vs2ly_grk Science 2611389 6M0ODskdEPnQ Science 2555284 7p4ebtj1jR7c Science 2536109 88wTlureUMP8 Science 2477804 9NZNTgglPbUA Science 2230729 10vt4X7zFfv4k Sports  12598542    1P-bWsOK-h98 Sports  12101588    2OS5tQvQOB-Y Sports  9047732 3P9LmHXXWiJs Sports  7415813 4NIKdK-T-jZM Sports  7175403 5euMu1SKi-ak Sports  7040023 6zKQgTiqhPbw Sports  7017699 7yeXoxNP8_xY Sports  6422039 8q8t7iSGAKik Sports  6312667 9COcczatkNP4 Sports  6258611 10sLGLum5SyKQ Style   31121122    1eMhGpzyFdhE Style   16320501    2KPOOWvP_dd8 Style   11131527    391wuBqlny50 Style   8579613 4mr5ghuaTK14 Style   8361812 5GfPJeDssBOM Style   6101232 66gmP4nk0EOE Style   4970382 7mM-30cmM33s Style   4936417 8STQ3nhXuuEM Style   4655542 9XZGgeGHU1Bs Style   4424698 10ZeBd_F2Bz5Y Vehicles    8623041 1ju6t-yyoU8s Vehicles    7325889 2uUurALr_Ckk Vehicles    7258142 3WShY1ObPvhQ Vehicles    7047156 4q3idQKi5EqM Vehicles    6470816 59JWywFpZkg4 Vehicles    6465830 6tth9krDtxII Vehicles    6394563 7npTRXr4Sgxg Vehicles    5450317 8aCamHfJwSGU Vehicles    5354163 9S-ppGiwc0wQ Vehicles    5039415 10LU8DDYz68kM Animals 27721690    1epUk3T2Kfno Animals 10352882    2z3U0udLH974 Animals 9461084 3kkT7A3jegBc Animals 9269896 4TZ860P4iTaM Animals 9009434 57tRWRSfcDuQ Animals 8538635 6Qit3ALTelOo Animals 7939352 7Zi9GOvR3Ynw Animals 7351184 8Kxa0mnDj0bs Animals 7289545 9PadauuWF94w Animals 6271287 10ZeBd_F2Bz5Y Autos   8623041 1ju6t-yyoU8s Autos   7325889 2uUurALr_Ckk Autos   7258142 3WShY1ObPvhQ Autos   7047156 4q3idQKi5EqM Autos   6470816 59JWywFpZkg4 Autos   6465830 6tth9krDtxII Autos   6394563 7npTRXr4Sgxg Autos   5450317 8aCamHfJwSGU Autos   5354163 9S-ppGiwc0wQ Autos   5039415 10v3ARyAb_1Bs Blogs   31812447    15GE82tqcYYQ Blogs   30209692    2ervaMPt4Ha0 Blogs   23859297    3uWow42TCwzg Blogs   22389389    4-_CSo1gOd48 Blogs   21176701    5D2kJZOfq7zk Blogs   20458159    6nhSZs-aAZbo Blogs   20423158    7GuMMfgWhm3g Blogs   18634621    84jbkRGPxvaM Blogs   15980887    9hMnk7lh9M3o Blogs   13553069    10LpAI8TzQDes Entertainment   65078772    1244qR7SvvX0 Entertainment   57790943    21uwOL4rB-go Entertainment   39883413    3w2xUzv6iZWo Entertainment   25222946    4vr3x_RRJdd4 Entertainment   25093671    5lj3iNxZ8Dww Entertainment   23737579    6RB-wUgnyGv0 Entertainment   23067889    7lsO6D1rwrKc Entertainment   21758300    87iYWxfNSjYk Entertainment   19029677    95pGJCkCDK5A Entertainment   18858695    10p0aQvKDA1K0 Events  12239023    1bNF_P281Uu4 Events  9125026 2AlPqL7IUT6M Events  6441558 3J833f9fqWBA Events  4776832 4xIvIWJbzimo Events  4284770 5QuTj9a04o-s Events  4053317 6eejQPUyeNiY Events  3573812 73QL97xldoXc Events  3010296 8r43yCiKlbCo Events  2806995 9z42fchrzhHY Events  2565257 1012Z3J1uzd0Q Film    65341925    1bFytHZHFXhA Film    38937813    2sdUUx5FdySs Film    16151661    3vQbYvjmfbr4 Film    12114231    4rNKefRRvV7g Film    11884311    56HIavxnUHls Film    11735507    6gm5DHKI8o5o Film    9381674 7316BF17k5d8 Film    9372064 8ZuQMn6Z0k_w Film    9195588 96B26asyGKDo Film    8915176 10SkELRp4wKPs News    12730681    1AFFQrUyi8-s News    11953598    2YgW7or1TuFk News    8516433 3JgiGrXpOhYg News    8211043 4up5jmbSjWkw News    7869023 5hr23tpWX8lM News    7209885 6Kje7NUNebL8 News    6707868 7I4u3449L5VI News    6675798 8jjXyqcx-mYY News    6462051 9a9Vde3FHMmc News    6028642 10v3ARyAb_1Bs People  31812447    15GE82tqcYYQ People  30209692    2ervaMPt4Ha0 People  23859297    3uWow42TCwzg People  22389389    4-_CSo1gOd48 People  21176701    5D2kJZOfq7zk People  20458159    6nhSZs-aAZbo People  20423158    7GuMMfgWhm3g People  18634621    84jbkRGPxvaM People  15980887    9hMnk7lh9M3o People  13553069    10W1czBcnX1Ww Technology  3234852 1D99NHb6B03s Technology  3176792 2tk_F2Y-F2kE Technology  3121903 3nhyH7lQ6D2k Technology  2879861 4JCbKv9yiLiQ Technology  2672391 5U5vs2ly_grk Technology  2611389 6M0ODskdEPnQ Technology  2555284 7p4ebtj1jR7c Technology  2536109 88wTlureUMP8 Technology  2477804 9NZNTgglPbUA Technology  2230729 10p0aQvKDA1K0 Travel  12239023    1bNF_P281Uu4 Travel  9125026 2AlPqL7IUT6M Travel  6441558 3J833f9fqWBA Travel  4776832 4xIvIWJbzimo Travel  4284770 5QuTj9a04o-s Travel  4053317 6eejQPUyeNiY Travel  3573812 73QL97xldoXc Travel  3010296 8r43yCiKlbCo Travel  2806995 9z42fchrzhHY Travel  2565257 10Time taken: 121.381 seconds, Fetched: 250 row(s)

5 总结

上面的8个例子向大家展示Hive中简单和稍微复杂的操作,有的启动一个Job就可以完成,有的则需要启动多个Job才能完成,我们也可以看到,启动的Job越多运行时间就会越长,但是实际工作中的操作只会远比我们所演示要更加复杂,越是复杂的操作就更加需要去优化,来达到减少运行时间的目的,所以下一篇我们来看看Hive的优化实践

原创粉丝点击