Hive Example
来源:互联网 发布:win7镜像 知乎 编辑:程序博客网 时间:2024/05/06 15:07
Hive Example
此实例主要学习Hive的基本操作
准备数据
- 下载数据
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip - 解压数据
unzip ml-100k.zip - 数据说明(具体请查看解压目录中README)
u.data -- The full u data set, 100000 ratings by 943 users on 1682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. This is a tab separated list of user id | item id | rating | timestamp. The time stamps are unix seconds since 1/1/1970 UTCu.info -- The number of users, items, and ratings in the u data set.u.item -- Information about the items (movies); this is a tab separated list of movie id | movie title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | The last 19 fields are the genres, a 1 indicates the movie is of that genre, a 0 indicates it is not; movies can be in several genres at once. The movie ids are the ones used in the u.data data set.u.genre -- A list of the genres.u.user -- Demographic information about the users; this is a tab separated list of user id | age | gender | occupation | zip code The user ids are the ones used in the u.data data set.u.occupation -- A list of the occupations.
创建表
接下来使用上述数据中的其中三个数据做接下来的练习,创建文件 hiveExample.sql,hiveExample.sql内容如下:
CREATE TABLE IF NOT EXISTS u_data ( userid INT, movieid INT, rating INT, unixtime STRING)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '\t'STORED AS TEXTFILE;CREATE TABLE IF NOT EXISTS u_user( userid INT, age INT, gender STRING, occupation STRING , zipcode INT)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '|'STORED AS TEXTFILE;CREATE TABLE IF NOT EXISTS u_item( movieid INT, movietitle STRING, releasedate STRING, videoreleasedate STRING, IMDbURL STRING, unknown INT, Action INT, Adventure INT, Animation INT, Childrens INT, Comedy INT, Crime INT, Documentary INT, Drama INT, Fantasy INT, FilmNoir INT, Horror INT, Musical INT, Mystery INT, Romance INT, SciFi INT, Thriller INT, War INT, Western INT)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '|'STORED AS TEXTFILE;
执行如下命令,创建表: hive -f hiveExample.sql
导入数据:
创建文件 importData.sql,importData.sql内容如下:
LOAD DATA LOCAL INPATH '/home/dev/storeFile/ml-100k/u.data' OVERWRITE INTO TABLE u_data;LOAD DATA LOCAL INPATH '/home/dev/storeFile/ml-100k/u.user' OVERWRITE INTO TABLE u_user;LOAD DATA LOCAL INPATH '/home/dev/storeFile/ml-100k/u.item' OVERWRITE INTO TABLE u_item;
其中/home/dev/storeFile/ml-100k/u.data为本地文件路径。
执行如下命令,导入数据: hive -f importData.sql
导入成功后打印如下信息:
Loading data to table default.u_dataOKTime taken: 3.852 secondsLoading data to table default.u_userOKTime taken: 0.563 secondsLoading data to table default.u_itemOKTime taken: 0.706 seconds
HiveQL查询实例
select / order by /limit
将 u_user表中用户按照age降序排列,并查询出前5位用户信息:
hive> select * from u_user u order by u.age desc limit 5 ;
结果:
481 73 M retired 37771860 70 F retired 48322767 70 M engineer 0803 70 M administrator 78212585 69 M librarian 98501
group by
- group by 按照一个或者多个列对结果进行分组。
- count / distinct 一同使用 DISTINCT 和 COUNT 关键词,来计算非重复结果的数目
统计u_user表中不同性别的人数。
select u.gender, count(distinct u.userid) from u_user u group by u.gender;
结果:
OKF 273M 670Time taken: 19.786 seconds, Fetched: 2 row(s)
参考资料
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
0 0
- Hive Example
- Hive UDF /UDAF /UDTF Example
- example
- example
- Example
- example
- Example
- Example
- Example
- Example
- example
- Example
- Example
- Hive
- HIVE
- Hive
- hive
- hive
- 【编程珠玑】第十五章 字符串
- codeforces 55D Beautiful numbers(数位DP)
- Windows GDI编程之设备环境
- netty 第二章 从client 传递Object 对象到server
- 机器学习笔记1——深度学习的数据库和目前一些流行的软件
- Hive Example
- Qt 剪贴板操作
- poj 3368
- Android游戏——贪吃蛇开发实录(改进后的源码和详解)
- 【C/C++】命令行参数
- c语言学习----数据类型
- POJ 3411 DFS
- tornodo使用(一)
- Xutils和Xutils3的基本使用(转)