Hive Example

来源:互联网 发布:win7镜像 知乎 编辑:程序博客网 时间:2024/05/06 15:07

Hive Example

此实例主要学习Hive的基本操作

准备数据

  • 下载数据

    wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
  • 解压数据

    unzip ml-100k.zip
  • 数据说明(具体请查看解压目录中README)
u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.              Each user has rated at least 20 movies.  Users and items are              numbered consecutively from 1.  The data is randomly              ordered. This is a tab separated list of                 user id | item id | rating | timestamp.              The time stamps are unix seconds since 1/1/1970 UTCu.info     -- The number of users, items, and ratings in the u data set.u.item     -- Information about the items (movies); this is a tab separated              list of              movie id | movie title | release date | video release date |              IMDb URL | unknown | Action | Adventure | Animation |              Children's | Comedy | Crime | Documentary | Drama | Fantasy |              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |              Thriller | War | Western |              The last 19 fields are the genres, a 1 indicates the movie              is of that genre, a 0 indicates it is not; movies can be in              several genres at once.              The movie ids are the ones used in the u.data data set.u.genre    -- A list of the genres.u.user     -- Demographic information about the users; this is a tab              separated list of              user id | age | gender | occupation | zip code              The user ids are the ones used in the u.data data set.u.occupation -- A list of the occupations.

创建表

接下来使用上述数据中的其中三个数据做接下来的练习,创建文件 hiveExample.sql,hiveExample.sql内容如下:

CREATE TABLE IF NOT EXISTS u_data (        userid INT,        movieid INT,        rating INT,        unixtime STRING)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '\t'STORED AS TEXTFILE;CREATE TABLE IF NOT EXISTS u_user(        userid INT,        age  INT,        gender STRING,        occupation STRING ,        zipcode INT)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '|'STORED AS TEXTFILE;CREATE TABLE IF NOT EXISTS u_item(        movieid INT,        movietitle STRING,        releasedate  STRING,        videoreleasedate STRING,        IMDbURL STRING,        unknown  INT,        Action INT,        Adventure  INT,        Animation INT,        Childrens INT,        Comedy INT,        Crime INT,        Documentary INT,        Drama INT,        Fantasy INT,        FilmNoir INT,        Horror INT,        Musical INT,        Mystery INT,        Romance INT,        SciFi INT,        Thriller INT,        War INT,        Western INT)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '|'STORED AS TEXTFILE;

执行如下命令,创建表:
hive -f hiveExample.sql

导入数据:

创建文件 importData.sql,importData.sql内容如下:

LOAD DATA LOCAL INPATH '/home/dev/storeFile/ml-100k/u.data' OVERWRITE INTO TABLE u_data;LOAD DATA LOCAL INPATH '/home/dev/storeFile/ml-100k/u.user' OVERWRITE INTO TABLE u_user;LOAD DATA LOCAL INPATH '/home/dev/storeFile/ml-100k/u.item' OVERWRITE INTO TABLE u_item;

其中/home/dev/storeFile/ml-100k/u.data为本地文件路径。
执行如下命令,导入数据:
hive -f importData.sql
导入成功后打印如下信息:

Loading data to table default.u_dataOKTime taken: 3.852 secondsLoading data to table default.u_userOKTime taken: 0.563 secondsLoading data to table default.u_itemOKTime taken: 0.706 seconds

HiveQL查询实例

select / order by /limit

将 u_user表中用户按照age降序排列,并查询出前5位用户信息:

hive> select * from u_user u order by u.age desc limit 5 ;

结果:

481 73  M   retired 37771860 70  F   retired 48322767 70  M   engineer    0803 70  M   administrator   78212585 69  M   librarian   98501

group by

  • group by 按照一个或者多个列对结果进行分组。
  • count / distinct 一同使用 DISTINCT 和 COUNT 关键词,来计算非重复结果的数目
    统计u_user表中不同性别的人数。
select u.gender, count(distinct u.userid) from u_user u group by u.gender;

结果:

OKF   273M   670Time taken: 19.786 seconds, Fetched: 2 row(s)

参考资料

https://cwiki.apache.org/confluence/display/Hive/GettingStarted

0 0
原创粉丝点击
热门问题 老师的惩罚 人脸识别 我在镇武司摸鱼那些年 重生之率土为王 我在大康的咸鱼生活 盘龙之生命进化 天生仙种 凡人之先天五行 春回大明朝 姑娘不必设防,我是瞎子 华为麦芒5主屏页面不显示怎么办 6s p换屏幕原装太贵怎么办 4g手机开不开机黑屏怎么办 华为麦芒5 4g信号差怎么办 华为麦芒手机锁屏密码忘了怎么办 华为麦芒5相机拍相片倒了怎么办 红米5a开不了机怎么办 华为沾了海水打不开机怎么办 华为麦芒手机忘记锁屏密码怎么办 华为手机的方框键摁不了怎么办 笔记本自动更新到一半太慢了怎么办 华为麦芒5音量下键乱跑了怎么办 麦芒6手机QQ视频没声音怎么办 18:9看16:9黑边怎么办 华为畅享7s声音小怎么办 华为畅享8手机声音小怎么办 华为畅享8plus声音小怎么办 荣耀7x锁屏密码忘记怎么办 华为荣耀7x锁屏密码忘记了怎么办 苹果耳机进水后声音变了怎么办 华为荣耀开了数据用不了怎么办 华为手机高德地图信号弱怎么办? 手机QQ浏览器看视频有广告怎么办 手机显示网络连接但不可上网怎么办 华为手机关机后开不了机怎么办 华为畅享8p相机拍照模糊怎么办 手机触屏不准怎么办荣耀青春版九 华为手机锁屏手势密码忘了怎么办 荣耀手机锁屏密码忘了怎么办 华为p20隐私空间密码忘了怎么办 安全管家隐私保护的密码忘了怎么办 华为手机自带截图键删除了怎么办 飞科电吹风吹一会就断电怎么办 住酒店时电吹风吹坏了怎么办 把话费充到停机的号码上去了怎么办 电信手机卡充值了还停机怎么办 电信手机一直没用却欠费了怎么办 苹果se开起4g信号不好怎么办 触屏华为手机充不了电怎么办 华为手机自拍出来的字反向怎么办 华为微信隐藏了怎么弄出来怎么办