Hive Partitioned Tables
Source: Internet | Editor: 程序博客网 | Date: 2024/05/11 18:30
I. Hive syntax
1.1 Creating a table (MANAGED_TABLE):
create table student(id bigint,name string) row format delimited fields terminated by '\t';
This records the table in the metastore database: a row in TBLS (the table itself), rows in COLUMNS_V2 (its columns), and a row in SDS (the storage descriptor). The original post showed screenshots of these three tables, which are omitted here.
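These metastore rows can be inspected directly in the backing MySQL database. A minimal sketch, assuming the metastore lives in a MySQL database named hive (the table and column names below match Hive 0.13-era metastore schemas and may differ in other versions):

```sql
-- Run against the MySQL database backing the Hive metastore.
USE hive;

-- TBLS: one row per Hive table.
SELECT TBL_ID, TBL_NAME, TBL_TYPE, SD_ID
FROM TBLS
WHERE TBL_NAME = 'student';

-- COLUMNS_V2: the table's columns, reached through the
-- storage descriptor (SDS) and its column descriptor (CD_ID).
SELECT c.COLUMN_NAME, c.TYPE_NAME
FROM TBLS t
JOIN SDS s        ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID
WHERE t.TBL_NAME = 'student';

-- SDS: where the table's data lives on HDFS.
SELECT s.LOCATION
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID
WHERE t.TBL_NAME = 'student';
```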
1.2 Loading data:
load data local inpath '/root/student.txt' into table student;
1.3 Querying data:
hive> select * from student;
OK
1 zhangsan
2 wangwu
3 lisi
NULL NULL
Time taken: 0.318 seconds, Fetched: 4 row(s)
hive> select * from student limit 2;
OK
1 zhangsan
2 wangwu
Time taken: 0.087 seconds, Fetched: 2 row(s)
Note: neither of the queries above launches a MapReduce job; Hive reads the files directly.
hive> select sum(id) from student;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1494075968732_0002, Tracking URL = http://heres04:8088/proxy/application_1494075968732_0002/
Kill Command = /heres/hadoop-2.2.0/bin/hadoop job -kill job_1494075968732_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-05-06 21:26:09,717 Stage-1 map = 0%, reduce = 0%
2017-05-06 21:26:19,428 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.71 sec
2017-05-06 21:26:28,987 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec
MapReduce Total cumulative CPU time: 3 seconds 400 msec
Ended Job = job_1494075968732_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.4 sec HDFS Read: 232 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 400 msec
OK
6
Time taken: 31.431 seconds, Fetched: 1 row(s)
Note: the aggregation above does launch a MapReduce job.
1.4 External tables (EXTERNAL_TABLE)
To create an external table, the data comes first (it already sits in HDFS); we then create a table that points at that directory. Note that the Hive CLI can also run Hadoop commands, as shown below.
As long as a file is placed under that directory, it becomes visible to Hive queries. This applies to managed tables as well: for both managed and external tables, any data dropped into the table's directory can be queried. Partitioned tables are the exception.
hive> dfs -ls /
> ;
Found 11 items
-rw-r--r-- 3 root supergroup 59 2017-04-18 16:55 /a.txt
drwxr-xr-x - root supergroup 0 2017-04-30 15:15 /hbase
-rw-r--r-- 3 root supergroup 27605 2017-04-18 10:44 /install.log
-rw-r--r-- 3 root supergroup 27605 2017-04-18 15:47 /log
drwxr-xr-x - root supergroup 0 2017-04-23 17:31 /sqoop
drwx------ - root supergroup 0 2017-05-04 16:41 /tmp
drwxr-xr-x - root supergroup 0 2017-05-04 19:14 /user
drwxr-xr-x - root supergroup 0 2017-04-18 18:46 /wca
drwxr-xr-x - root supergroup 0 2017-04-18 18:53 /wcc
drwxr-xr-x - root supergroup 0 2017-04-18 18:09 /wcout
drwxr-xr-x - root supergroup 0 2017-04-18 19:53 /wcs
hive> dfs -mkdir /data;
hive> dfs -put /root/student.txt /data/a.txt;
hive> dfs -put /root/student.txt /data/b.txt;
hive> create external table ext_student(id int,name string) row format delimited fields terminated by '\t' location '/data' ;
OK
Time taken: 0.338 seconds
hive> select * from ext_student;
OK
1 zhangsan
2 wangwu
3 lisi
NULL NULL
1 zhangsan
2 wangwu
3 lisi
NULL NULL
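One practical difference between the two table types, not shown in the log above: what DROP TABLE does to the data. A minimal sketch (standard Hive behavior):

```sql
-- Managed table: dropping it deletes both the metastore entry
-- and the data directory under /user/hive/warehouse/student.
DROP TABLE student;

-- External table: dropping it deletes only the metastore entry;
-- the files under /data stay on HDFS.
DROP TABLE ext_student;
```

This is why external tables are the usual choice when the underlying files are shared with other tools.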
II. Creating a partitioned table
hive> create table beauties (id bigint, name string, size double) partitioned by (nation string) row format delimited fields terminated by '\t';
hive> load data local inpath '/root/b.c' into table beauties partition(nation='china');
Copying file: file:/root/b.c
Loading data to table default.beauties partition (nation=china)
Partition default.beauties{nation=china} stats: [numFiles=1, numRows=0, totalSize=45, rawDataSize=0]
OK
Time taken: 0.739 seconds
hive> select * from beauties;
OK
1 glm 34.0 china
2 lina 30.9 china
3 liu 45.0 china
4 bing 56.56 china
Time taken: 0.139 seconds, Fetched: 4 row(s)
hive> dfs -mkdir /beauty/nation=Japan;
hive> dfs -put /root/b.j /beauty/nation=Japan;
hive> select * from beauties;
OK
1 glm 34.0 china
2 lina 30.9 china
3 liu 45.0 china
4 bing 56.56 china
Time taken: 0.089 seconds, Fetched: 4 row(s)
Why don't the Japan rows show up? Because the metastore has no record of that partition (the original post's screenshot of the missing metastore rows is omitted here). Registering the partition fixes it:
hive> alter table beauties add partition (nation='Japan') location '/beauty/nation=Japan';
OK
Time taken: 0.185 seconds
hive> select * from beauties;
OK
1 bgyjy 56.6565 Japan
2 jzmb 23.232 Japan
3 ewrwe 43.9 Japan
1 glm 34.0 china
2 lina 30.9 china
3 liu 45.0 china
4 bing 56.56 china
Time taken: 0.104 seconds, Fetched: 7 row(s)
hive> select * from beauties where nation='Japan';
OK
1 bgyjy 56.6565 Japan
2 jzmb 23.232 Japan
3 ewrwe 43.9 Japan
Time taken: 0.253 seconds, Fetched: 3 row(s)
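An alternative to registering each partition by hand (not used in the original post, but standard Hive): msck repair table scans the table's root directory and registers every key=value partition directory it finds, assuming the directories sit under the table's location:

```sql
-- Registers all unregistered partition directories such as
-- nation=Japan in the metastore in one step.
MSCK REPAIR TABLE beauties;
```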
III. Importing data from MySQL and analyzing it with Hive
1. Add Hive to the PATH environment variable
vim /etc/profile
Append to PATH: :/heres/apache-hadoop.../bin
2. Create two tables in Hive
create table trade_detail (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';
create table user_info (id bigint, account string, name string, age int) row format delimited fields terminated by '\t';
3. Import the MySQL data directly into Hive (sqoop first copies the data to HDFS, then has Hive load it, which is why Hive must be on the PATH). From the bin directory of the sqoop installation, run:
sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by '\t'
sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table user_info --hive-import --hive-overwrite --hive-table user_info --fields-terminated-by '\t'
4. Create a result table that stores the output of the analysis query
create table result row format delimited fields terminated by '\t' as select t2.account, t2.name, t1.income, t1.expenses, t1.surplus from user_info t2 join (select account, sum(income) as income, sum(expenses) as expenses, sum(income-expenses) as surplus from trade_detail group by account) t1 on (t1.account = t2.account);
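The CTAS above is easier to read in two steps. The inner subquery alone computes per-account totals (same tables as above):

```sql
-- Per-account income, expenses, and surplus from the detail table.
SELECT account,
       SUM(income)            AS income,
       SUM(expenses)          AS expenses,
       SUM(income - expenses) AS surplus
FROM trade_detail
GROUP BY account;
```

The outer query then joins these totals to user_info on account to attach each user's name, and CTAS materializes the combined result into the new result table.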
IV. Statement summary
1. Create a user table
create table user (id int, name string) row format delimited fields terminated by '\t';
2. Load data from the local filesystem into Hive
load data local inpath '/root/user.txt' into table user;
3. Create an external table
create external table stubak (id int, name string) row format delimited fields terminated by '\t' location '/stubak';
4. Create a partitioned table
Difference between a regular table and a partitioned table: when large volumes of data keep being added, build a partitioned table, so queries can be restricted to the relevant partitions.
create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';
Loading data into the partitioned table:
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
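Once loaded, a query that filters on the partition column only reads the matching partition directory rather than scanning the whole table. A minimal sketch against the book table above:

```sql
-- Partition pruning: only the files under pubdate=2010-08-22
-- are read; other partitions are skipped entirely.
SELECT id, name
FROM book
WHERE pubdate = '2010-08-22';
```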