Hive Series (Part 1)

Source: Internet. Published by: java与xml数据绑定. Editor: 程序博客网. Time: 2024/06/06 00:00

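1 Introduction to Hive

Hive is a data warehouse tool built on top of Hadoop. Its goal is to map SQL onto MapReduce (MR): queries written in a SQL-like language are compiled into MapReduce jobs. That makes Hive a good fit for statistical analysis over warehouse data.

Within the Hadoop ecosystem Hive plays the role of the data warehouse: it maps structured data files onto database tables.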
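To make the file-to-table mapping more concrete, here is a minimal sketch in Python (not Hive code) of the idea: the schema is applied when a line is read, and a field that is missing or cannot be converted becomes NULL (None here), which is how Hive's default text format behaves as far as I know. The three-column schema, the sample lines and the helper names are all hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical schema for illustration: (column name, converter) pairs.
Schema = List[Tuple[str, Callable[[str], object]]]


def parse_field(raw: Optional[str], convert: Callable[[str], object]) -> object:
    """Convert one raw field; return None (SQL NULL) if missing or malformed."""
    if raw is None:
        return None
    try:
        return convert(raw)
    except ValueError:
        return None


def read_row(line: str, schema: Schema, delimiter: str = " ") -> Dict[str, object]:
    """Apply the schema to one delimited text line at read time."""
    fields = line.rstrip("\n").split(delimiter)
    row: Dict[str, object] = {}
    for index, (name, convert) in enumerate(schema):
        raw = fields[index] if index < len(fields) else None
        row[name] = parse_field(raw, convert)
    return row


# Assumed sample data, not taken from the original input.txt.
SCHEMA: Schema = [("id", int), ("name", str), ("salary", float)]
LINES = ["1 alice 3000.5", "2 bob n/a", "3 carol"]

rows = [read_row(line, SCHEMA) for line in LINES]
for row in rows:
    print(row)
```

The file itself is never rewritten; only the interpretation of each line depends on the table definition, which is why the same file can back more than one table.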

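2 How Hive relates to HBase

Hive and HBase are both built on top of Hadoop. HBase is a key/value, non-relational database that runs on HDFS; Hive is a data warehouse that runs on HDFS.

Hive is suited to offline processing and analysis of data accumulated over a period of time. Its queries run relatively slowly, so it is not a good choice for real-time lookups.

HBase is suited to real-time queries against large data sets.

Data can be moved between Hive, HBase and HDFS in any direction.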

3 Creating a database

 hive> create database hive;

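hive> use hive;

4 Creating an internal (managed) table

hive> create table emp(
    > empno int,
    > empname string,
    > job string,
    > mgr int,
    > hiredate string,
    > salary double,
    > comm string,
    > deptno int )
    > row format delimited
    > fields terminated by " ";

[root@localhost hive]# cat input.txt

hive> load data local inpath '/usr/local/hive/input.txt' overwrite into table emp;

hive> select * from emp;

After this step an emp.java file showed up in the same directory as input.txt. It is tempting to read it as the SQL translated into MapReduce, but Hive compiles a query into an execution plan rather than emitting Java source, so the file most likely comes from another tool (Sqoop, for instance, generates a <table>.java class when it imports a table).

5 Creating an external table

Create the directory from the Pig client:

grunt> mkdir /hive

hive> create external table emp_ext(
    > empno int,
    > empname string,
    > job string,
    > mgr int,
    > hiredate string,
    > salary double,
    > comm string,
    > deptno int )
    > row format delimited
    > fields terminated by " "
    > location '/hive';

hive> select count(1) from emp_ext;
0

The count is 0, so the table has no data yet. Use Pig to add a file to /hive:

grunt> copyFromLocal /usr/local/input.txt /hive/input.txt

hive> select * from emp_ext;

6 Creating a partitioned table

hive> create table emp_part(
    > empno int,
    > empname string,
    > job string,
    > mgr int,
    > hiredate string,
    > salary double,
    > comm string,
    > deptno int)
    > partitioned by (year string,month string)
    > row format delimited
    > fields terminated by " ";

hive> load data local inpath '/usr/local/hive/input.txt' into table emp_part partition (year='2016',month='10');

7 Other ways to create a table

Copy only the definition of an existing table (columns, partition columns and storage format, but no data):

hive> create table emp_part_like like emp_part;

Create a table from the result of a query. The as clause must be followed by a select statement; the new table receives the data, and here it is an ordinary, non-partitioned table in which year and month become regular columns:

hive> create table emp_part_ctas as select * from emp_part;

The target names emp_part_like and emp_part_ctas are illustrative; any table name that does not already exist in the current database will do.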
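
For the partitioned table in section 6, it can help to picture where the loaded file ends up. Hive stores each partition in its own subdirectory named key=value, nested in the order the partition columns were declared. The sketch below, in Python rather than HiveQL, only builds that path. It assumes the default warehouse location /user/hive/warehouse and the usual <database>.db directory naming; both depend on the installation, and the helper name partition_path is hypothetical.

```python
import posixpath
from typing import Dict


def partition_path(table_dir: str, spec: Dict[str, str]) -> str:
    """Build the directory for one partition from its key/value spec.

    Keys must be given in the order the partition columns were declared.
    Escaping of special characters in values is not handled in this sketch.
    """
    parts = [f"{key}={value}" for key, value in spec.items()]
    return posixpath.join(table_dir, *parts)


# Assumption: default warehouse directory, database named "hive".
TABLE_DIR = "/user/hive/warehouse/hive.db/emp_part"

path = partition_path(TABLE_DIR, {"year": "2016", "month": "10"})
print(path)  # /user/hive/warehouse/hive.db/emp_part/year=2016/month=10
```

A query that filters on year and month only needs to read the matching directories, which is the main reason to partition a table in the first place.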

0 0