Hive Quick Start
@(Blog Posts)[hive]
(I) A simple start
1. Create a table
create table if not exists ljh_emp(
  name string,
  salary float,
  gender string
)
comment 'basic information of an employee'
row format delimited fields terminated by ',';
2. Prepare the data file
Create a directory named test that contains a single file with the following contents:
ljh,25000,male
jediael,25000,male
llq,15000,female
3. Load the data into the table
load data local inpath '/home/ljhn1829/test' into table ljh_emp;
4. Query the table
select * from ljh_emp;
OK
ljh 25000.0 male
jediael 25000.0 male
llq 15000.0 female
Time taken: 0.159 seconds, Fetched: 3 row(s)
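(II) About delimiters
1. Default delimiters
By default Hive separates rows with \n and fields with ctrl+A. Two further defaults, ctrl+B and ctrl+C, separate the parts of array, struct and map values; see 《Hive编程指南》 (Programming Hive), p. 44.
So if a table is created without row format delimited fields terminated by ',', Hive assumes that the fields are separated by ctrl+A.
A data file can therefore be made to match the table in two ways:
one is to specify the delimiter when creating the table, as in the example above;
the other is to use ctrl+A in the data file, as in section 2 below.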
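The defaults described above can also be written out explicitly, which may make them easier to remember. The following is a minimal sketch, assuming a plain text table; the table name ljh_emp_defaults is hypothetical, and the octal escapes '\001', '\002' and '\003' stand for ctrl+A, ctrl+B and ctrl+C.

```sql
-- Hypothetical table that spells out the delimiters Hive would use by default.
create table if not exists ljh_emp_defaults(
  name string,
  salary float,
  gender string
)
row format delimited
  fields terminated by '\001'            -- ctrl+A between fields
  collection items terminated by '\002'  -- ctrl+B between array/struct/map elements
  map keys terminated by '\003'          -- ctrl+C between a map key and its value
  lines terminated by '\n'               -- one row per line
stored as textfile;
```

Under these assumptions the table should behave like one created with no row format clause at all.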
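2. Using ctrl+A as the delimiter in the data file
(1) Create the table
create table ljh_test_emp(name string, salary float, gender string);
(2) Prepare the data file
Create a directory named test2 that contains a single file with the following contents:
ljh^A25000^Amale
jediael^A25000^Amale
llq^A15000^Afemale
The ^A character is visible in vi, but plain cat does not display it (cat -v does).
To type ^A, press ctrl+V and then ctrl+A while in vi's insert mode.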
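As an alternative to typing ^A by hand, Hive itself can produce a ctrl+A-delimited file. The sketch below assumes that the ljh_emp table from part (I) is already populated, and it writes to a hypothetical directory /tmp/ljh_emp_ctrl_a. When no row format clause is given, insert overwrite local directory writes text files using the default ctrl+A field delimiter.

```sql
-- Assumption: ljh_emp (from part (I)) already holds the three sample rows.
-- /tmp/ljh_emp_ctrl_a is a hypothetical path; anything already in it is overwritten.
insert overwrite local directory '/tmp/ljh_emp_ctrl_a'
select name, salary, gender
from ljh_emp;
```

The file or files written there could then be loaded into ljh_test_emp in the same way as a hand-edited file. Because salary is a float column, the values would be written as 25000.0 rather than 25000.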
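(3) Load the data into the table
load data local inpath '/home/ljhn1829/test2' into table ljh_test_emp;
(This assumes that test2 was created under /home/ljhn1829, next to the earlier test directory.)
(4) Query the data
hive> select * from ljh_test_emp;
OK
ljh 25000.0 male
jediael 25000.0 male
llq 15000.0 female
Time taken: 0.2 seconds, Fetched: 3 row(s)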
3. No delimiter specified and no ctrl+A in the file: the resulting error
(1) Create the table
create table if not exists ljh_emp_test(name string,salary float,gender string)
comment 'basic information of an employee';
(2) Prepare the data
ljh,25000,male
jediael,25000,male
llq,15000,female
(3) Load the data into the table
load data local inpath '/home/ljhn1829/test' into table ljh_emp_test;
(4) View the data in the table
select * from ljh_emp_test;
OK
ljh,25000,male NULL NULL
jediael,25000,male NULL NULL
llq,15000,female NULL NULL
Time taken: 0.185 seconds, Fetched: 3 row(s)
As the output shows, the table expects ctrl+A as the field delimiter, so each whole line of the file is loaded as the first field and the other two fields are NULL.
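One possible way to diagnose and repair this situation without reloading the file is sketched below. It assumes the table uses Hive's default text SerDe, which reads its field delimiter from the field.delim property, and that the table is not partitioned. Treat it as an illustration rather than the only fix; dropping the table and recreating it with fields terminated by ',' would work as well.

```sql
-- Inspect the storage settings that Hive recorded for the table.
describe formatted ljh_emp_test;

-- Tell the table's SerDe to split fields on commas; the data file itself is not rewritten.
alter table ljh_emp_test set serdeproperties ('field.delim' = ',');

-- Re-run the query to check whether all three columns are now populated.
select * from ljh_emp_test;
```

Text tables are parsed at read time, so after the change the existing file should be reinterpreted with the new delimiter.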
(III) A more complex table
1. Create the table
create table employees (
  name string,
  slalary float,
  suboddinates array<string>,
  deductions map<string,float>,
  address struct<stree:string, city:string, state:string, zip:int>
)
partitioned by (country string, state string);
2. Prepare the data
John Doe^A100001.1^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BStateTaxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600
Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601
Todd Jones^A70000.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700
Bill King^A60001.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100
Note: ^A separates fields, ^B separates the elements of an array, struct or map, and ^C separates the key from the value in a map entry.
See 《Hive编程指南》 (Programming Hive), p. 44.
3. Load the data into the table
load data local inpath '/home/ljhn1829/phd' into table employees partition(country='us',state='ca');
4. View the table data
hive> select * from employees;
OK
John Doe 100001.1 ["Mary Smith","Todd Jones"] {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1} {"stree":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600} us ca
Mary Smith 80000.0 ["Bill King"] {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1} {"stree":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601} us ca
Todd Jones 70000.0 [] {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1} {"stree":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700} us ca
Bill King 60001.0 [] {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1} {"stree":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100} us ca
Time taken: 0.312 seconds, Fetched: 4 row(s)
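Beyond select *, the individual parts of the complex columns can be addressed directly. The query below is a small sketch against the employees table as defined above, keeping the column names exactly as they were created; the column aliases are illustrative only.

```sql
-- Array elements use a zero-based index, map values are looked up by key,
-- and struct members are reached with dot notation.
select name,
       suboddinates[0]         as first_subordinate,
       size(suboddinates)      as subordinate_count,
       deductions['Insurance'] as insurance_rate,
       address.city            as city
from employees
where country = 'us' and state = 'ca';
```

A lookup for a key or index that does not exist yields NULL rather than an error. For example, the first sample row stores the key as StateTaxes, so deductions['State Taxes'] would presumably be NULL for John Doe.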
5. View the file in HDFS
hadoop fs -ls /data/gamein/g4_us/meta/employees/country=us/state=ca
Found 1 items
-rwxr-x--- 3 ljhn1829 g4_us 428 2015-05-12 12:49 /data/gamein/g4_us/meta/employees/country=us/state=ca/progamming_hive_data.txt
The contents of this file are identical to those of the original local file.
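The partition layout seen in the HDFS path can also be checked from inside Hive. A brief sketch, using only the employees table and the partition values from the load statement above:

```sql
-- List the partitions Hive knows about; each one maps to a country=.../state=... directory.
show partitions employees;

-- Show the details of one partition, including its storage location.
describe formatted employees partition (country = 'us', state = 'ca');
```

With the single load performed so far, show partitions should list just country=us/state=ca.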
(IV) Inserting data with a select clause
1. Create the table
create table employees2 (
  name string,
  slalary float,
  suboddinates array<string>,
  deductions map<string,float>,
  address struct<stree:string, city:string, state:string, zip:int>
)
partitioned by (country string, state string);
2. Insert the data
Before inserting, switch dynamic partitioning to nonstrict mode:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
Otherwise the insert fails with the following exception:
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
The insert statement itself is:
insert into table employees2
partition (country,state)
select name,slalary,suboddinates,deductions,address, e.country, e.state
from employees e;
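If changing the partition mode is not desirable, the same rows can be copied with a fully static partition specification, which strict mode accepts. The following is a sketch under the assumption that only the country='us', state='ca' partition needs to be copied, which is the only partition loaded in this article.

```sql
-- Static partition insert: both partition values are given as literals,
-- so the partition columns are not part of the select list.
insert into table employees2
partition (country = 'us', state = 'ca')
select e.name, e.slalary, e.suboddinates, e.deductions, e.address
from employees e
where e.country = 'us' and e.state = 'ca';
```

The trade-off is that each partition needs its own statement, whereas the dynamic form routes every row to its partition in a single pass.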