Hive快速入门

来源：互联网发布：java培训工资编辑：程序博客网时间：2024/05/23 01:26

Hive快速入门

@(博客文章)[hive]

（一）简单入门

1、创建一个表

create table if not exists ljh_emp(name string,salary float,gender string)comment 'basic information of a employee'row format delimited fields terminated by ',’;

2、准备数据文件
创建test目录且目录只有一个文件，文件内容如下：

ljh,25000,malejediael,25000,malellq,15000,female

3、将数据导入表中

load data local inpath '/home/ljhn1829/test' into table ljh_emp;

4、查询表中的内容

select * from ljh_emp;OKljh    25000.0    malejediael    25000.0    malellq    15000.0    femaleTime taken: 0.159 seconds, Fetched: 3 row(s)

（二）关于分隔符

1、默认分隔符
hive中的行默认分隔符为 \n，字段分隔符为 ctrl+A，此外还有ctrl+B，ctrl+C，可以用于分隔array,struct,map等，详见《hive编程指南》P44。
因此，若在建表是不指定row format delimited fields terminated by ‘,’，则认为默认字段分隔符为ctrl+A。
可以有2种解决方案：
一是在创建表时指定分隔符，如上例所示，
二是在数据文件中使用ctrl+A，见下例

2、在数据文件中使用ctrl+A全分隔符
（1）创建表
create table ljh_test_emp(name string, salary float, gender string);
（2）准备数据文件
创建test2目录，目录下只有一个文件，文件内容如下：
ljh^A25000^Amale
jediael^A25000^Amale
llq^A15000^Afemale
其中的^A字符仅在vi时才能看到，cat不能看到。
输出^A的方法是：在vi的插入模式下，先按ctrl+V，再按ctrl+A
（3）将数据导入表
create table ljh_test_emp(name string, salary float, gender string);
（4）查询数据

hive> select * from ljh_test_emp;OKljh    25000.0    malejediael    25000.0    malellq    15000.0    femaleTime taken: 0.2 seconds, Fetched: 3 row(s)

3、未指定分隔符，且又未使用ctrl+A作文件中的分隔符，出现以下错误
(1)创建表

create table if not exists ljh_emp_test(name string,salary float,gender string)comment 'basic information of a employee’;

（2）准备数据

ljh,25000,malejediael,25000,malellq,15000,female

（3）将数据导入表中

load data local inpath '/home/ljhn1829/test' into table ljh_emp_test;

（4）查看表中数据

select * from ljh_emp_test;OKljh,25000,male    NULL    NULLjediael,25000,male    NULL    NULLllq,15000,female    NULL    NULLTime taken: 0.185 seconds, Fetched: 3 row(s)

可以看出，由于分隔符为ctrl+A，因此导入数据时将文件中的每一行内容均只当作第一个字段，导致后面2个字段均为null。

（三）复杂一点的表
1、创建表

create table employees (    name string,    slalary float,    suboddinates array<string>,    deductions map<string,float>,    address struct<stree:string, city:string, state:string, zip:int>)partitioned by(country string, state string);

2、准备数据

John Doe^A100001.1^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BStateTaxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601Todd Jones^A70000.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700Bill King^A60001.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100

注意 ^A：分隔字段 ^B：分隔array/struct/map中的元素 ^C：分隔map中的KV
详见《hive编程指南》P44。

3、将数据导入表中

load data local inpath '/home/ljhn1829/phd' into table employees partition(country='us',state='ca');

4、查看表数据

hive> select * from employees;OKJohn Doe    100001.1    ["Mary Smith","Todd Jones"]    {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}    {"stree":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}    us    caMary Smith    80000.0    ["Bill King"]    {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}    {"stree":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}    us    caTodd Jones    70000.0    []    {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}    {"stree":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}    us    caBill King    60001.0    []    {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}    {"stree":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}    us    caTime taken: 0.312 seconds, Fetched: 4 row(s)

5、查看hdfs中的文件

hadoop fs -ls /data/gamein/g4_us/meta/employees/country=us/state=caFound 1 items-rwxr-x---   3 ljhn1829 g4_us        428 2015-05-12 12:49 /data/gamein/g4_us/meta/employees/country=us/state=ca/progamming_hive_data.txt

该文件中的内容与原有文件一致。

（四）通过select子句插入数据
1、创建表

create table employees2 (    name string,    slalary float,    suboddinates array<string>,    deductions map<string,float>,    address struct<stree:string, city:string, state:string, zip:int>)partitioned by(country string, state string);

2、插入数据

hive>  set hive.exec.dynamic.partition.mode=nonstrict;

否则会出现以下异常：

    FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrictinsert into table employees2partition (country,state)select name,slalary,suboddinates,deductions,address, e.country, e.statefrom employees e;

0 0