How to import a text file into a partitioned table in Impala

Source: Internet  Editor: 程序博客网  Date: 2024/06/02 04:36

1. Create a non-partitioned table in Impala, for example gxzl_kgx_drw_NP

create table if not exists gxzl_kgx_drw_NP (mat_track_no string,materialcode string,id double,defectid double,mainno string,unitno string,side string,x double,y double,defectclass string,defectclasscode string,imagefile string,mat_act_width double,mat_act_len double,prod_end_time_zd string,reccreatetime string,equipmentcode string,num double,seq double,len_sum bigint,len_tot bigint,x_sum bigint,y_sum bigint,z_sum bigint,x_drw double,y_drw double,z_drw double) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

2. Create the partitioned target table in Impala, for example gxzl_kgx_drw. The partition column mat_track_no is declared only in the partitioned by clause, not in the regular column list.

create table if not exists gxzl_kgx_drw (materialcode string,id double,defectid double,mainno string,unitno string,side string,x double,y double,defectclass string,defectclasscode string,imagefile string,mat_act_width double,mat_act_len double,prod_end_time_zd string,reccreatetime string,equipmentcode string,num double,seq double,len_sum bigint,len_tot bigint,x_sum bigint,y_sum bigint,z_sum bigint,x_drw double,y_drw double,z_drw double) partitioned by (mat_track_no string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

3. Load the text file into the non-partitioned table

load data inpath '/user/gxzl_kgx_drw.txt' into table gxzl_kgx_drw_NP;

Note: Impala's load data statement reads from an HDFS path, not from the local filesystem. If your data is on a local disk, upload it to HDFS first.
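As a rough sketch of the load step with a quick check afterwards: the example below assumes the file has already been uploaded to /user/ on HDFS (the hdfs dfs -put command in the comment and the local file name are assumptions, not part of the original steps). Keep in mind that load data moves the file into the table's data directory rather than copying it, so the file no longer exists at the original path afterwards.

```sql
-- Assumption: the local file was uploaded beforehand, e.g. from a shell with
--   hdfs dfs -put gxzl_kgx_drw.txt /user/
-- LOAD DATA moves (does not copy) the file into the table's directory.
load data inpath '/user/gxzl_kgx_drw.txt' into table gxzl_kgx_drw_NP;

-- Sanity checks: total row count and a few sample rows.
select count(*) from gxzl_kgx_drw_NP;
select mat_track_no, materialcode, id from gxzl_kgx_drw_NP limit 5;
```

If the sample rows come back with most columns NULL, the field delimiter in the file probably does not match the '\t' declared in the table definition.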

4. Use insert into ... select to insert the data into the partitioned table. Note that mat_track_no is the last column in the select list.

insert into table gxzl_kgx_drw PARTITION(mat_track_no) select materialcode,id,defectid,mainno,unitno,side,x,y,defectclass,defectclasscode,imagefile,mat_act_width,mat_act_len,prod_end_time_zd,reccreatetime,equipmentcode,num,seq,len_sum,len_tot,x_sum,y_sum,z_sum,x_drw,y_drw,z_drw,mat_track_no from gxzl_kgx_drw_NP;

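Note:

With dynamic partitioning, i.e. when PARTITION(mat_track_no) is given no constant value, the partition column must be the last column in the select list. Impala then routes each row to a partition according to the value of that final column.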
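For comparison, here is a sketch of the static-partition form of the same insert, where the partition value is fixed in the PARTITION clause and mat_track_no is therefore left out of the select list. The value 'T0001' is a hypothetical tracking number used only for illustration.

```sql
-- Static partition: the value is a constant in the PARTITION clause,
-- so mat_track_no does NOT appear in the select list.
-- 'T0001' is a hypothetical value, not taken from the original data.
insert into table gxzl_kgx_drw partition (mat_track_no='T0001')
select materialcode,id,defectid,mainno,unitno,side,x,y,defectclass,
       defectclasscode,imagefile,mat_act_width,mat_act_len,prod_end_time_zd,
       reccreatetime,equipmentcode,num,seq,len_sum,len_tot,x_sum,y_sum,z_sum,
       x_drw,y_drw,z_drw
from gxzl_kgx_drw_NP
where mat_track_no = 'T0001';

-- List the partitions that now exist in the target table.
show partitions gxzl_kgx_drw;
```

The static form is handy when loading one known partition at a time; the dynamic form from step 4 is more convenient when the source table contains many distinct mat_track_no values.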
