Hive Study Notes: Hive's Three Built-in Table Formats and How Tables Are Divided


textfile

create table test_txt(name string,val string) stored as textfile;
desc formatted test_txt;

sequencefile

create table test_seq(name string,val string) stored as sequencefile;
desc formatted test_seq;

rcfile

create table test_rc(name string,val string) stored as rcfile;
desc formatted test_rc;
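
One practical difference between these formats: LOAD DATA only moves files into the table's directory and does not convert them, so a plain-text file cannot simply be loaded into test_seq or test_rc. A minimal sketch of the usual workaround, assuming test_txt has already been filled with some rows:

```sql
-- Assumes test_txt already contains data stored as plain text.
-- INSERT ... SELECT runs a job that rewrites the rows in the
-- storage format of the target table.
insert overwrite table test_seq select name, val from test_txt;
insert overwrite table test_rc  select name, val from test_txt;
```

After this, all three tables should return the same rows, while the files underneath them differ in format.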

Custom classes

create table XXX(name string,val string) stored as inputformat 'XXX' outputformat 'XXX';

(To be added)
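
For reference, the built-in shorthands are themselves aliases for such class pairs. As far as I know, stored as textfile corresponds to the pair shown below; the table name test_custom is made up for this sketch, and the class names should be checked against the Hive version in use.

```sql
-- test_custom is a hypothetical table name used only for illustration.
-- The two classes are the ones Hive uses for plain-text tables, so this
-- table should behave like one created with "stored as textfile".
create table test_custom(name string, val string)
stored as
  inputformat  'org.apache.hadoop.mapred.TextInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- The InputFormat and OutputFormat lines of the output show the classes in use.
desc formatted test_custom;
```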

Table partitioning

First, create a partitioned table and load data into it:

create table if not exists employees(
  name string,
  salary float,
  subordinates array<string>,
  deductions map<string,float>,
  address struct<street:string,city:string,state:string,zip:int>
)
partitioned by (dt string,type string)
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;

-- Note: as far as I know, Hive releases before 3.0 reject LOAD DATA into a
-- partitioned table unless a partition(...) clause names the target partition.
load data local inpath '/home/daya/test/test.txt' overwrite into table employees;

Add a partition:

alter table employees add if not exists partition(dt='20170906',type='test');

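In HDFS, the new partition then shows up as a nested directory, dt=20170906/type=test, under the table's directory in the database.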
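
One way to find where a partition lives without guessing the warehouse path is to ask Hive for it. A small sketch, using the partition added above:

```sql
-- The Location field of the output holds the HDFS directory of this
-- partition; it should end in .../employees/dt=20170906/type=test.
desc formatted employees partition (dt='20170906', type='test');
```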

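A Hive partition is really just a way of sorting a table's data into separate subdirectories once the table has grown large enough, which makes the data easier to manage and to query.
Add one more partition and list the table's partitions: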

alter table employees add if not exists partition(dt='20170905',type='test');
show partitions employees;
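
Because every partition is its own directory, a query that filters on the partition columns only has to read the matching directories. A minimal sketch of such a query against the table above (the selected columns are chosen only for illustration):

```sql
-- dt and type are partition columns, so this filter lets Hive skip
-- every directory except dt=20170906/type=test.
select name, salary
from employees
where dt = '20170906' and type = 'test';

-- explain prints the query plan, which can be used to check what is scanned.
explain
select name, salary
from employees
where dt = '20170906' and type = 'test';
```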

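Table bucketing

Bucketing divides a table at a finer granularity than partitioning: rows are hashed on a chosen column into a fixed number of files, which can make some queries, such as sampling, faster.
(To be added)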
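
Until this section is filled in, here is a minimal sketch of what a bucketed table could look like. The table name employees_bucketed, the choice of name as the bucketing column, and the bucket count of 4 are all assumptions made for the example.

```sql
-- Hypothetical bucketed table: rows are hashed on name into 4 files.
create table if not exists employees_bucketed(name string, salary float)
clustered by (name) into 4 buckets
stored as textfile;

-- On Hive 1.x, "set hive.enforce.bucketing = true;" is needed before the
-- insert; Hive 2.0 and later always enforce bucketing, to my knowledge.
insert overwrite table employees_bucketed
select name, salary from employees;

-- Sample only the first of the 4 buckets instead of scanning the whole table.
select * from employees_bucketed tablesample(bucket 1 out of 4 on name);
```

The data is written with INSERT ... SELECT rather than LOAD DATA so that Hive itself distributes the rows across the bucket files.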
