Hive Table Types: Bucketed Tables and Partitioned Tables

Source: Internet. Editor: 程序博客网. Date: 2024/06/09 18:51


Bucketed tables

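A bucketed table hashes the value of a column and stores each row in one of several files according to the result.

Create the table:

    create table t_bucket(id string) clustered by(id) into 3 buckets;

Load data (the set statement is needed on Hive versions before 2.0, where bucketing is not enforced by default):

    set hive.enforce.bucketing = true;
    insert into table t_bucket select id from test;
    insert overwrite table t_bucket select id from test;

The two insert statements are alternatives: insert into appends to the table, while insert overwrite replaces its existing contents. When data is loaded into a bucketed table, Hive takes the hash of the clustering column, computes it modulo the number of buckets, and writes the row to the file for that bucket.

Notes:

    Physically, each bucket is one file in the table (or partition) directory.
    The number of buckets (output files) produced by a job equals the number of reduce tasks.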
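The hash-and-modulo step can be previewed directly in a query. This is a minimal sketch, assuming the `test` and `t_bucket` tables defined above and Hive's built-in `hash()` and `pmod()` functions. The hash Hive actually uses for bucketing can differ between versions (Hive 3 added a newer bucketing version based on Murmur hash), so the computed bucket number is illustrative only.

```sql
-- Approximate the bucket index of each row: hash the clustering column,
-- then take a non-negative modulo by the bucket count (3, as in t_bucket).
SELECT id,
       pmod(hash(id), 3) AS bucket_no
FROM test;

-- Read back only the first of the three buckets by sampling on the
-- clustering column.
SELECT id
FROM t_bucket TABLESAMPLE (BUCKET 1 OUT OF 3 ON id);
```

`pmod` is used rather than `%` because `hash()` can return a negative integer, and a bucket index must not be negative.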

Partitioned tables

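A partition can be thought of as a category: rows of different categories are stored in different directories. The partition columns define the categories, and a table can have one or several of them. The purpose of a partitioned table is to speed up queries, so filter on the partition columns whenever possible. A query that does not filter on them scans the whole table.

Create:

    create table t6_partition(
    id int,
    name string,
    birthday date,
    online boolean
    ) partitioned by(dt date comment "partition field day time");

Show partitions:

    show partitions t6_partition;

Add a partition:

    alter table t6_partition add partition(dt="2017-07-20");

Drop a partition:

    alter table t6_partition drop partition(dt="2017-07-20");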
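The following sketch shows one way to write into a partition of `t6_partition` and then query it through the partition column. The staging table `t6_source` is hypothetical (it does not appear in the original) and is assumed to have columns matching `id`, `name`, `birthday`, and `online`.

```sql
-- Write rows into a single partition. The partition value comes from the
-- PARTITION clause, so dt is not part of the select list.
INSERT INTO TABLE t6_partition PARTITION (dt = '2017-07-20')
SELECT id, name, birthday, online
FROM t6_source;   -- hypothetical staging table

-- Filtering on the partition column lets Hive read only the dt=2017-07-20
-- directory instead of scanning every partition.
SELECT id, name
FROM t6_partition
WHERE dt = '2017-07-20';
```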
When the data is analyzed along several dimensions, the table can be partitioned by more than one column:

    create table t6_partition_1(
    id int,
    name string,
    birthday date,
    online boolean
    ) partitioned by(year int, class string);
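With several partition columns, each partition is identified by one value per column and is stored as a nested directory (for example `year=2017/class=a` under the table directory). This is a short sketch using `t6_partition_1` from above; the values `2017` and `'a'` are made up for illustration.

```sql
-- Add a partition by giving a value for every partition column.
ALTER TABLE t6_partition_1 ADD PARTITION (year = 2017, class = 'a');

-- List only the partitions under year=2017 by passing a partial spec.
SHOW PARTITIONS t6_partition_1 PARTITION (year = 2017);

-- Filtering on both partition columns narrows the read to one directory.
SELECT id, name
FROM t6_partition_1
WHERE year = 2017 AND class = 'a';
```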
