Partitioned Tables and Managed Tables in Hive


Creating a partitioned table:

create table if not exists china_partition (
    ProvinceID int,
    ProvinceName string,
    CityID int,
    CityName string,
    ZipCode int,
    DistrictID int,
    DistrictName string)
partitioned by (Province string, City string)
row format delimited fields terminated by ',';
Note: a partition column may not share a name with a data column; otherwise Hive reports an error like the following:

hive> create table if not exists china_partition(
    > ProvinceID int,
    > ProvinceName string,
    > CityID int,
    > CityName string,
    > ZipCode int,
    > DistrictID int,
    > DistrictName string)
    > partitioned by ( ProvinceName string,CityName string )
    > row format delimited fields terminated by ','
    > ;
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
Loading data into the partitioned table:

load data local inpath '/home/hadoop/china_data/beijing.txt' into table china_partition partition ( Province='beijing',city='beijing');

Resulting HDFS directory structure:
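The original screenshot is missing here, but based on the table's warehouse location (shown later by describe formatted) and the partition spec used in the load, the layout should look roughly like this (paths are illustrative):

```
/user/hive/warehouse/china_partition/
└── province=beijing/
    └── city=beijing/
        └── beijing.txt
```

Each partition column becomes a directory level named key=value, and the loaded file is placed under the leaf partition directory.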


You can also list the partitions with show partitions:

hive> show partitions china_partition;
Using hive.mapred.mode (valid values: strict, nonstrict)

If hive.mapred.mode is set to strict, an HQL query against a partitioned table must include a WHERE clause that filters on a partition column; otherwise it fails:

hive> select * from china_partition;
FAILED: SemanticException Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please make sure that hive.strict.checks.large.query is set to false and that hive.mapred.mode is not set to 'strict' to enable them. No partition predicate for Alias "china_partition" Table "china_partition"
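Under strict mode, the same query is allowed once it carries a partition predicate; a minimal sketch:

```sql
-- still in strict mode: the WHERE filter on the partition
-- column satisfies the strict-mode partition-pruning check
select * from china_partition where province = 'beijing';
```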

If hive.mapred.mode is set to nonstrict, a partitioned table can be queried without a WHERE clause:

hive> set hive.mapred.mode=nonstrict;
hive> select * from china_partition;
OK
china_partition.provinceid  china_partition.provincename  china_partition.cityid  china_partition.cityname  china_partition.zipcode  china_partition.districtid  china_partition.districtname  china_partition.province  china_partition.city
1   北京市   1   北京市   100000   1    东城区     beijing   beijing
1   北京市   1   北京市   100000   2    西城区     beijing   beijing
1   北京市   1   北京市   100000   3    崇文区     beijing   beijing
1   北京市   1   北京市   100000   4    宣武区     beijing   beijing
1   北京市   1   北京市   100000   5    朝阳区     beijing   beijing
1   北京市   1   北京市   100000   6    丰台区     beijing   beijing
1   北京市   1   北京市   100000   7    石景山区   beijing   beijing
1   北京市   1   北京市   100000   8    海淀区     beijing   beijing
1   北京市   1   北京市   100000   9    门头沟区   beijing   beijing
1   北京市   1   北京市   100000   10   房山区     beijing   beijing
1   北京市   1   北京市   100000   11   通州区     beijing   beijing
1   北京市   1   北京市   100000   12   顺义区     beijing   beijing
1   北京市   1   北京市   100000   13   昌平区     beijing   beijing
1   北京市   1   北京市   100000   14   大兴区     beijing   beijing
1   北京市   1   北京市   100000   15   怀柔区     beijing   beijing
1   北京市   1   北京市   100000   16   平谷区     beijing   beijing
1   北京市   1   北京市   100000   17   密云县     beijing   beijing
1   北京市   1   北京市   100000   18   延庆县     beijing   beijing
Time taken: 0.125 seconds, Fetched: 18 row(s)

If a table has many partitions, show partitions can also be restricted to a subset with a partial partition spec:

hive> show partitions china_partition partition (province='beijing');
OK
partition
province=beijing/city=beijing
Time taken: 0.18 seconds, Fetched: 1 row(s)
describe formatted table_name also displays the partition information:
hive> describe formatted china_partition;
OK
col_name            	data_type           	comment
# col_name          	data_type           	comment
provinceid          	int
provincename        	string
cityid              	int
cityname            	string
zipcode             	int
districtid          	int
districtname        	string

# Partition Information
# col_name          	data_type           	comment
province            	string
city                	string

# Detailed Table Information
Database:           	default
Owner:              	hadoop
CreateTime:         	Tue Apr 25 16:05:55 CST 2017
LastAccessTime:     	UNKNOWN
Retention:          	0
Location:           	hdfs://localhost:9000/user/hive/warehouse/china_partition
Table Type:         	MANAGED_TABLE
Table Parameters:
	transient_lastDdlTime	1493107555

# Storage Information
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:         	No
Num Buckets:        	-1
Bucket Columns:     	[]
Sort Columns:       	[]
Storage Desc Params:
	field.delim         	,
	serialization.format	,
Time taken: 0.065 seconds, Fetched: 38 row(s)
Hive partitioned tables also support a number of other operations, such as archive, touch, enable no_drop, and enable offline.
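As an illustration, these operations are issued as ALTER TABLE statements against a specific partition. This is a sketch only: exact availability varies by Hive version (no_drop/offline protection was removed in Hive 3.x), and archiving requires hive.archive.enabled=true.

```sql
-- pack the partition's files into a Hadoop archive (HAR)
alter table china_partition archive partition (province='beijing', city='beijing');

-- protect the partition from being dropped
alter table china_partition partition (province='beijing', city='beijing') enable no_drop;

-- mark the partition offline so its data cannot be queried (metadata still accessible)
alter table china_partition partition (province='beijing', city='beijing') enable offline;

-- update the partition's metadata modification time
alter table china_partition touch partition (province='beijing', city='beijing');
```

Each protection has a matching disable form (e.g. disable no_drop) to undo it.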
