分区表,管理表
来源:互联网 发布:赵薇阿里影业 知乎 编辑:程序博客网 时间:2024/06/16 15:41
创建分区表:
create table if not exists china_partition(ProvinceID int,ProvinceName string,CityID int,CityName string,ZipCode int,DistrictID int,DistrictName string)partitioned by ( Province string,City string )row format delimited fields terminated by ',';注意:分区字段名和数据字段名不能相同,不然报错如下:
hive> create table if not exists china_partition( > ProvinceID int, > ProvinceName string, > CityID int, > CityName string, > ZipCode int, > DistrictID int, > DistrictName string) > partitioned by ( ProvinceName string,CityName string ) > row format delimited fields terminated by ',' > ;FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
分区表加载数据:load data local inpath '/home/hadoop/china_data/beijing.txt' into table china_partition partition ( Province='beijing',city='beijing');
hdfs文件系统目录结构:
也可以使用 show partitions 查看分区:
hive> show partitions china_partition;使用 hive.mapred.mode(可选值为 strict,nonstrict)
如果 hive.mapred.mode 设置为 strict,查询分区表示 Hql 语句必须有 where 子句,不然会报错:
hive> select * from china_partition;FAILED: SemanticException Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please make sure that hive.strict.checks.large.query is set to false and that hive.mapred.mode is not set to 'strict' to enable them. No partition predicate for Alias "china_partition" Table "china_partition"
如果 hive.mapred.mode 设置为 nonstrict,查询分区表可以不带where子句:
hive> set hive.mapred.mode=nonstrict;hive> select * from china_partition;OKchina_partition.provinceidchina_partition.provincenamechina_partition.cityidchina_partition.citynamechina_partition.zipcodechina_partition.districtidchina_partition.districtnamechina_partition.provincechina_partition.city1北京市1北京市1000001东城区beijingbeijing1北京市1北京市1000002西城区beijingbeijing1北京市1北京市1000003崇文区beijingbeijing1北京市1北京市1000004宣武区beijingbeijing1北京市1北京市1000005朝阳区beijingbeijing1北京市1北京市1000006丰台区beijingbeijing1北京市1北京市1000007石景山区beijingbeijing1北京市1北京市1000008海淀区beijingbeijing1北京市1北京市1000009门头沟区beijingbeijing1北京市1北京市10000010房山区beijingbeijing1北京市1北京市10000011通州区beijingbeijing1北京市1北京市10000012顺义区beijingbeijing1北京市1北京市10000013昌平区beijingbeijing1北京市1北京市10000014大兴区beijingbeijing1北京市1北京市10000015怀柔区beijingbeijing1北京市1北京市10000016平谷区beijingbeijing1北京市1北京市10000017密云县beijingbeijing1北京市1北京市10000018延庆县beijingbeijingTime taken: 0.125 seconds, Fetched: 18 row(s)
如果分区特别多,用户执行查询部分分区,也是使用:
hive> show partitions china_partition partition (province='beijing');OKpartitionprovince=beijing/city=beijingTime taken: 0.18 seconds, Fetched: 1 row(s)使用 describe formatted table_name 也可以显示分区信息:
hive> describe formatted china_partition;OKcol_namedata_typecomment# col_name data_type comment provinceid int provincename string cityid int cityname string zipcode int districtid int districtname string # Partition Information # col_name data_type comment province string city string # Detailed Table Information Database: default Owner: hadoop CreateTime: Tue Apr 25 16:05:55 CST 2017 LastAccessTime: UNKNOWN Retention: 0 Location: hdfs://localhost:9000/user/hive/warehouse/china_partition Table Type: MANAGED_TABLE Table Parameters: transient_lastDdlTime1493107555 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: field.delim , serialization.format, Time taken: 0.065 seconds, Fetched: 38 row(s)hive>hive分区表还有很多策略例如 archive,touce,enable no_drop,enable offline 等。
0 0
- 分区表,管理表
- 管理分区表
- Hive之管理表 外部表 分区表
- oracle 分区表管理
- 【对象管理】分区表
- 分区表自动管理
- oracle管理分区表
- 管理分区表与索引
- mysql分区表管理
- greenplum 表管理及查看分区表占空间大小
- Oracle中分区表的管理
- 分区表的使用和管理
- MySQL分区表的管理~1
- MySQL分区表的管理~2
- 分区表与索引的管理
- SQL SERVER 分区表的总结--分区表的维护和管理
- Sql分区表的管理:添加,修改,查询分区表中的数据
- 管理分区表+为范围分区表增加分区+为散列分区表增加分区+为列表分区表增加分区+查看当前用户的表和表分区信息+合并分区+删除分区
- python循环引用的解决办法
- 我的第一篇博客
- 对阿里离线赛的数据用python进行简单的预处理
- mysql 开通远程访问权限
- 【IMWeb训练营作业】之小组作业 web版豆瓣app
- 分区表,管理表
- Java正则表达式
- IIntelliJ IDEA 与 Gradle
- Composer
- mac下配置nginx
- IDEA引入外部jar包的方法
- RIP路由防止环路的方法
- Android实现可拖动的尺子
- Linux之V4L2基础编程