Hive动态分区与建表、插入数据操作

来源：互联网发布：c# 高级编程豆瓣编辑：程序博客网时间：2024/06/11 05:10

1、定义

动态分区指不需要为不同的分区添加不同的插入语句，分区不确定，需要从数据中获取。

set hive.exec.dynamic.partition=true;//使用动态分区

(可通过这个语句查看：set hive.exec.dynamic.partition;)

set hive.exec.dynamic.partition.mode=nonstrict;//无限制模式

如果模式是strict，则必须有一个静态分区，且放在最前面。

SET hive.exec.max.dynamic.partitions.pernode=10000;每个节点生成动态分区最大个数

set hive.exec.max.dynamic.partitions=100000;,生成动态分区最大个数，如果自动分区数大于这个参数，将会报错

set hive.exec.max.created.files＝150000; //一个任务最多可以创建的文件数目

set dfs.datanode.max.xcievers=8192;//限定一次最多打开的文件数set dfs.datanode.max.xcievers=8192;//限定一次最多打开的文件数

2、创建静态分区表与动态分区表在hql语句上没有本质区别，主要区别在于hive.exec.dynamic.partition.mode设置。

例：

CREATE TABLE order (

nameSTRING，

id STRING

)

PARTITIONED BY (monthstring，info string)

ROW FORMAT DELIMITED FIELDS TERMINATED BY'\t';

3、动态分区表插入数据

Strict模式，需要有一个静态分区，且放在最前面

Insert overwritetable order(month=’2016-06’,info)

Select name,id,info(或者 substr(describtion,2,10) as info)from order;

info(或者 substr(describtion,2,10) as info)：

动态分区的字段info可以是表中的Field,也可以是(alias)别名。

实际操作：

创建分区表：

creat tableTest_Info(

name string,

id string

)

Partitioned by(year string,month string,daystring,info string)

向分区表中插入数据：

set mapred.reduce.tasks = 10;

set mapred.job.priority=VERY_HIGH;

insert overwrite table test_infopartition(year="2016",month="06",day="13",info)

select deviceid as name,uuid as id,ip asinfo from ad_install limit 100;

向test_info插入数据时，ad_install的字段不允许错位，否则数据有误，但是名称可以不一致。

插入结果：

在/usr/deployer/warehouse/tmp.db/test_info下会出现year=2016文件夹，其下会有month=06文件夹，再其下会有day=13文件夹下的内容如下图：

清除表的数据：

truncate table test_info;

查看创建表的语句：

show create table test_info;

删除表：

drop table test_info;

创建数据库，指定数据库存储路径，默认为hive的metastore的路径：

create database if not exists lin
location '/tmp/data';

查看数据库：

describe database lin;

hive创建外部表和内表的区别就是表文件夹的位置，外部表可以设置location，并且在删除表的时候内部表能够将文件夹删除，而外部表只能删除数据，无法删除文件夹。

0 0