Introduction to Hive

Source: Internet  Editor: 程序博客网  Time: 2024/04/28 08:34

I: Overview

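1. Hive is a data warehouse infrastructure built on top of the Hadoop file system. It is best suited to batch jobs over large, immutable data sets (such as web logs). Hive has no dedicated data storage format of its own.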

2. Hive has four kinds of data model: table, external table, partition, and bucket.

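3. Creating an ordinary table takes two steps: creating the table, then loading data into it. Creating an external table takes only one step: the table is created and its data is attached at the same time, because the definition points at data that already sits in the given location.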
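The two flows can be sketched in HiveQL as follows. This is only a minimal sketch: the table names page_log and page_log_ext, their columns, and both paths are hypothetical and serve purely as illustration.

```sql
-- Ordinary (managed) table, step 1: create the table
create table page_log(ip string, url string)
row format delimited
fields terminated by '\t';

-- Ordinary (managed) table, step 2: load data into it
-- (placeholder HDFS path; the file is moved into the table's directory)
load data inpath '/tmp/page_log.tsv' into table page_log;

-- External table: one step. The table is defined over files that
-- already exist under the given location (placeholder path).
create external table page_log_ext(ip string, url string)
row format delimited
fields terminated by '\t'
location '/data/page_log';
```

Dropping the managed table also deletes its data, while dropping the external table removes only the metadata and leaves the files in place.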

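4. By default, Hive ships with the connection parameters for a Derby database already configured; this embedded database holds the metastore.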

II: Creating tables

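1 Create an ordinary table

-- user is a reserved keyword in Hive 1.2.0 and later, so the table name is quoted with backticks
create table `user`(userid int, name string, pawss string)
comment 'this is the user view table';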

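2 Create a partitioned table, using a tab character to separate the fields within a row

-- The table is partitioned by age. age is declared only in PARTITIONED BY:
-- Hive rejects a partition column that also appears in the column list.
-- Fields are separated by a tab, '\t' ('\001', Ctrl-A, is Hive's default delimiter, not a tab).
-- SequenceFile is a binary format that supports compression.
create table `user`(userid int, name string, pawss string)
comment 'this is the user view table'
partitioned by (age int)
row format delimited
fields terminated by '\t'
stored as sequencefile;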
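To show how such a partitioned table is typically filled and queried, here is a hedged sketch. It assumes the table was created with age only as a partition column, and that a staging table named user_staging (hypothetical, not part of the original text) already holds the raw rows with age as an ordinary column.

```sql
-- Write the rows for one age value into the matching partition.
-- user_staging is a hypothetical source table.
insert into table `user` partition (age = 30)
select userid, name, pawss
from user_staging
where age = 30;

-- A filter on the partition column lets Hive read only that partition's directory.
select userid, name
from `user`
where age = 30;

-- List the partitions that exist so far.
show partitions `user`;
```

Because the table is stored as a SequenceFile, the sketch populates it with INSERT ... SELECT rather than loading a plain text file directly.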

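3 Add clustered (bucketed) storage

Rows are hashed on userid and distributed into buckets, and within each bucket they are stored sorted by age. Storing the data this way lets users sample the table efficiently on the clustering column userid.

-- Hash rows on userid into 32 buckets and keep each bucket sorted by age.
-- age stays an ordinary column here and the table is not partitioned by age:
-- Hive rejects a column that appears both in the column list and in PARTITIONED BY,
-- and SORTED BY has to name an ordinary column.
create table `user`(userid int, name string, age int, pawss string)
comment 'this is the user view table'
clustered by (userid) sorted by (age) into 32 buckets
row format delimited
fields terminated by '\t'
stored as sequencefile;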
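The sampling mentioned above can be sketched with TABLESAMPLE. This is an illustrative query, not taken from the original text; it assumes the table really was populated as 32 buckets on userid.

```sql
-- On Hive releases before 2.0, bucketing is only enforced on insert when
-- this setting is enabled; it is shown here as an assumption about the setup.
set hive.enforce.bucketing = true;

-- Read only the first of the 32 buckets, roughly 1/32 of the rows.
select userid, name, age
from `user` tablesample(bucket 1 out of 32 on userid);
```

Because the sample is taken on the same column the table is clustered by, Hive can pick the matching bucket instead of scanning the whole table.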

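4 Specify the storage path

Use LOCATION to give the table a storage location other than the default.

-- age is declared only in PARTITIONED BY (a partition column cannot be repeated in the column list).
-- The data is stored as plain text files.
-- The location is a placeholder; in Hive's DDL it is a path in the Hadoop file system, not on the local disk.
create table `user`(userid int, name string, pawss string)
comment 'this is the user view table'
partitioned by (age int)
row format delimited
fields terminated by '\t'
stored as textfile
location '<storage path>';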

To be continued...

0 0