HiveQL: Data Definition

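Databases in Hive

Creating a database

The plain form:

hive> create database financials;

With a guard, so the database is created only if it does not already exist and no error is raised:

hive> create database if not exists financials;

With an explicit storage location:

hive> create database financials location '/my/database';

With a descriptive comment: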

hive> create database financials comment 'holds all financial tables';
hive> describe database financials;
financials      holds all financial tables      file:/home/me/hive/warehouse/financials.db

With key-value properties attached:

hive> create database financials with dbproperties('creator'='Leon','date'='2017-02-09');
hive> describe database extended financials;
financials              file:/home/me/hive/warehouse/financials.db      {date=2017-02-09, creator=Leon}

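Setting the current working database

As in MySQL, with use:

hive> use financials;

Setting the following property makes the prompt display the current database, so you always know where you are: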

hive> set hive.cli.print.current.db=true;
hive (default)>

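Listing databases

List all databases (the command is identical to MySQL's):

hive> show databases;

Filter the list with a regular expression (the example below matches database names that start with f, followed by any characters):

hive> show databases like 'f.*';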

Altering a database

alter database can set key-value pairs in a database's dbproperties, but the database name and location cannot be changed:

hive> alter database financials set dbproperties('user'="Leon");

Dropping a database

Again with drop:

hive> drop database financials;
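
The drop statement above fails if the database does not exist, and by default Hive also refuses to drop a database that still contains tables. A minimal sketch of the two usual guards, reusing the article's financials database:

```sql
-- IF EXISTS suppresses the error when the database is missing.
DROP DATABASE IF EXISTS financials;

-- The default behaviour (RESTRICT) rejects a database that still has tables.
-- CASCADE drops the contained tables first, then the database itself.
DROP DATABASE IF EXISTS financials CASCADE;
```

CASCADE is destructive for managed tables, since their data is deleted together with the metadata, so it is worth running show tables in financials first.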

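Tables in Hive

Creating a table

The basic syntax follows MySQL. The comment and tblproperties keywords work the same way as comment and dbproperties do when creating a database: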

hive> create table if not exists mydb.employees(
        name          string comment 'Employee name',
        salary        float comment 'Employee salary',
        subordinates  array<string> comment 'Name of subordinates',
        deductions    map<string, float>
                      comment 'keys are deductions name,values are percentages',
        address       struct<street:string,city:string,state:string,zip:int>
                      comment 'Home address')
      comment 'description of the table'
      tblproperties ('creator'='Leon','create_at'='2017-02-09 12:00');

Create a table by copying the schema of an existing table (the data is not copied):

hive> create table if not exists mydb.employees2 like mydb.employees;
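
The like form can also be combined with the external keyword and a location clause. The sketch below assumes a hypothetical table name (employees_ext) and a placeholder path; neither comes from the original article:

```sql
-- Copies only the schema of mydb.employees; no rows are copied.
-- 'employees_ext' and the path are illustrative placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees_ext
LIKE mydb.employees
LOCATION '/path/to/employees_ext';
```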

Listing tables

List the tables in the current database:

hive> use mydb;
hive> show tables;

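List the tables in a specific database directly:

hive> show tables in mydb;

As with databases, the list can be filtered with a regular expression:

hive> show tables 'empl.*';

View a table's description. In the output, tableType:MANAGED_TABLE means the table is a managed table, while EXTERNAL_TABLE means it is an external table (creating external tables is covered below):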

hive> describe extended employees;

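Creating an external table

Because the table is external, Hive does not assume that it fully owns the data. Dropping the table therefore leaves the data in place and deletes only the metadata that describes the table. The following command creates an external table that can read all comma-delimited data under the /data/stocks directory: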

hive> create external table if not exists stocks (
        exchange         string,
        symbol           string,
        ymd              string,
        price_open       float,
        price_high       float,
        price_low        float,
        price_close      float,
        volume           int,
        price_adj_close  float)
      row format delimited fields terminated by ','
      location '/data/stocks';
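
To see the managed versus external distinction in practice, the following sketch inspects the stocks table and then drops it. It assumes the table was created exactly as above:

```sql
-- The output should include a "Table Type" field showing EXTERNAL_TABLE.
DESCRIBE FORMATTED stocks;

-- Removes only the metastore entry; the files under /data/stocks are left untouched.
DROP TABLE IF EXISTS stocks;
```

Running the same create external table statement again afterwards makes the existing files queryable once more, because nothing under /data/stocks was deleted.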

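Partitioned tables

Suppose there are many employees spread around the world, and we want to partition them by country and state. Both are declared as partition columns in the partitioned by clause rather than as ordinary columns: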

hive> create table employees1(
        name           string,
        salary         float,
        subordinates   array<string>,
        deductions     map<string,float>,
        address        struct<street:string,city:string,state:string,zip:int>)
      partitioned by(country string,state string);

Use show partitions to list all partitions that exist in a table:

hive> show partitions employees1;

To list only the partitions that match a specific partition key value, add a partition clause:

hive> show partitions employees1 partition(country='CN');
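
Partition columns can be used in a where clause like ordinary columns, and Hive then reads only the matching partition directories. A minimal sketch against employees1; the state value 'BJ' is an illustrative assumption, not data from the article:

```sql
-- Only the directory for country=CN/state=BJ needs to be scanned.
SELECT name, salary
FROM employees1
WHERE country = 'CN'
  AND state = 'BJ';
```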

External partitioned tables

Create an external partitioned table:

hive> create external table if not exists log_message(
        hms         int,
        severity    string,
        server      string,
        process_id  int,
        message     string)
      partitioned by (year int,month int,day int)
      row format delimited fields terminated by '\t';

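Customizing a table's storage format

Hive stores tables as text files by default, which can be stated explicitly with the optional stored as textfile clause. The delimiters can also be specified when the table is created. The values below are Hive's defaults, written out explicitly: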

hive> create table employees3(
        name           string,
        salary         float,
        subordinates   array<string>,
        deductions     map<string,float>,
        address        struct<street:string,city:string,state:string,zip:int>)
      row format delimited
      fields terminated by '\001'
      collection items terminated by '\002'
      map keys terminated by '\003'
      lines terminated by '\n'
      stored as textfile;
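
The collection and map-key delimiters matter because they decide how the array, map, and struct columns are parsed from each text line. As a rough illustration of how those columns are then read, the sketch below queries employees3; the map key 'Federal Taxes' is an assumed example value:

```sql
SELECT name,
       subordinates[0]             AS first_subordinate,  -- array element by index
       deductions['Federal Taxes'] AS federal_tax,        -- map value by key
       address.city                AS city                -- struct field by name
FROM employees3;
```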

Dropping a table

This is very similar to dropping a database:

hive> drop table if exists employees;

Altering a table

Rename a table:

hive> alter table log_message rename to logmsgs;

Add partitions:

hive> alter table logmsgs add if not exists
      partition(year=2017,month=1,day=1) location 'logs/2017/01/01'
      partition(year=2017,month=1,day=2) location 'logs/2017/01/02'
      ...

Change a partition's location:

hive> alter table logmsgs partition(year=2017,month=02,day=09)
      set location '/usr/log/2017/02/09';
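
Changing a partition's location only updates the metadata; it does not move or delete the data at the old location. One way to confirm where a partition now points is sketched below, assuming the partition from the example above exists:

```sql
-- List the partitions known to the metastore.
SHOW PARTITIONS logmsgs;

-- The "Location" field in the output shows the partition's current directory.
DESCRIBE FORMATTED logmsgs PARTITION (year = 2017, month = 2, day = 9);
```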

Drop a partition:

hive> alter table logmsgs drop if exists partition(year=2017,month=01,day=01);

Change a column (rename it, change its type or comment, or move it):

hive> alter table logmsgs
      change column hms hours_minutes_seconds int
      comment 'The hours,minutes and seconds part of the timestamp'
      after severity;

Add columns:

hive> alter table logmsgs add columns(
        app_name    string comment 'Application name',
        session_id  bigint comment 'The current session id');

Drop or replace columns (replace columns redefines the whole column list, so any column left out is removed):

hive> alter table logmsgs replace columns(
        hours_minutes_seconds  int comment 'The hours,minutes and seconds part of the timestamp',
        severity               string comment 'The message severity',
        message                string comment 'The rest of the message');

Change table properties (properties can be added or modified, but not removed):

hive> alter table logmsgs set tblproperties(
        'notes'='The process id is no longer captured;this column is always NULL');

Change storage properties:

hive> alter table logmsgs partition(year=2017,month=01,day=01)
      set fileformat sequencefile;

Set NO_DROP on a partition to prevent it from being dropped:

hive> alter table logmsgs
      partition(year=2017,month=02,day=01) enable NO_DROP;

Set OFFLINE on a partition to prevent it from being queried:

hive> alter table logmsgs
      partition(year=2017,month=02,day=01) enable OFFLINE;

My original post on OSChina:
https://my.oschina.net/lonelycode/blog/834894

0 0