Hive 基本操作

来源：互联网发布：泉州电视台直播软件编辑：程序博客网时间：2024/06/16 23:26

1.显示地展示当前使用的数据库

2.建表

3.导入本地文件

4.数据库多用户共享表

5.创建数据库

5.1仓库数据库判断数据库是否存在

5.2.显示现在有的数据库

5.3条件查询数据库

5.4创建指定存放文件位置数据库

5.5创建数据库时添加注释信息

5.6创建数据库添加扩展信息

5.7使用数据库

5.8删除数据库

5.9当数据库存在表时，先要删除表再能删除数据库

5.10修改数据库

6.1创建表

6.2创建表-复制表结构

6.3显示某个数据库中的表

6.4显示指定筛选条件表名

6.5显示表扩展信息

6.6指定显示某个字段的信息

7.1External Tables -- 外部表

7.2复制表结构仓库外部表

7.3Partitioned, Managed Tables --分区表

7.3.1建议安全措施

7.3.2查看现有分区

7.3.3查看分区详细分区键

7.3.3通过 DESC 显示分区键

7.3.4从文件读入分区表

7.4.1External Partitioned Tables 外部分区表

7.4.1.1.先创建外部表结构

7.4.2.为外部表增加指定分区

7.4.3.把数据表复制外部表目录结构中

7.4.3显示表分区信息

1.显示地展示当前使用的数据库 :hive> set hive.cli.print.current.db=true;

2.建表：create table t_order(id int,age int) row format delimited fields terminated by ',';

3.导入本地文件：load data local inpath '/home/hadoop/hivedata/data' into table t_order;

4.数据库多用户共享表（解决不同用户不能共享数据库）:mysqld_safe --skip-grant-tables

注意：Hive 不支持行级插入，更新，删除。也不支持事务

5.创建数据库 hive> CREATE DATABASE financials;

5.1仓库数据库判断数据库是否存在 hive> CREATE DATABASE IF NOT EXISTS financials;

5.2.显示现在有的数据库 hive> SHOW DATABASES;

hive> CREATE DATABASE human_resources;

hive> SHOW DATABASES;

5.3条件查询数据库 hive> SHOW DATABASES LIKE 'h.*';

5.4创建指定存放文件位置数据库

hive> CREATE DATABASE financials

> LOCATION '/my/preferred/directory';

5.5创建数据库时添加注释信息

hive> CREATE DATABASE financials

> COMMENT 'Holds all financial tables';

hive> DESCRIBE DATABASE financials;

financials Holds all financial tables

hdfs://master-server/user/hive/warehouse/financials.db

5.6创建数据库添加扩展信息

hive> CREATE DATABASE financials

> WITH DBPROPERTIES ('creator' = 'Mark Moneybags', 'date' = '2012-01-02');

hive> DESCRIBE DATABASE financials;

financials hdfs://master-server/user/hive/warehouse/financials.db

hive> DESCRIBE DATABASE EXTENDED financials;

financials hdfs://master-server/user/hive/warehouse/financials.db

{date=2012-01-02, creator=Mark Moneybags);

5.7使用数据库 hive> USE financials;

5.8删除数据库 hive> DROP DATABASE IF EXISTS financials;

5.9当数据库存在表时，先要删除表再能删除数据库 hive> DROP DATABASE IF EXISTS financials CASCADE;

5.10修改数据库 hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Joe Dba');

There is no way to delete or “unset” a DBPROPERTY 没有方法删除或重置 DBPROPERTY

6.1创建表

CREATE TABLE IF NOT EXISTS mydb.employees (

name STRING COMMENT 'Employee name',

salary FLOAT COMMENT 'Employee salary',

subordinates ARRAY<STRING> COMMENT 'Names of subordinates',

deductions MAP<STRING, FLOAT>

COMMENT 'Keys are deductions names, values are percentages

address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>

COMMENT 'Home address')

COMMENT 'Description of the table'

TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...)

LOCATION '/user/hive/warehouse/mydb.db/employees';

6.2创建表-复制表结构

CREATE TABLE IF NOT EXISTS mydb.employees2

LIKE mydb.employees;

6.3显示某个数据库中的表

hive> USE mydb;

hive> SHOW TABLES;

6.4显示指定筛选条件表名

hive> USE mydb;

hive> SHOW TABLES 'empl.*';

employees

6.5显示表扩展信息

hive> DESCRIBE EXTENDED mydb.employees;

name string Employee name

salary float Employee salary

subordinates array<string> Names of subordinates

deductions map<string,float> Keys are deductions names, values are percentages

address struct<street:string,city:string,state:string,zip:int> Home address

Detailed Table Information Table(tableName:employees, dbName:mydb, owner:me,

...

location:hdfs://master-server/user/hive/warehouse/mydb.db/employees,

parameters:{creator=me, created_at='2012-01-02 10:00:00',

last_modified_user=me, last_modified_time=1337544510,

comment:Description of the table, ...}, ...)

6.6指定显示某个字段的信息

hive> DESCRIBE mydb.employees.salary;

salary float Employee salary

7.1External Tables -- 外部表

外部表，删除表不删除数据

CREATE EXTERNAL TABLE IF NOT EXISTS stocks (

exchange STRING,

symbol STRING,

ymd STRING,

price_open FLOAT,

price_high FLOAT,

price_low FLOAT,

price_close FLOAT,

volume INT,

price_adj_close FLOAT)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

LOCATION '/data/stocks';

7.2复制表结构仓库外部表

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3

LIKE mydb.employees

LOCATION '/path/to/data';

7.3Partitioned, Managed Tables --分区表

CREATE TABLE employees (

name STRING,

salary FLOAT,

subordinates ARRAY<STRING>,

deductions MAP<STRING, FLOAT>,

address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>

)

PARTITIONED BY (country STRING, state STRING);

However, Hive will now create subdirectories reflecting the partitioning structure. For

example:

...

.../employees/country=CA/state=AB

.../employees/country=CA/state=BC

...

.../employees/country=US/state=AL

.../employees/country=US/state=AK

...

7.3.1建议安全措施

把HIVE 设置成“严格”模式,禁止分区表的查询没有

一个WHERE子句

hive> set hive.mapred.mode=strict;

hive> SELECT e.name, e.salary FROM employees e LIMIT 100;

FAILED: Error in semantic analysis: No partition predicate found for

Alias "e" Table "employees"

hive> set hive.mapred.mode=nonstrict;

hive> SELECT e.name, e.salary FROM employees e LIMIT 100;

7.3.2查看现有分区

hive> SHOW PARTITIONS employees;

...

Country=CA/state=AB

country=CA/state=BC

...

country=US/state=AL

country=US/state=AK

7.3.3查看分区详细分区键

hive> SHOW PARTITIONS employees PARTITION(country='US');

country=US/state=AL

country=US/state=AK

...

hive> SHOW PARTITIONS employees PARTITION(country='US', state='AK');

country=US/state=AK

7.3.3通过 DESC 显示分区键

hive> DESCRIBE EXTENDED employees;

name string,

salary float,

...

address struct<...>,

country string,

state string

Detailed Table Information...

partitionKeys:[FieldSchema(name:country, type:string, comment:null),

FieldSchema(name:state, type:string, comment:null)],

...

7.3.4从文件读入分区表

LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'

INTO TABLE employees

PARTITION (country = 'US', state = 'CA');

7.4.1External Partitioned Tables 外部分区表

1.先创建外部表结构

CREATE EXTERNAL TABLE IF NOT EXISTS log_messages (

hms INT,

severity STRING,

server STRING,

process_id INT,

message STRING)

PARTITIONED BY (year INT, month INT, day INT)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

7.4.2.为外部表增加指定分区

ALTER TABLE log_messages ADD PARTITION(year = 2012, month = 1, day = 2)

LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';

7.4.3.把数据表复制外部表目录结构中

Copy the data for the partition being moved to S3. For example, you can use the

Hadoop distcp command:

hadoop distcp /data/log_messages/2011/12/02 s3n://ourbucket/logs/2011/12/02

•Alter the table to point the partition to the S3 location:

ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)

SET LOCATION 's3n://ourbucket/logs/2011/01/02';

•Remove the HDFS copy of the partition using the hadoop fs -rmr command:

hadoop fs -rmr /data/log_messages/2011/01/02

7.4.3显示表分区信息

hive> SHOW PARTITIONS log_messages;

...

year=2011/month=12/day=31

year=2012/month=1/day=1

year=2012/month=1/day=2

hive> DESCRIBE EXTENDED log_messages;

...

message string,

year int,

month int,

day int

Detailed Table Information...

partitionKeys:[FieldSchema(name:year, type:int, comment:null),

FieldSchema(name:month, type:int, comment:null),

FieldSchema(name:day, type:int, comment:null)],

...

hive> DESCRIBE EXTENDED log_messages PARTITION (year=2012, month=1, day=2);

...

location:s3n://ourbucket/logs/2011/01/02,

...

8.1Customizing Table Storage Formats -- 表存储格式(表操作)

CREATE TABLE employees (

name STRING,

salary FLOAT,

subordinates ARRAY<STRING>,

deductions MAP<STRING, FLOAT>,

address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\001'

COLLECTION ITEMS TERMINATED BY '\002'

MAP KEYS TERMINATED BY '\003'

LINES TERMINATED BY '\n'

STORED AS TEXTFILE;

Dropping Tables -- 删除表

DROP TABLE IF EXISTS employees;

For external tables, the metadata is deleted but the data is not.

Alter Table --修改表结构

ALTER TABLE modifies table metadata only. The data for the table is

untouched. It’s up to you to ensure that any modifications are consistent

with the actual data.

Renaming a Table -- 修改表名

ALTER TABLE log_messages RENAME TO logmsgs;

Adding, Modifying, and Dropping a Table Partition -- 增加，修改，删除表分区

ALTER TABLE log_messages ADD IF NOT EXISTS

PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01'

PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02'

PARTITION (year = 2011, month = 1, day = 3) LOCATION '/logs/2011/01/03'

ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)

SET LOCATION 's3n://ourbucket/logs/2011/01/02';

ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2);

Changing Columns --修改列

ALTER TABLE log_messages

CHANGE COLUMN hms hours_minutes_seconds INT

COMMENT 'The hours, minutes, and seconds part of the timestamp'

AFTER severity;

Adding Columns --增加列

ALTER TABLE log_messages ADD COLUMNS (

app_name STRING COMMENT 'Application name',

session_id LONG COMMENT 'The current session id');

Deleting or Replacing Columns --删除替换列

ALTER TABLE log_messages REPLACE COLUMNS (

hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp',

severity STRING COMMENT 'The message severity'

message STRING COMMENT 'The rest of the message');

This statement effectively renames the original hms column and removes the server and

process_id columns from the original schema definition. As for all ALTER statements,

only the table metadata is changed.

Alter Table Properties --修改表属性

ALTER TABLE log_messages SET TBLPROPERTIES (

'notes' = 'The process id is no longer captured; this column is always NULL');

Alter Storage Properties --修改存储属性

ALTER TABLE log_messages

PARTITION(year = 2012, month = 1, day = 1)

SET FILEFORMAT SEQUENCEFILE;

You can specify a new SerDe along with SerDe properties or change the properties for

the existing SerDe. The following example specifies that a table will use a Java class

named com.example.JSONSerDe to process a file of JSON-encoded records

ALTER TABLE table_using_JSON_storage

SET SERDE 'com.example.JSONSerDe'

WITH SERDEPROPERTIES (

'prop1' = 'value1',

'prop2' = 'value2');

来源： http://blog.csdn.net/iquicksandi/article/details/8522691

阅读全文

1 0