Summary of Common Hive Commands


Introduction to Hive

Hive is a data warehouse tool for big-data analysis; it lets you work with data files on a cluster using SQL-like statements.

To quote the description from the official site:
The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. 
 
 
Built on top of Apache Hadoop™, Hive provides the following features: 
 
Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis. 
A mechanism to impose structure on a variety of data formats 
Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™ 
Query execution via Apache Tez™, Apache Spark™, or MapReduce 
Procedural language with HPL-SQL 
Sub-second query retrieval via Hive LLAP, Apache YARN and Apache Slider 
 
Hive official wiki
https://cwiki.apache.org/confluence/display/Hive/Home

Hive architecture and how it works
http://blog.csdn.net/u010330043/article/details/51225021

How a Hive SQL statement is parsed and executed
http://blog.csdn.net/jojo52013145/article/details/19206559


Common Hive SQL Statements

1. Create a table

Managed (internal) table

  1. create table aa(col1 string, col2 int) partitioned by(statdate int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';


External table

  1. create external table bb(col1 string, col2 int) partitioned by(statdate int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location '/user/gaofei.lu/';
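The two statements differ only in the EXTERNAL keyword and the LOCATION clause; dropping a managed table deletes its data, while dropping an external table leaves the underlying files in place. As a rough illustration of how little the DDL differs, the two forms can be assembled like this (build_create_table is a hypothetical helper for this article, not part of Hive):

```python
def build_create_table(name, cols, partition_col, external=False, location=None):
    """Assemble a Hive CREATE TABLE statement like the examples above.

    cols is a list of (name, type) pairs; location is only meaningful
    for external tables. This is a string-building sketch, not a Hive API.
    """
    kind = "create external table" if external else "create table"
    col_list = ", ".join("%s %s" % (c, t) for c, t in cols)
    ddl = ("%s %s(%s) partitioned by(%s int) "
           "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
           % (kind, name, col_list, partition_col))
    if external and location:
        ddl += " location '%s'" % location
    return ddl + ";"

print(build_create_table("aa", [("col1", "string"), ("col2", "int")], "statdate"))
print(build_create_table("bb", [("col1", "string"), ("col2", "int")], "statdate",
                         external=True, location="/user/gaofei.lu/"))
```

Everything except the two external-only pieces is identical between the two statements.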
2. Show a table's DDL
  1. show create table aa;
3. Load data into a table
  1. Local file: load data local inpath '/home/gaofei.lu/aa.txt' into table aa partition(statdate=20170403);
  2. File on HDFS: load data inpath '/user/gaofei.lu/aa.txt' into table bb partition(statdate=20170403);
4. Modify table properties (toggle a table between external and managed)
  1. alter table aa set tblproperties ('EXTERNAL'='TRUE');
  2. alter table bb set tblproperties ('EXTERNAL'='FALSE');
5. Modify columns
  1. Rename a column and change its type: alter table aa change col2 name string;
  2. Move a column to the first position: alter table aa change col2 name string first;
  3. Move a column after a given column: alter table aa change col1 dept string after name;
6. Add columns (use with caution)
  1. alter table aa add columns(col3 string);
7. Rename a table
  1. alter table aa rename to aa_test;
8. Add partitions
  1. alter table aa add partition(statdate=20170404);
  2. alter table bb add partition(statdate=20170404) location '/user/gaofei.lu/20170404.txt';
9. Show partitions
  1. show partitions aa;
10. Modify partitions
  1. alter table aa partition(statdate=20170404) rename to partition(statdate=20170405);
  2. alter table bb partition(statdate=20170404) set location '/user/gaofei.lu/aa.txt';
11. Drop a partition
  1. alter table aa drop if exists partition(statdate=20170404);
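Partitions in the statdate=yyyymmdd convention above are usually added once per day, so in practice the ADD PARTITION statements are generated for a date range. A small sketch of that (the table name and dates are placeholders):

```python
from datetime import date, timedelta

def add_partition_statements(table, start, end):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in
    [start, end], using the statdate=yyyymmdd convention shown above."""
    day = start
    while day <= end:
        yield ("alter table %s add partition(statdate=%s);"
               % (table, day.strftime("%Y%m%d")))
        day += timedelta(days=1)

for stmt in add_partition_statements("aa", date(2017, 4, 4), date(2017, 4, 6)):
    print(stmt)
```

The generated statements can then be fed to Hive through beeline or a client library.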

12. Connect with Beeline

  1. beeline
  2. !connect jdbc:hive2://192.168.1.17:10000

13. Set the execution engine to Spark (Hive on Spark)

  1. set hive.execution.engine=spark;

14. Kill a running job

  1. yarn application -kill <application_id>

15. Export to local files with a specified delimiter (the path is treated as an output directory)

  1. insert overwrite local directory '/home/hadoop/gaofeilu/test_delimited.txt'
  2. row format delimited
  3. fields terminated by '\t'
  4. select * from test;
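The directory written by insert overwrite local directory contains plain files whose fields are separated by the delimiter given above, so they can be read back with Python's csv module (the path passed in below would be one of the generated part files):

```python
import csv

def read_hive_export(path):
    """Read one tab-delimited file produced by an export like the one
    above, returning its rows as lists of strings."""
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))
```

For other delimiters, change the delimiter argument to match the fields terminated by clause.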

 
Common operations on partitions and tables
https://www.iteblog.com/archives/1537.html

Common Hive functions
http://www.mamicode.com/info-detail-933740.html

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

 
 
Connecting to HiveServer2 from Python

The example below uses the pyhs2 client library (an older library; newer projects tend to use PyHive or impyla, but the connect-and-retry pattern is the same):

  import time
  import pyhs2
  from pyhs2.error import Pyhs2Exception

  TRY_LIMIT = 2

  class LibHive2Query(object):
      def __init__(self):
          # Open a connection to HiveServer2 with PLAIN authentication
          self.conn = pyhs2.connect(host='192.168.1.17',
                                    port=10000,
                                    authMechanism="PLAIN",
                                    user='hadoop',
                                    password='')
          print("init ok")
          self.cur = self.conn.cursor()

      def hiveExe(self, sql):
          # Run the query, retrying up to TRY_LIMIT times on failure
          try_time = 1
          while try_time <= TRY_LIMIT:
              try:
                  try_time += 1
                  self.cur.execute(sql)
                  for row in self.cur.fetch():
                      print(row)
                  return 0
              except Pyhs2Exception as ex:
                  print(str(ex))
                  if try_time > TRY_LIMIT:
                      return 1
                  time.sleep(10)  # wait before the next attempt
              finally:
                  print("end query")

      def __del__(self):
          # Close the connection when the object is garbage-collected
          if self.conn:
              self.conn.close()

  if __name__ == '__main__':
      hiveClient = LibHive2Query()
      hiveClient.hiveExe("select * from db_model.s_request_d limit 10")
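The retry loop in hiveExe is independent of Hive itself; the same pattern can be demonstrated with a stand-in for cur.execute that fails on the first attempt (flaky_execute is a made-up stub, and the wait is shortened for illustration):

```python
import time

def run_with_retries(execute, sql, try_limit=2, wait_seconds=0.01):
    """Call execute(sql), retrying up to try_limit times on any
    exception; return 0 on success, 1 when all attempts fail."""
    for attempt in range(1, try_limit + 1):
        try:
            execute(sql)
            return 0
        except Exception as ex:
            print("attempt %d failed: %s" % (attempt, ex))
            if attempt == try_limit:
                return 1
            time.sleep(wait_seconds)  # back off before retrying

# A stub standing in for cur.execute: raises once, then succeeds.
calls = {"n": 0}
def flaky_execute(sql):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient HiveServer2 error")

print(run_with_retries(flaky_execute, "select 1"))
```

Separating the retry policy from the connection logic this way also makes it easy to test without a live cluster.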