Summary of Common Hive Commands


Introduction to Hive

Hive is a data warehouse tool for big-data analysis; it lets you work with data files on a cluster using SQL-like statements.

To quote the description from the official site:
The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. 
 
 
Built on top of Apache Hadoop™, Hive provides the following features: 
 
Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis. 
A mechanism to impose structure on a variety of data formats 
Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™ 
Query execution via Apache Tez™, Apache Spark™, or MapReduce 
Procedural language with HPL-SQL 
Sub-second query retrieval via Hive LLAP, Apache YARN and Apache Slider 
 
Hive official wiki
https://cwiki.apache.org/confluence/display/Hive/Home

Hive architecture and how it works
http://blog.csdn.net/u010330043/article/details/51225021

How a Hive SQL statement is parsed and executed
http://blog.csdn.net/jojo52013145/article/details/19206559


Common Hive SQL Statements

1. Create a table

Managed (internal) table

  1. create table aa(col1 string, col2 int) partitioned by(statdate int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';


External table

  1. create external table bb(col1 string, col2 int) partitioned by(statdate int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' location '/user/gaofei.lu/';
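The two statements differ only in the EXTERNAL keyword and the LOCATION clause; dropping a managed table deletes its data, while dropping an external table leaves the underlying files in place. As a rough illustration of how little the DDL differs, the two forms can be assembled like this (build_create_table is a hypothetical helper for this article, not part of Hive):

```python
def build_create_table(name, cols, partition_col, external=False, location=None):
    """Assemble a Hive CREATE TABLE statement like the examples above.

    cols is a list of (name, type) pairs; location is only meaningful
    for external tables. This is a string-building sketch, not a Hive API.
    """
    kind = "create external table" if external else "create table"
    col_list = ", ".join("%s %s" % (c, t) for c, t in cols)
    ddl = ("%s %s(%s) partitioned by(%s int) "
           "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
           % (kind, name, col_list, partition_col))
    if external and location:
        ddl += " location '%s'" % location
    return ddl + ";"

print(build_create_table("aa", [("col1", "string"), ("col2", "int")], "statdate"))
print(build_create_table("bb", [("col1", "string"), ("col2", "int")], "statdate",
                         external=True, location="/user/gaofei.lu/"))
```

Everything except the two external-only pieces is identical between the two statements.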
2. Show a table's DDL
  1. show create table aa;
3. Load data into a table
  1. Local file: load data local inpath '/home/gaofei.lu/aa.txt' into table aa partition(statdate=20170403);
  2. File on HDFS: load data inpath '/user/gaofei.lu/aa.txt' into table bb partition(statdate=20170403);
4. Modify table properties (toggle a table between external and managed)
  1. alter table aa set tblproperties ('EXTERNAL'='TRUE');
  2. alter table bb set tblproperties ('EXTERNAL'='FALSE');
5. Modify columns
  1. Rename a column and change its type: alter table aa change col2 name string;
  2. Move a column to the first position: alter table aa change col2 name string first;
  3. Move a column after a given column: alter table aa change col1 dept string after name;
6. Add columns (use with caution)
  1. alter table aa add columns(col3 string);
7. Rename a table
  1. alter table aa rename to aa_test;
8. Add partitions
  1. alter table aa add partition(statdate=20170404);
  2. alter table bb add partition(statdate=20170404) location '/user/gaofei.lu/20170404.txt';
9. Show partitions
  1. show partitions aa;
10. Modify partitions
  1. alter table aa partition(statdate=20170404) rename to partition(statdate=20170405);
  2. alter table bb partition(statdate=20170404) set location '/user/gaofei.lu/aa.txt';
11. Drop a partition
  1. alter table aa drop if exists partition(statdate=20170404);
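Partitions in the statdate=yyyymmdd convention above are usually added once per day, so in practice the ADD PARTITION statements are generated for a date range. A small sketch of that (the table name and dates are placeholders):

```python
from datetime import date, timedelta

def add_partition_statements(table, start, end):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in
    [start, end], using the statdate=yyyymmdd convention shown above."""
    day = start
    while day <= end:
        yield ("alter table %s add partition(statdate=%s);"
               % (table, day.strftime("%Y%m%d")))
        day += timedelta(days=1)

for stmt in add_partition_statements("aa", date(2017, 4, 4), date(2017, 4, 6)):
    print(stmt)
```

The generated statements can then be fed to Hive through beeline or a client library.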

12. Connect with Beeline

  1. beeline
  2. !connect jdbc:hive2://192.168.1.17:10000

13. Set the execution engine to Spark (Hive on Spark)

  1. set hive.execution.engine=spark;

14. Kill a running job

  1. yarn application -kill <application_id>

15. Export to local files with a specified delimiter (the path is treated as an output directory)

  1. insert overwrite local directory '/home/hadoop/gaofeilu/test_delimited.txt'
  2. row format delimited
  3. fields terminated by '\t'
  4. select * from test;
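The directory written by insert overwrite local directory contains plain files whose fields are separated by the delimiter given above, so they can be read back with Python's csv module (the path passed in below would be one of the generated part files):

```python
import csv

def read_hive_export(path):
    """Read one tab-delimited file produced by an export like the one
    above, returning its rows as lists of strings."""
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))
```

For other delimiters, change the delimiter argument to match the fields terminated by clause.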

 
Common operations on partitions and tables
https://www.iteblog.com/archives/1537.html

Common Hive functions
http://www.mamicode.com/info-detail-933740.html

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

 
 
Connecting to HiveServer2 from Python

The example below uses the pyhs2 client library (an older library; newer projects tend to use PyHive or impyla, but the connect-and-retry pattern is the same):

  import time
  import pyhs2
  from pyhs2.error import Pyhs2Exception

  TRY_LIMIT = 2

  class LibHive2Query(object):
      def __init__(self):
          # Open a connection to HiveServer2 with PLAIN authentication
          self.conn = pyhs2.connect(host='192.168.1.17',
                                    port=10000,
                                    authMechanism="PLAIN",
                                    user='hadoop',
                                    password='')
          print("init ok")
          self.cur = self.conn.cursor()

      def hiveExe(self, sql):
          # Run the query, retrying up to TRY_LIMIT times on failure
          try_time = 1
          while try_time <= TRY_LIMIT:
              try:
                  try_time += 1
                  self.cur.execute(sql)
                  for row in self.cur.fetch():
                      print(row)
                  return 0
              except Pyhs2Exception as ex:
                  print(str(ex))
                  if try_time > TRY_LIMIT:
                      return 1
                  time.sleep(10)  # wait before the next attempt
              finally:
                  print("end query")

      def __del__(self):
          # Close the connection when the object is garbage-collected
          if self.conn:
              self.conn.close()

  if __name__ == '__main__':
      hiveClient = LibHive2Query()
      hiveClient.hiveExe("select * from db_model.s_request_d limit 10")
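The retry loop in hiveExe is independent of Hive itself; the same pattern can be demonstrated with a stand-in for cur.execute that fails on the first attempt (flaky_execute is a made-up stub, and the wait is shortened for illustration):

```python
import time

def run_with_retries(execute, sql, try_limit=2, wait_seconds=0.01):
    """Call execute(sql), retrying up to try_limit times on any
    exception; return 0 on success, 1 when all attempts fail."""
    for attempt in range(1, try_limit + 1):
        try:
            execute(sql)
            return 0
        except Exception as ex:
            print("attempt %d failed: %s" % (attempt, ex))
            if attempt == try_limit:
                return 1
            time.sleep(wait_seconds)  # back off before retrying

# A stub standing in for cur.execute: raises once, then succeeds.
calls = {"n": 0}
def flaky_execute(sql):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient HiveServer2 error")

print(run_with_retries(flaky_execute, "select 1"))
```

Separating the retry policy from the connection logic this way also makes it easy to test without a live cluster.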