A Systematic Study of Programming Hive, Chapter 2: Using the Hive CLI

/*
*    Translated by Lee, 2013.11.11: "Programming Hive", Chapter 2, Getting Started.   @@page marks the page number in the original book
*/

@@page 29

Run hive --help to see all of the services the hive command provides

                   Code listing
================================================================
$ bin/hive --help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: cli help hiveserver hwi jar lineage metastore rcfilecat
Parameters parsed:
--auxpath : Auxiliary jars
--config : Hive configuration directory
--service : Starts specific service/component. cli is default
Parameters used:
HADOOP_HOME : Hadoop install directory
HIVE_OPT : Hive options
For help on a particular service:
./hive --service serviceName --help
Debug help: ./hive --debug --help
==================================================================

@@page 30

The main services are as follows
================================================================
Option         Description
cli            Command-line interface for defining tables and running queries; the default service
---------------------------------------------------------------
hiveserver     Hive Server, which listens for Thrift connections from other processes; see Chapter 16 of the book
---------------------------------------------------------------
hwi            A simple web interface for running queries and other commands; a remote alternative to the CLI
---------------------------------------------------------------
jar            An extension of the hadoop jar command
---------------------------------------------------------------
metastore      Starts an external Hive metastore service that supports multiple clients; see page 28 of the book
---------------------------------------------------------------
rcfilecat      A tool for viewing the contents of RCFile-format data

@@page 31

                    #Using the CLI

Run hive --help --service cli to list the CLI options
================================
$ hive --help --service cli
usage: hive
-d,--define <key=value> Variable substitution to apply to hive
commands. e.g. -d A=B or --define A=B
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
-h <hostname> connecting to Hive Server on remote host
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable substitution to apply to hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-p <port> connecting to Hive Server on port number
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode
==============================

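                      Using the -d or --define option
This option defines a key=value variable; it is equivalent to the --hivevar key=value option.
Note: this feature is only supported in Hive 0.8 and later.
Example:
   hive -d A=B   defines variable A with value B, so references to A are replaced by B
When you use this option, Hive places the key-value pair in the hivevar namespace, to keep it apart from the key-value pairs in the other namespaces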

@@page 32

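                   The common namespaces are as follows
Namespace  | Access     | Description
-------------------------------------------------------
hivevar    | Read/Write | User-defined key-value pairs
hiveconf   | Read/Write | Hive configuration properties
system     | Read/Write | Properties defined by Java
env        | Read only  | Environment variables

Hive variables are Java Strings. You can reference them directly in Hive queries; when a query
is run, each reference is replaced with the variable's actual value.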

@@page 33

Use the set command to view variable values.
For example, to view the value of HOME in the env namespace:
$ hive
hive> set env:HOME;
env:HOME=/home/thisuser
To view the values of all variables (the output is long):
hive> set;
... lots of output including these variables:
hive.stats.retries.wait=3000
env:TERM=xterm
system:user.timezone=America/New_York
...
To view and change a user-defined variable:
$ hive --define foo=bar
hive> set foo;
foo=bar
hive> set hivevar:foo;
hivevar:foo=bar
hive> set hivevar:foo=bar2;
hive> set foo;
foo=bar2
hive> set hivevar:foo;
hivevar:foo=bar2

Using variables in SQL:
hive> create table toss1(i int, ${hivevar:foo} string);
hive> describe toss1;
i int
bar2 string
hive> create table toss2(i2 int, ${foo} string);
hive> describe toss2;
i2 int
bar2 string
hive> drop table toss1;
hive> drop table toss2;

Comparing the --hiveconf option (available in Hive v0.7.X) with the -d option:

$ hive --hiveconf hive.cli.print.current.db=true
hive (default)> set hive.cli.print.current.db;
hive.cli.print.current.db=true
hive (default)> set hiveconf:hive.cli.print.current.db;
hiveconf:hive.cli.print.current.db=true
hive (default)> set hiveconf:hive.cli.print.current.db=false;
hive> set hiveconf:hive.cli.print.current.db=true;
hive (default)> ...
------------------------------------
$ hive --hiveconf y=5
hive> set y;
y=5
hive> CREATE TABLE whatsit(i int);
hive> ... load data into whatsit ...
hive> SELECT * FROM whatsit WHERE i = ${hiveconf:y};
...

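As shown, -d and --hiveconf are written differently and store their values in different namespaces (hivevar and hiveconf), but values from either can be substituted into queries in the same way.
The env namespace holds shell environment variables, which can be referenced from Hive. This gives you a channel for passing values from the shell into Hive.
An example using the env namespace (this example failed with an error when I tried it):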
$ YEAR=2012
$ hive -e "SELECT * FROM mytable WHERE year = ${env:YEAR}";
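
One possible cause of that failure, offered as an assumption since the error message was not recorded, is shell quoting rather than Hive itself. Inside double quotes the shell tries to expand ${env:YEAR} before hive ever starts. bash, for instance, reads it as a substring expansion of a shell variable named env, which is normally unset, so the query would reach Hive with nothing after the = sign. In addition, YEAR=2012 on its own line sets a shell variable without exporting it, so a child process such as hive would not see it in its environment. A minimal sketch of a form that avoids both problems; the query is kept in a variable here only so that the exact text can be inspected:

```shell
# Export YEAR so that child processes (such as hive) see it in their environment.
export YEAR=2012

# Single quotes stop the shell from touching ${env:YEAR};
# the reference is left intact for Hive to substitute.
query='SELECT * FROM mytable WHERE year = ${env:YEAR}'

# Show exactly what would be handed to hive -e.
printf '%s\n' "$query"
```

The string can then be passed on with hive -e "$query", or the single-quoted query can be written inline after -e.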

@@page 35

Use -e to run SQL directly from the command line
Example:
$ hive -e "SELECT * FROM mytable LIMIT 3";
OK
name1 10
name2 20
name3 30
Time taken: 4.955 seconds
$
Adding -S (silent mode) removes the OK and Time taken lines, so the query result can be saved to
a file with shell redirection. For example:
$ hive -S -e "select * FROM mytable LIMIT 3" > /tmp/myquery
$ cat /tmp/myquery
name1 10
name2 20
name3 30

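Note: the output file /tmp/myquery in the redirection above is a local file, not an HDFS file.
Another example, this time piping into grep to find the Hive properties that mention warehouse: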
$ hive -S -e "set" | grep warehouse
hive.metastore.warehouse.dir=/user/hive/warehouse
hive.warehouse.subdir.inherit.perms=false
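(Translator's note: you can count lines with wc, or write your own Linux C programs; gluing them together through the shell gives even more flexibility.)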
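
Following up on the translator's note, here is a rough sketch of gluing silent-mode output to ordinary shell tools. The helper name count_rows is made up for illustration, and the three sample lines are copied from the query output shown earlier as a stand-in, so the snippet runs without a Hive installation:

```shell
# count_rows: hypothetical helper that counts the lines arriving on stdin.
# awk prints a bare number, avoiding the padding some wc versions add.
count_rows() {
  awk 'END { print NR }'
}

# Stand-in for the output of: hive -S -e "select * FROM mytable LIMIT 3"
sample_output='name1 10
name2 20
name3 30'

row_count=$(printf '%s\n' "$sample_output" | count_rows)
echo "rows: $row_count"
```

With Hive available, the same helper would sit at the end of a pipe, for example: hive -S -e "select * FROM mytable LIMIT 3" | count_rows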

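#Running SQL from external files
Hive runs the queries in an external SQL file with the -f option. By convention such files end in .q or .hql. Example:
$ hive -f /path/to/file/withqueries.hql
If you are already inside the hive shell, use the SOURCE command to run the SQL file: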
$ cat /path/to/file/withqueries.hql
SELECT x.* FROM src x;
$ hive
hive> source /path/to/file/withqueries.hql;
...
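
As a sketch of the -f workflow, the script below writes the one-line query file into a scratch directory and then hands it to hive -f. The scratch directory and the guard around the hive call are additions for illustration, and the query assumes a table named src exists, as in the example above:

```shell
# Create a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
hql_file="$workdir/withqueries.hql"

# Write the query file; the quoted 'EOF' keeps the shell from expanding anything inside.
cat > "$hql_file" <<'EOF'
-- Query taken from the example above; assumes a table named src exists.
SELECT x.* FROM src x;
EOF

# Run the file only where the hive CLI is installed; otherwise just show the command.
if command -v hive >/dev/null 2>&1; then
  hive -f "$hql_file" || echo "hive -f reported an error" >&2
else
  echo "hive not found; would run: hive -f $hql_file"
fi
```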

@@page 37

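Running Linux commands from the hive shell
At the hive> prompt, prefix a command with ! and end it with a semicolon to run a Linux command without leaving hive. For example: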
hive> ! /bin/echo "what up dog";
"what up dog"
hive> ! pwd;
/home/me/hiveplay

At the hive> prompt, use dfs to run HDFS commands directly.
For example:
hive> dfs -ls / ;
Found 3 items
drwxr-xr-x - root supergroup 0 2011-08-17 16:27 /etl
drwxr-xr-x - edward supergroup 0 2012-01-18 15:51 /flag
drwxrwxr-x - hadoop supergroup 0 2010-02-03 17:50 /users

Use hive> dfs -help; to list the available HDFS commands

@@page 37

Hive uses -- to start a comment; the usage is the same as in Oracle

By default, a Hive SELECT statement does not print column headers. To print them, set the hiveconf property hive.cli.print.header to true.
Example:
hive> set hive.cli.print.header=true;
hive> SELECT * FROM system_logs LIMIT 3;
tstamp severity server message
1335667117.337715 ERROR server1 Hard drive hd1 is 90% full!
1335667117.338012 WARN server1 Slow response from server2.
1335667117.339234 WARN server2 Uh, Dude, I'm kinda busy right now...

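The .hiverc file holds statements that are run automatically when hive starts; it is commonly used to set parameters.
(I could not find this file, and creating one myself did not take effect either.)
For example, to turn on column headers as in the example above,
add the following to $HOME/.hiverc: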
set hive.cli.print.header=true;
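
Because the CLI help shown earlier lists -i <filename> as an initialization SQL file, one way to try out startup settings without touching $HOME/.hiverc is to keep them in a separate file and pass it with -i. A minimal sketch under that assumption; the file name init.hql and the scratch directory are made up for illustration:

```shell
# Keep the experiment inside a scratch directory instead of $HOME.
workdir=$(mktemp -d)
init_file="$workdir/init.hql"

# Same statement as the .hiverc example above.
cat > "$init_file" <<'EOF'
set hive.cli.print.header=true;
EOF

# Where hive is installed, load the init file and then print the property's value.
if command -v hive >/dev/null 2>&1; then
  hive -i "$init_file" -e 'set hive.cli.print.header;' || echo "hive reported an error" >&2
else
  echo "hive not found; would run: hive -i $init_file"
fi
```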