Impala operations

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 04:47

1. -h: show help

Format:

[root@hadoop-worer1-xiaoyacrm ~]# impala-shell -h

Usage: impala_shell.py [options]

Options:

-h, --help show this help message and exit

-i IMPALAD, --impalad=IMPALAD

<host:port> of impalad to connect to

[default: hadoop-worer1-xiaoyacrm:21000]

-q QUERY, --query=QUERY

Execute a query without the shell [default: none]

-f QUERY_FILE, --query_file=QUERY_FILE

Execute the queries in the query file, delimited by ;

[default: none]

-k, --kerberos Connect to a kerberized impalad [default: False]

2. -r: refresh all metadata*

(Refresh Impala catalog after connecting; default: false)

2.1 Create table t1 in Hive

hive> create table t1(id int ,name string);

Time taken: 0.423 seconds

2.2 Look for the table in impala-shell. It does not show up, because the Hive metadata has to be refreshed manually

show tables;

$ impala-shell -r

After this, show tables lists the table that was just created.
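As a narrower alternative to -r, the new table can be made visible with a table-level INVALIDATE METADATA statement, which reloads the metadata of one table instead of the whole catalog. The following is a minimal sketch, not the author's procedure. It assumes t1 lives in the database shenfuli (the name used in the later queries of this post), and it only prints the command when impala-shell is not installed.

```shell
#!/bin/sh
# Assumption: t1 was created in database "shenfuli", as in the later examples.
DB="shenfuli"
TABLE="t1"

# INVALIDATE METADATA with a table name reloads only that table's metadata.
STMT="invalidate metadata ${DB}.${TABLE}"

if command -v impala-shell >/dev/null 2>&1; then
    # One statement per -q call, then list the tables to confirm.
    impala-shell -q "$STMT"
    impala-shell -q "show tables in ${DB}"
else
    # No impala-shell on this machine: show what would be run.
    echo "impala-shell not found; would run: impala-shell -q \"$STMT\""
fi
```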

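3. -B: disable pretty-printing, which can improve performance when a query returns a large amount of data*

3.1 Insert initial data from impala-shell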

insert into table t1(id,name) values(100,'sfl');
insert into table t1(id,name) values(101,'zs');
insert into table t1(id,name) values(102,'ls');

3.2 Query the data from both Impala and Hive

select * from t1;

The results are identical, because Impala and Hive share the same metastore and read the same underlying table data.

3.3 Demonstrating -B

$ impala-shell -B -q 'select * from shenfuli.t1;' -o a.txt

$ more a.txt

102 ls

100 sfl

101 zs

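With -B, the output has the same plain format as Hive's output. As for -r from section 2: it refreshes the metadata of the entire catalog, so using it in production is not recommended.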
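Because -B separates columns with a tab by default, the output file is easy to post-process with standard tools. The sketch below does not call impala-shell. It recreates the three rows shown above in a temporary file (the file name is made up for illustration), sorts them by id, and prints each row as id=name.

```shell
#!/bin/sh
# Stand-in for a.txt: same rows as above, tab-separated as written by -B.
SAMPLE=$(mktemp)
printf '102\tls\n100\tsfl\n101\tzs\n' > "$SAMPLE"

# Sort numerically on the leading id, then split each line on the tab.
sorted=$(sort -n "$SAMPLE" | awk -F '\t' '{ print $1 "=" $2 }')

echo "$sorted"
# prints:
# 100=sfl
# 101=zs
# 102=ls
```

If a different separator is needed, impala-shell also accepts --output_delimiter together with -B.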

3.4 Adding --print_header to -B also prints the column names

$ impala-shell -B --print_header -q 'select * from shenfuli.t1;' -o c.txt

$ more c.txt

id name

100 sfl

102 ls

101 zs
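With a header line present, a script has to treat the first line separately from the data rows. A small sketch of that, using a temporary stand-in for c.txt (the file name is an assumption, the rows are the ones shown above):

```shell
#!/bin/sh
# Stand-in for c.txt: header line plus three tab-separated rows.
SAMPLE=$(mktemp)
printf 'id\tname\n100\tsfl\n102\tls\n101\tzs\n' > "$SAMPLE"

# First line holds the column names; turn the tab into a comma for display.
header=$(head -n 1 "$SAMPLE" | tr '\t' ',')

# Everything from line 2 onward is data.
row_count=$(tail -n +2 "$SAMPLE" | wc -l | tr -d ' ')

echo "columns: $header"
echo "rows: $row_count"
```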

4. -v: show the version

$ impala-shell -v

Impala Shell v2.2.0-cdh5.4.4 (a13d3c6) built on Mon Jul 6 16:57:34 PDT 2015

$ impala-shell

Starting Impala Shell without Kerberos authentication

Connected to crxy168:21000

Server version: impalad version 2.2.0-cdh5.4.4 RELEASE (build a13d3c6b203e79a284b509df821bffbe229e6dc3)

Welcome to the Impala shell. Press TAB twice to see a list of available commands.

(Shell build version: Impala Shell v2.2.0-cdh5.4.4 (a13d3c6) built on Mon Jul 6 16:57:34 PDT 2015)

Note: after upgrading Impala, check both the impalad (server) version and the Impala shell version. The two should match; otherwise queries may behave abnormally.
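The comparison can be scripted by extracting the version string from both lines. This is only a sketch: the two input lines are copied from the output shown above rather than captured live, and the sed patterns assume the banner keeps exactly this wording.

```shell
#!/bin/sh
# Lines copied from the output above; in practice they would be captured
# from `impala-shell -v` and from the connection banner.
shell_line='Impala Shell v2.2.0-cdh5.4.4 (a13d3c6) built on Mon Jul 6 16:57:34 PDT 2015'
server_line='Server version: impalad version 2.2.0-cdh5.4.4 RELEASE (build a13d3c6b203e79a284b509df821bffbe229e6dc3)'

# Keep only the token that follows "Impala Shell v" / "impalad version ".
shell_ver=$(printf '%s\n' "$shell_line" | sed -n 's/^Impala Shell v\([^ ]*\) .*/\1/p')
server_ver=$(printf '%s\n' "$server_line" | sed -n 's/^Server version: impalad version \([^ ]*\) .*/\1/p')

if [ "$shell_ver" = "$server_ver" ]; then
    echo "versions match: $shell_ver"
else
    echo "version mismatch: shell=$shell_ver server=$server_ver"
fi
```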

5. -f: execute a query file*

--query_file specifies the query file

$ cat impala-sql

select * from shenfuli.t1;

$ impala-shell -f impala-sql ;

$ impala-shell -B -f impala-sql -o d.txt;

$ more d.txt

102 ls

100 sfl

101 zs

Note: in practice, SQL statements are usually written to a file and then run with the -f option.
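A rough sketch of that workflow: generate the query file, then run it with -B -f and write the result with -o. The second statement (the count) and the temporary directory are additions for illustration only, and the script just prints the command if impala-shell is not available.

```shell
#!/bin/sh
# Work in a temporary directory so existing files are not overwritten.
WORK_DIR=$(mktemp -d)
QUERY_FILE="$WORK_DIR/impala-sql"
OUT_FILE="$WORK_DIR/d.txt"

# Statements in a query file are delimited by ";".
cat > "$QUERY_FILE" <<'EOF'
select * from shenfuli.t1;
select count(*) from shenfuli.t1;
EOF

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -B -f "$QUERY_FILE" -o "$OUT_FILE"
else
    echo "impala-shell not found; would run: impala-shell -B -f $QUERY_FILE -o $OUT_FILE"
fi
```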

6. -o: save the query results to a file*

--output_file specifies the output file name

7. -q: run a query without entering the interactive impala-shell

$ impala-shell -q 'select * from shenfuli.t1' --output_file=b.txt

$ more b.txt

+-----+------+
| id  | name |
+-----+------+
| 102 | ls   |
| 100 | sfl  |
| 101 | zs   |
+-----+------+

8. -p (--show_profiles): show the execution plan and profile of each query
--quiet: suppress extra informational output

$ impala-shell -q 'select * from shenfuli.t1;' -p >1.txt

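Note: the file 1.txt contains the detailed execution plan, which can be used to analyze and optimize the SQL statement.

9. Refresh the metadata of a single table

refresh <tablename> performs an incremental refresh

Note: compared with -r, refreshing a single table with refresh is more practical, and it is incremental.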

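10. Show a query's execution plan and the details of each step

explain <sql> prints the plan. The amount of detail is controlled with set explain_level, which has four levels, 0 to 3; the higher the number, the more detailed the output.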
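One way to combine the two is to put the SET and the EXPLAIN into a query file and run it with -f, so both execute in the same session. A minimal sketch, assuming the table shenfuli.t1 from the earlier examples and a temporary file name chosen for illustration:

```shell
#!/bin/sh
EXPLAIN_FILE=$(mktemp)

# Level 3 is the most detailed of the four levels (0-3).
cat > "$EXPLAIN_FILE" <<'EOF'
set explain_level=3;
explain select * from shenfuli.t1;
EOF

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -f "$EXPLAIN_FILE"
else
    echo "impala-shell not found; would run: impala-shell -f $EXPLAIN_FILE"
fi
```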
