Operating Hive via the Java API


To access Hive from Java, you connect through HiveServer2. HiveServer2 is an upgrade over the old HiveServer: it is more capable and adds access control, and it ships with a new command-line tool, beeline. To use beeline, first start hiveserver2 and then connect with beeline.

This is a simple example of calling the Hadoop-based Hive data warehouse through its Java API; an introduction to Hive itself is omitted here. Hive provides three user interfaces: the CLI, JDBC/ODBC, and a Web UI.

  1. CLI, i.e. the shell command line
  2. JDBC/ODBC, Hive's Java client interface, used in much the same way as JDBC against a traditional database
  3. Web GUI, which accesses Hive through a browser
    This article mainly covers the second interface, along with basic usage of Beeline.

1. Using Beeline

1. Start hiveserver2

$ hiveserver2

2. Start beeline and connect to Hive

$ beeline -u jdbc:hive2://192.168.150.1:10000 -n hadoop -p


Explanation of the parameters:
-u: the connection URL; either an IP address or a hostname can be used, and the port defaults to 10000
-n: the username to connect as (note: this is not a Hive user, but the login user of the server where Hive runs)
-p: the password; it can be left empty

The port (default: 10000) can be changed with the following command:

hiveserver2 --hiveconf hive.server2.thrift.port=10001
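
After changing the port, point the beeline connection URL at the new port as well; for example, reusing the host and user from the connection command above:

$ beeline -u jdbc:hive2://192.168.150.1:10001 -n hadoop -p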

If you are unsure how to use beeline, view its built-in help with the following command:

$ beeline --help

The full help output is shown below:

[hadoop@hadoop ~]$ beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine
   -u <database url>               the JDBC URL to connect to
   -r                              reconnect to last saved connect url (in conjunction with !save)
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url, driver, user, password) from
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showDbInPrompt=[true/false]   display the current database name in the prompt
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --incremental=[true/false]      Defaults to false. When set to false, the entire result set
                                   is fetched and buffered before being displayed, yielding optimal
                                   display column sizing. When set to true, result rows are displayed
                                   immediately as they are fetched, yielding lower latency and
                                   memory usage at the price of extra display column padding.
                                   Setting --incremental=true is recommended if you encounter an OutOfMemory
                                   on the client side (due to the fetched result set size being large).
                                   Only applicable if --outputformat=table.
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,
                                   defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --help                          display this message
   Example:
    1. Connect using simple authentication to HiveServer2 on localhost:10000
    $ beeline -u jdbc:hive2://localhost:10000 username password
    2. Connect using simple authentication to HiveServer2 on hs.local:10000 using -n for username and -p for password
    $ beeline -n username -p password -u jdbc:hive2://hs2.local:10012
    3. Connect using Kerberos authentication with hive/localhost@mydomain.com as HiveServer2 principal
    $ beeline -u "jdbc:hive2://hs2.local:10013/default;principal=hive/localhost@mydomain.com"
    4. Connect using SSL connection to HiveServer2 on localhost at 10000
    $ beeline "jdbc:hive2://localhost:10000/default;ssl=true;sslTrustStore=/usr/local/truststore;trustStorePassword=mytruststorepassword"
    5. Connect using LDAP authentication
    $ beeline -u jdbc:hive2://hs2.local:10013/default <ldap-username> <ldap-password>
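
With these options, beeline can also run a statement non-interactively. The sketch below reuses the host and user from the earlier connection example and simply lists the databases in csv2 format (adapt the URL to your own environment):

$ beeline -u jdbc:hive2://192.168.150.1:10000 -n hadoop --outputformat=csv2 -e "show databases;"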

If the following error is reported when connecting with beeline:

hadoop is not allowed to impersonate hadoop (state=08S01,code=0)

Cause: HiveServer2 enforces user impersonation (proxy user) checks, which must be allowed in Hadoop's configuration.
Solution: add the following to Hadoop's core-site.xml, restart Hadoop, and then connect with beeline again.

See the official documentation: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html

<property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>
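
If you would rather not restart the whole cluster, the proxy-user settings can usually be reloaded in place with the standard Hadoop refresh commands (run as the HDFS/YARN administrator):

$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
$ yarn rmadmin -refreshSuperUserGroupsConfiguration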

Once connected, you can run HiveQL statements just as you would in the hive CLI. To exit the connection, use the !q or !quit command.
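
A short session might look like the following (output omitted; the emp table is the one created later in this article):

0: jdbc:hive2://192.168.150.1:10000> show databases;
0: jdbc:hive2://192.168.150.1:10000> select * from emp limit 10;
0: jdbc:hive2://192.168.150.1:10000> !quit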

2. Example of Operating Hive via the Java API

Create a Maven project in IntelliJ IDEA; the pom.xml is configured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.bigdata.hadoop</groupId>
    <artifactId>hive</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.9</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Create a test class HiveJDBC with the following code.
Official reference: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

package com.bigdata.hadoop.hive;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.sql.*;

/**
 * Operating Hive via JDBC (note: HiveServer2 must be running before connecting via JDBC)
 */
public class HiveJDBC {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive2://hdpcomprs:10000/db_comprs";
    private static String user = "hadoop";
    private static String password = "";
    private static Connection conn = null;
    private static Statement stmt = null;
    private static ResultSet rs = null;

    // Load the driver and create the connection
    @Before
    public void init() throws Exception {
        Class.forName(driverName);
        conn = DriverManager.getConnection(url, user, password);
        stmt = conn.createStatement();
    }

    // Create a database
    @Test
    public void createDatabase() throws Exception {
        String sql = "create database hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all databases
    @Test
    public void showDatabases() throws Exception {
        String sql = "show databases";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Create a table
    @Test
    public void createTable() throws Exception {
        String sql = "create table emp(\n" +
                        "empno int,\n" +
                        "ename string,\n" +
                        "job string,\n" +
                        "mgr int,\n" +
                        "hiredate string,\n" +
                        "sal double,\n" +
                        "comm double,\n" +
                        "deptno int\n" +
                        ")\n" +
                     "row format delimited fields terminated by '\\t'";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all tables
    @Test
    public void showTables() throws Exception {
        String sql = "show tables";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Describe the table structure
    @Test
    public void descTable() throws Exception {
        String sql = "desc emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
    }

    // Load data into the table
    @Test
    public void loadData() throws Exception {
        String filePath = "/home/hadoop/data/emp.txt";
        String sql = "load data local inpath '" + filePath + "' overwrite into table emp";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Query data
    @Test
    public void selectData() throws Exception {
        String sql = "select * from emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        System.out.println("Employee No." + "\t" + "Name" + "\t" + "Job");
        while (rs.next()) {
            System.out.println(rs.getString("empno") + "\t\t" + rs.getString("ename") + "\t\t" + rs.getString("job"));
        }
    }

    // Count query (runs a MapReduce job)
    @Test
    public void countData() throws Exception {
        String sql = "select count(1) from emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getInt(1));
        }
    }

    // Drop the database
    @Test
    public void dropDatabase() throws Exception {
        String sql = "drop database if exists hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Drop the table
    @Test
    public void dropTable() throws Exception {
        String sql = "drop table if exists emp";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Release resources
    @After
    public void destroy() throws Exception {
        if (rs != null) {
            rs.close();
        }
        if (stmt != null) {
            stmt.close();
        }
        if (conn != null) {
            conn.close();
        }
    }
}

Recommended reading:
[Hive] HiveServer2 configuration in detail:
http://m.blog.csdn.net/SunnyYoona/article/details/75322224
