Hive JDBC
Disclaimer
Original article: http://blog.csdn.net/nsrainbow/article/details/43002387. For the latest tutorials, follow the original author's blog for a better reading experience.
When working with Hive, sooner or later you need to call it from your own programs. Below I walk through an example of querying Hive data from Java.
Preparing the service
To connect to Hive over JDBC, Hive's Thrift server must be running first; otherwise the connection fails with "Could not establish connection to localhost:10000/default: java.net.ConnectException: Connection refused".
hive --service hiveserver2 (note the two dashes before "service"; the hive2 JDBC driver used below talks to HiveServer2)
Preparing the data
Create a text file called a.txt with the following contents:
- 1,terry
- 2,alex
- 3,jimmy
- 4,mike
- 5,kate
and upload it to the /data/ directory on the Hive server.
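If you have shell access, the file can be created like this (a sketch using /tmp for illustration; the example code later expects it at /data/a.txt on the Hive server, so copy it there, e.g. with scp):

```shell
# Build the sample file locally, then copy it to /data/a.txt on the Hive server.
cat > /tmp/a.txt <<'EOF'
1,terry
2,alex
3,jimmy
4,mike
5,kate
EOF
wc -l < /tmp/a.txt   # 5 lines, one per row
```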
Calling Hive over JDBC
Loading the driver
Load the driver (this covers only the hive2 JDBC driver):
- Class.forName("org.apache.hive.jdbc.HiveDriver");
Connecting to the database
- Connection con = DriverManager.getConnection("jdbc:hive2://host1:10000/default", "hive", "");
- Here host1 is the hostname of the Hive server
- 10000 is the default HiveServer2 port
- default is the default database
- hive is the default user name; the default password is empty
SQL statements
Dropping a table
- stmt.execute("drop table if exists " + tableName);
Creating a table
- stmt.execute("create table " + tableName + " (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'");
Querying data
- ResultSet res = stmt.executeQuery("select * from " + tableName);
Loading data
- stmt.execute("load data local inpath '" + filepath + "' into table " + tableName);
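A side note on the '\054' in the create-table statement: in a Java string literal, \054 is an octal escape that the compiler resolves to a comma before the SQL is ever sent, so Hive simply sees FIELDS TERMINATED BY ',', matching the comma-separated a.txt above. A quick check:

```java
public class DelimiterDemo {
    public static void main(String[] args) {
        // "\054" is a Java octal escape: character 054 (octal) = 44 (decimal) = ','
        String delim = "\054";
        System.out.println(delim.equals(","));  // prints "true"
    }
}
```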
Example
Setting up the project
First open Eclipse and create a Maven project.
pom.xml
- <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
- <modelVersion>4.0.0</modelVersion>
-
- <groupId>org.crazycake</groupId>
- <artifactId>play-hive</artifactId>
- <version>0.0.1-SNAPSHOT</version>
- <packaging>jar</packaging>
-
- <name>play-hive</name>
- <url>http://maven.apache.org</url>
-
- <properties>
- <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
- </properties>
-
- <dependencies>
- <dependency>
- <groupId>junit</groupId>
- <artifactId>junit</artifactId>
- <version>3.8.1</version>
- <scope>test</scope>
- </dependency>
- <dependency>
- <groupId>org.apache.hive</groupId>
- <artifactId>hive-jdbc</artifactId>
- <version>0.14.0</version>
- </dependency>
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-common</artifactId>
- <version>2.2.0</version>
- </dependency>
- </dependencies>
-
- <build>
- <plugins>
- <plugin>
- <artifactId>maven-compiler-plugin</artifactId>
- <version>2.0.2</version>
- <configuration>
- <source>1.6</source>
- <target>1.6</target>
- <encoding>UTF-8</encoding>
- <optimize>true</optimize>
- <compilerArgument>-nowarn</compilerArgument>
- </configuration>
- </plugin>
- <plugin>
- <groupId>org.apache.maven.plugins</groupId>
- <artifactId>maven-shade-plugin</artifactId>
- <version>2.3</version>
- <configuration>
- <transformers>
- <transformer
- implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer">
- </transformer>
- </transformers>
- </configuration>
- <executions>
- <execution>
- <phase>package</phase>
- <goals>
- <goal>shade</goal>
- </goals>
- </execution>
- </executions>
- </plugin>
- </plugins>
- </build>
- </project>
The most important parts are these two dependencies:
- <dependency>
- <groupId>org.apache.hive</groupId>
- <artifactId>hive-jdbc</artifactId>
- <version>0.14.0</version>
- </dependency>
- <dependency>
- <groupId>org.apache.hadoop</groupId>
- <artifactId>hadoop-common</artifactId>
- <version>2.2.0</version>
- </dependency>
The rest does not matter much.
Creating the table, loading and querying data
Create a class called HiveJdbcClient:
- package org.crazycake.play_hive;
-
- import java.sql.Connection;
- import java.sql.DriverManager;
- import java.sql.ResultSet;
- import java.sql.SQLException;
- import java.sql.Statement;
-
- /**
-  * Simple HiveServer2 JDBC client: creates a table, loads data into it and queries it.
-  */
- public class HiveJdbcClient {
-
-     private static String driverName = "org.apache.hive.jdbc.HiveDriver";
-
-     public static void main(String[] args) throws SQLException {
-         // Register the Hive JDBC driver
-         try {
-             Class.forName(driverName);
-         } catch (ClassNotFoundException e) {
-             e.printStackTrace();
-             System.exit(1);
-         }
-
-         // Connect to HiveServer2 as user "hive" with an empty password
-         Connection con = DriverManager.getConnection("jdbc:hive2://host1:10000/default", "hive", "");
-         Statement stmt = con.createStatement();
-         String tableName = "testhivedrivertable";
-
-         // Drop the table if it already exists, then recreate it (comma-delimited)
-         stmt.execute("drop table if exists " + tableName);
-         stmt.execute("create table " + tableName + " (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'");
-
-         // Verify the table was created
-         String sql = "show tables '" + tableName + "'";
-         System.out.println("Running: " + sql);
-         ResultSet res = stmt.executeQuery(sql);
-         if (res.next()) {
-             System.out.println(res.getString(1));
-         }
-
-         // Describe the table's columns
-         sql = "describe " + tableName;
-         System.out.println("Running: " + sql);
-         res = stmt.executeQuery(sql);
-         while (res.next()) {
-             System.out.println(res.getString(1) + "\t" + res.getString(2));
-         }
-
-         // Load a.txt from the local filesystem of the HiveServer2 host
-         String filepath = "/data/a.txt";
-         sql = "load data local inpath '" + filepath + "' into table " + tableName;
-         System.out.println("Running: " + sql);
-         stmt.execute(sql);
-
-         // Query the loaded rows
-         sql = "select * from " + tableName;
-         System.out.println("Running: " + sql);
-         res = stmt.executeQuery(sql);
-         while (res.next()) {
-             System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
-         }
-
-         con.close();
-     }
- }
The output is:
- Running: show tables 'testhivedrivertable'
- testhivedrivertable
- Running: describe testhivedrivertable
- key int
- value string
- Running: load data local inpath '/data/a.txt' into table testhivedrivertable
- Running: select * from testhivedrivertable
- 1 terry
- 2 alex
- 3 jimmy
- 4 mike
- 5 kate
Calling Hive from Java is really quite simple: over JDBC you just execute the same statements you would type into the hive shell. Keep in mind that the statements run in the environment of the Hive server machine, so any path you write (such as the one in load data local inpath) is resolved against the Hive server's filesystem. That is why a.txt must be uploaded to the server for this code to run correctly.
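One caveat: because tableName is concatenated straight into the SQL string, a user-supplied name could inject extra SQL. A minimal sketch of guarding against that (isSafeIdentifier is a hypothetical helper, not part of the Hive JDBC API):

```java
public class IdentifierCheck {
    // Hypothetical helper: accept only names made of letters, digits and
    // underscores, starting with a letter or underscore.
    static boolean isSafeIdentifier(String name) {
        return name != null && name.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    public static void main(String[] args) {
        System.out.println(isSafeIdentifier("testhivedrivertable")); // true
        System.out.println(isSafeIdentifier("t; drop table users")); // false
    }
}
```

Only if the name passes the check should it be spliced into a statement such as "drop table if exists " + tableName.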