Working with HBase from Java

Source: Internet. Editor: 程序博客网. Time: 2024/05/17 08:12

Note: compiled from http://booby325.iteye.com/blog/1316965 and http://javacrazyer.iteye.com/blog/1186881

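I. Introduction to the HBase Java API

HBase provides a Java API for administering HBase, covering table management, data operations, and more. The commonly used operations are:

1. Creating, deleting, listing, and modifying tables is done through HBaseAdmin. Once a table has been created, you access it through an HTable instance and use that instance to add data to the table.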

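2. Inserting data

Create a Put object, specify on it which column receives the data along with values such as the timestamp, then submit the operation by calling HTable.put(Put). Note that a Put must be given a row key, which is passed as a constructor argument when the Put is created.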

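3. Getting data

To read data, use a Get object. Like Put, Get has several constructors. You normally pass the row key at construction to say which row to fetch, then execute it with HTable.get(Get).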

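4. Scanning rows

A Scan walks the rows of a table and yields each row's information, such as column names and timestamps. It behaves like a cursor: HTable.getScanner(Scan) returns a ResultScanner, and each call to next() on it moves to the following row. HTable.get(Get) returns a single Result, while the ResultScanner returned by HTable.getScanner(Scan) yields one Result per row. A Result is a list of KeyValue objects.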

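5. Deleting

Use Delete to remove records, and execute it by calling HTable.delete(Delete). (Note: deletion is somewhat special, in that the data is not removed from the table immediately. HBase first writes a delete marker, and the data is physically dropped later during a major compaction.)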

6. Locks

Insert, get, and delete operations take a lock on the row they operate on; scans do not.

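7. Cluster access

Client code locates the cluster through ZooKeeper, which means the ZooKeeper quorum is used. The relevant classes (packages) must therefore be on the client's classpath, and the client must be able to find the file hbase-site.xml there.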
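The put, get, scan, and delete behavior listed above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: ToyTable and its methods are hypothetical names, it uses plain Java collections rather than the HBase client API, and it ignores regions, column-family storage, and compaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory stand-in for one HBase table. Hypothetical class, NOT the HBase client API. */
class ToyTable {
    // Sentinel compared by identity; it plays the role of an HBase delete marker.
    private static final String TOMBSTONE = new String("tombstone");

    // row key -> (column -> (timestamp -> value)); TreeMap keeps row keys sorted, as HBase does.
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    /** Like Put: the row key is mandatory and the timestamp selects the version. */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    /** Like Get: returns the newest visible value of one column, or null if there is none. */
    String get(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) return null;
        TreeMap<Long, String> versions = columns.get(column);
        if (versions == null) return null;
        String newest = versions.lastEntry().getValue();
        return newest == TOMBSTONE ? null : newest;
    }

    /** Like Delete: writes a marker instead of removing the stored data right away. */
    void delete(String row, String column, long timestamp) {
        put(row, column, timestamp, TOMBSTONE);
    }

    /** Like Scan: walks rows in sorted key order and returns the keys that still have visible data. */
    List<String> scanRowKeys() {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, TreeMap<Long, String>>> e : rows.entrySet()) {
            for (String column : e.getValue().keySet()) {
                if (get(e.getKey(), column) != null) {
                    visible.add(e.getKey());
                    break;
                }
            }
        }
        return visible;
    }
}
```

The marker-based delete mirrors the remark that deleted data does not disappear at once: the old versions stay in the map and are merely hidden from reads.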

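II. Setting up the environment

Create a new Java project and add the following jars:

For Hadoop: hadoop-core-0.20.204.0.jar

For HBase: hbase-0.90.4.jar, hbase-0.90.4-tests.jar, and every jar in the lib directory of the HBase distribution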

III. Main program

package com.wujintao.hbase.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.util.Bytes;

public class JinTaoTest {

    public static Configuration configuration;

    static {
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.property.clientPort", "2181");
        configuration.set("hbase.zookeeper.quorum", "192.168.1.100");
        // The original had port 600000, which is not a valid TCP port; 60000 is the default master port.
        configuration.set("hbase.master", "192.168.1.100:60000");
    }

    public static void main(String[] args) {
        // createTable("wujintao");
        // insertData("wujintao");
        // QueryAll("wujintao");
        // QueryByCondition1("wujintao");
        // QueryByCondition2("wujintao");
        // QueryByCondition3("wujintao");
        // deleteRow("wujintao", "abcdef");
        deleteByCondition("wujintao", "abcdef");
    }

    /**
     * Create a table.
     * @param tableName
     */
    public static void createTable(String tableName) {
        System.out.println("start create table ......");
        try {
            HBaseAdmin hBaseAdmin = new HBaseAdmin(configuration);
            if (hBaseAdmin.tableExists(tableName)) {
                // If the table to create already exists, delete it first, then create it
                hBaseAdmin.disableTable(tableName);
                hBaseAdmin.deleteTable(tableName);
                System.out.println(tableName + " exists, deleting....");
            }
            HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
            tableDescriptor.addFamily(new HColumnDescriptor("column1"));
            tableDescriptor.addFamily(new HColumnDescriptor("column2"));
            tableDescriptor.addFamily(new HColumnDescriptor("column3"));
            hBaseAdmin.createTable(tableDescriptor);
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("end create table ......");
    }

    /**
     * Insert data.
     * @param tableName
     */
    public static void insertData(String tableName) {
        System.out.println("start insert data ......");
        HTablePool pool = new HTablePool(configuration, 1000);
        HTable table = (HTable) pool.getTable(tableName);
        // One Put represents one row; create another Put for a second row.
        // Each row has a unique rowkey, here the value passed to the Put constructor.
        Put put = new Put("112233bbbcccc".getBytes());
        put.add("column1".getBytes(), null, "aaa".getBytes()); // first column of this row
        put.add("column2".getBytes(), null, "bbb".getBytes()); // second column of this row
        put.add("column3".getBytes(), null, "ccc".getBytes()); // third column of this row
        try {
            table.put(put);
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("end insert data ......");
    }

    /**
     * Drop a table.
     * @param tableName
     */
    public static void dropTable(String tableName) {
        try {
            HBaseAdmin admin = new HBaseAdmin(configuration);
            admin.disableTable(tableName);
            admin.deleteTable(tableName);
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Delete one record by rowkey.
     * @param tablename
     * @param rowkey
     */
    public static void deleteRow(String tablename, String rowkey) {
        try {
            HTable table = new HTable(configuration, tablename);
            List<Delete> list = new ArrayList<Delete>();
            Delete d1 = new Delete(rowkey.getBytes());
            list.add(d1);
            table.delete(list);
            System.out.println("Row deleted successfully!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Delete by a combination of conditions.
     * @param tablename
     * @param rowkey
     */
    public static void deleteByCondition(String tablename, String rowkey) {
        // No effective API has been found so far for deleting by a non-rowkey condition,
        // nor an API operation for clearing all data in a table
    }

    /**
     * Query all data.
     * @param tableName
     */
    public static void QueryAll(String tableName) {
        HTablePool pool = new HTablePool(configuration, 1000);
        HTable table = (HTable) pool.getTable(tableName);
        try {
            ResultScanner rs = table.getScanner(new Scan());
            for (Result r : rs) {
                System.out.println("rowkey: " + new String(r.getRow()));
                for (KeyValue keyValue : r.raw()) {
                    System.out.println("column family: " + new String(keyValue.getFamily())
                            + " ==== value: " + new String(keyValue.getValue()));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Single-condition query: fetch exactly one record by rowkey.
     * @param tableName
     */
    public static void QueryByCondition1(String tableName) {
        HTablePool pool = new HTablePool(configuration, 1000);
        HTable table = (HTable) pool.getTable(tableName);
        try {
            Get get = new Get("abcdef".getBytes()); // look up by rowkey
            Result r = table.get(get);
            System.out.println("rowkey: " + new String(r.getRow()));
            for (KeyValue keyValue : r.raw()) {
                System.out.println("column family: " + new String(keyValue.getFamily())
                        + " ==== value: " + new String(keyValue.getValue()));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Single-condition query returning multiple records.
     * @param tableName
     */
    public static void QueryByCondition2(String tableName) {
        try {
            HTablePool pool = new HTablePool(configuration, 1000);
            HTable table = (HTable) pool.getTable(tableName);
            // Match rows whose column1 value equals "aaa"
            Filter filter = new SingleColumnValueFilter(
                    Bytes.toBytes("column1"), null, CompareOp.EQUAL, Bytes.toBytes("aaa"));
            Scan s = new Scan();
            s.setFilter(filter);
            ResultScanner rs = table.getScanner(s);
            for (Result r : rs) {
                System.out.println("rowkey: " + new String(r.getRow()));
                for (KeyValue keyValue : r.raw()) {
                    System.out.println("column family: " + new String(keyValue.getFamily())
                            + " ==== value: " + new String(keyValue.getValue()));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Query by a combination of conditions.
     * @param tableName
     */
    public static void QueryByCondition3(String tableName) {
        try {
            HTablePool pool = new HTablePool(configuration, 1000);
            HTable table = (HTable) pool.getTable(tableName);

            List<Filter> filters = new ArrayList<Filter>();

            Filter filter1 = new SingleColumnValueFilter(
                    Bytes.toBytes("column1"), null, CompareOp.EQUAL, Bytes.toBytes("aaa"));
            filters.add(filter1);

            Filter filter2 = new SingleColumnValueFilter(
                    Bytes.toBytes("column2"), null, CompareOp.EQUAL, Bytes.toBytes("bbb"));
            filters.add(filter2);

            Filter filter3 = new SingleColumnValueFilter(
                    Bytes.toBytes("column3"), null, CompareOp.EQUAL, Bytes.toBytes("ccc"));
            filters.add(filter3);

            FilterList filterList1 = new FilterList(filters);

            Scan scan = new Scan();
            scan.setFilter(filterList1);
            ResultScanner rs = table.getScanner(scan);
            for (Result r : rs) {
                System.out.println("rowkey: " + new String(r.getRow()));
                for (KeyValue keyValue : r.raw()) {
                    System.out.println("column family: " + new String(keyValue.getFamily())
                            + " ==== value: " + new String(keyValue.getValue()));
                }
            }
            rs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
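The empty deleteByCondition method in the program says no API was found for deleting by a non-rowkey condition. One common workaround, offered here as an assumption rather than something the original post describes, is to scan with the condition, collect the matching row keys, and then delete those rows in a batch. The sketch below models that flow with plain Java collections; ConditionalDelete and deleteWhere are hypothetical names, and the TreeMap only stands in for a table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical helper: models "scan with a filter, then delete the matches" on an in-memory map. */
class ConditionalDelete {

    /**
     * Removes every row whose columns satisfy the condition and returns the removed row keys.
     * table maps row key -> (column name -> value).
     */
    static List<String> deleteWhere(TreeMap<String, Map<String, String>> table,
                                    Predicate<Map<String, String>> condition) {
        // Step 1: the "scan" phase, which only collects matching row keys.
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (condition.test(row.getValue())) {
                matched.add(row.getKey());
            }
        }
        // Step 2: the "batch delete" phase, done after iteration to avoid modifying the map mid-scan.
        for (String rowKey : matched) {
            table.remove(rowKey);
        }
        return matched;
    }
}
```

With the real client, step 1 would correspond to a Scan carrying a filter such as the SingleColumnValueFilter used in QueryByCondition2, and step 2 to HTable.delete with a list of Delete objects, as in deleteRow.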

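Note: you may have noticed there is no update operation. An update is performed exactly like an insert: it is an insert when the rowkey does not yet exist, and an update when the rowkey already exists and the timestamp is the same (a Put with a newer timestamp adds a new version, which reads return by default). Also, there does not yet seem to be a way to page through HBase query results; it is unclear whether anyone knows how to do it.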
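On the paging question, one approach often used with sorted stores is keyset paging: remember the last row key of the current page and start the next scan just after it, stopping after a fixed number of rows. HBase also ships a PageFilter class in org.apache.hadoop.hbase.filter that caps the rows returned. The sketch below only models the idea on a sorted in-memory map; RowKeyPager and nextPage are hypothetical names, not HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Hypothetical helper: keyset paging over a sorted map that stands in for an HBase table. */
class RowKeyPager {

    /**
     * Returns up to pageSize row keys that sort strictly after lastSeenRow.
     * Pass null as lastSeenRow to fetch the first page.
     */
    static List<String> nextPage(NavigableMap<String, String> table, String lastSeenRow, int pageSize) {
        // Restrict the view to rows after the last one already shown (exclusive).
        NavigableMap<String, String> rest =
                lastSeenRow == null ? table : table.tailMap(lastSeenRow, false);
        List<String> page = new ArrayList<>();
        for (String rowKey : rest.keySet()) {
            if (page.size() == pageSize) {
                break; // page is full; the caller keeps the last key for the next call
            }
            page.add(rowKey);
        }
        return page;
    }
}
```

Because the next page is located by row key rather than by offset, the cost of fetching page N does not grow with N, which is the main reason this pattern suits a store ordered by row key.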

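IV. HBase performance tuning suggestions:

The code above has many shortcomings. It is left unchanged here; the suggestions below only point out what to improve, for you to apply yourself.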

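1) Configuration

When you call HBaseConfiguration.create(), two configuration files, hbase-default.xml and hbase-site.xml, are loaded from the current Java classpath. Values set on configuration in code override the same settings from hbase-default.xml and hbase-site.xml. If both files exist and already set the relevant parameters, the properties set in the code above can be left out.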
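The override order described here (defaults file, then site file, then values set in code) can be sketched with java.util.Properties. This is an illustration only: LayeredConfig is a hypothetical class, it does not parse the XML files, and the property values used with it are made up for the example.

```java
import java.util.Properties;

/** Hypothetical helper: models the default -> site -> code override order with Properties. */
class LayeredConfig {

    /** Merges the two layers; entries from site replace entries with the same key from defaults. */
    static Properties layered(Properties defaults, Properties site) {
        Properties merged = new Properties();
        merged.putAll(defaults); // lowest priority, like hbase-default.xml
        merged.putAll(site);     // overrides defaults, like hbase-site.xml
        return merged;           // a later setProperty on the result overrides both, like configuration.set
    }
}
```

For example, if the defaults layer maps hbase.zookeeper.quorum to localhost and the site layer maps it to 192.168.1.100, the merged result holds 192.168.1.100 until code sets another value.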

2) Creating tables

public void createTable(HTableDescriptor desc)

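HTableDescriptor represents the table schema. Its more useful methods include:

setMaxFileSize: sets the maximum region size

setMemStoreFlushSize: sets the memstore size at which the memstore is flushed to a file on HDFS

Column families are added with the addFamily method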

public void addFamily(final HColumnDescriptor family)

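HColumnDescriptor represents the schema of a column family. Its commonly used methods include:

setTimeToLive: sets the maximum TTL, in seconds (not milliseconds); expired data is deleted automatically.

setInMemory: whether to keep the data in memory. Useful for small tables and can improve efficiency. Off by default.

setBloomFilter (named setBloomFilterType in the 0.90 API): whether to use a Bloom filter, which can speed up random reads. Off by default.

setCompressionType: sets the data compression type. No compression by default.

setMaxVersions: sets the maximum number of versions kept. The default is 3.

Note that setInMemory is normally not set to true; it stays off, which is the default.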
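To make the effect of setMaxVersions and setTimeToLive concrete, here is a small model of a single cell. It is a sketch under assumptions: VersionedCell is a hypothetical class, expiry is evaluated at read time for simplicity, and real HBase removes expired or surplus versions during compaction instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of one cell with a version limit and a time-to-live. Not HBase API. */
class VersionedCell {
    private final int maxVersions;
    private final long ttlSeconds;
    // timestamp in milliseconds -> value
    private final TreeMap<Long, String> versions = new TreeMap<>();

    VersionedCell(int maxVersions, long ttlSeconds) {
        this.maxVersions = maxVersions;
        this.ttlSeconds = ttlSeconds;
    }

    /** Writing with an existing timestamp overwrites that version; a new timestamp adds a version. */
    void put(long timestampMs, String value) {
        versions.put(timestampMs, value);
    }

    /** Returns the values visible at nowMs, newest first, honoring the TTL and the version limit. */
    List<String> read(long nowMs) {
        List<String> visible = new ArrayList<>();
        for (Long ts : versions.descendingKeySet()) {
            boolean expired = nowMs - ts > ttlSeconds * 1000L;
            // Versions are visited newest first, so once one is expired all remaining ones are too.
            if (expired || visible.size() == maxVersions) {
                break;
            }
            visible.add(versions.get(ts));
        }
        return visible;
    }
}
```

With maxVersions set to 3, writing four versions leaves only the three newest readable, which matches the default described for setMaxVersions.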

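3) Loading data

Officially recommended:

table.setAutoFlush(false); // set this to false before loading data

table.flushCommits(); // after loading completes, flush the buffered data manually

Note:

During loading, put.setWriteToWAL(true/false);

If you do not want to risk losing large amounts of data while it is being stored, set this to true. If you are only testing or rehearsing, set it to false to shorten the load time.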
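Why turning auto-flush off helps can be seen in a small model of a client-side write buffer: puts accumulate locally and reach the server in batches, so the number of round trips drops. The sketch is an assumption-laden illustration. PutBuffer is a hypothetical class, and it flushes on a row-count threshold, whereas the real client buffer is sized in bytes (the hbase.client.write.buffer setting).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a client-side write buffer with auto-flush turned off. Not HBase API. */
class PutBuffer {
    private final int flushThreshold;
    private final List<String> pending = new ArrayList<>();
    private int roundTrips = 0;
    private int rowsSent = 0;

    PutBuffer(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Buffers one row and sends a batch only when the buffer is full. */
    void put(String row) {
        pending.add(row);
        if (pending.size() >= flushThreshold) {
            flushCommits();
        }
    }

    /** Sends whatever is buffered as a single batch; a no-op when the buffer is empty. */
    void flushCommits() {
        if (pending.isEmpty()) {
            return;
        }
        roundTrips++;
        rowsSent += pending.size();
        pending.clear();
    }

    int roundTrips() {
        return roundTrips;
    }

    int rowsSent() {
        return rowsSent;
    }
}
```

The model also shows why the final flushCommits() call matters: rows still sitting in the buffer are not sent until it is flushed.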

4) Getting a table instance

HTablePool pool = new HTablePool(configuration, Integer.MAX_VALUE);

HTable table = (HTable) pool.getTable(tableName);

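Get tables through the table pool. Anyone who has used a database connection pool knows what a pool is for, so that is not repeated here.

Getting a table with new HTable(configuration,tableName); is not recommended.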

5) Queries

Wrap every query in a try/catch block, and in finally close the ResultScanner instance and return the table that is no longer needed to the HTablePool, as follows:

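public static void QueryAll(String tableName) {
    HTablePool pool = new HTablePool(configuration, Integer.MAX_VALUE);
    HTable table = null;
    ResultScanner rs = null;
    try {
        Scan scan = new Scan();
        table = (HTable) pool.getTable(tableName);
        rs = table.getScanner(scan);
        for (Result r : rs) {
            System.out.println("rowkey: " + new String(r.getRow()));
            for (KeyValue keyValue : r.raw()) {
                System.out.println("column family: " + new String(keyValue.getFamily())
                        + " ==== value: " + new String(keyValue.getValue()));
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Null checks guard against a failure before the scanner or table was obtained
        if (rs != null) {
            rs.close(); // always close the scanner at the end
        }
        if (table != null) {
            pool.putTable(table); // return the table to the pool
        }
        // In a real application the pool should be obtained through a singleton rather than
        // created again in every method: move it into a dedicated class that reuses the pool
        // if it already exists and creates it only if it does not.
    }
}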
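The advice to obtain the pool through a singleton can be sketched as follows. This is a minimal illustration with hypothetical names (SimplePool, PoolHolder); StringBuilder is only a placeholder for the pooled resource, and the real HTablePool has its own sizing and lifecycle rules that the sketch does not model.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

/** Hypothetical minimal object pool: hands out idle instances and creates new ones on demand. */
class SimplePool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    SimplePool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns an idle instance if one exists, otherwise creates a new one. */
    T borrow() {
        T instance = idle.poll();
        return instance != null ? instance : factory.get();
    }

    /** Puts an instance back so a later borrow() can reuse it; null is ignored. */
    void giveBack(T instance) {
        if (instance != null) {
            idle.offer(instance);
        }
    }
}

/** Hypothetical singleton holder: the pool is created once, on first use, and shared afterwards. */
class PoolHolder {
    private PoolHolder() {
    }

    // The nested class is loaded lazily, so POOL is built exactly once in a thread-safe way.
    private static class Lazy {
        static final SimplePool<StringBuilder> POOL = new SimplePool<>(StringBuilder::new);
    }

    static SimplePool<StringBuilder> pool() {
        return Lazy.POOL;
    }
}
```

Every method would then call PoolHolder.pool() instead of constructing its own pool, and pair each borrow() with a giveBack() in finally, in the same way QueryAll pairs getTable with putTable.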

V. Running the jar

Compile the program into a jar named hbtest.jar

source ~/.bash_profile

export HADOOP_CLASSPATH=/home/admin/hadoop/hadoop-core-0.20.204.0.jar:/home/admin/hbase/hbase-0.90.4.jar:/home/admin/zookeeper/zookeeper-3.3.2.jar

hadoop jar hbtest.jar

0 0