HBase Coprocessor: Endpoint (HBase 0.96.0)


This article was tested against HBase 0.96.0; in principle it applies to HBase 0.94 and later.

HBase provides two kinds of coprocessors:

1. RegionObserver: analogous to a trigger in a relational database.

2. Endpoint: analogous to a stored procedure in a relational database. This article covers this kind of coprocessor.

An Endpoint lets you define your own dynamic RPC protocol for communication between clients and region servers. A coprocessor runs in the same process space as the region server, so you can define your own methods (endpoints) on the region side and push computation there, cutting network overhead. Endpoints are commonly used to extend HBase with functions such as count and sum.


This article implements a custom count endpoint as an example:

I. Define a protocol buffer Service

1. Install protobuf

Download protoc-2.5.0-win32.zip (choose the build for your operating system) and unzip it;

Copy protoc.exe from protoc-2.5.0-win32 into c:\windows\system32.

Copy protoc.exe into the XXX\protobuf-2.5.0\src directory of the unpacked source.

Reference: http://shuofenglxy.iteye.com/blog/1512980

2. Write the .proto file, which defines the messages and the service

CXKTest.proto is as follows:

option java_package = "com.cxk.coprocessor.test.generated";
option java_outer_classname = "CXKTestProtos";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for = SPEED;

message CountRequest {}

message CountResponse {
  required int64 count = 1 [default = 0];
}

service RowCountService {
  rpc getRowCount(CountRequest)
    returns (CountResponse);
}

Reference: https://developers.google.com/protocol-buffers/docs/proto#services

3. Generate the Java code with protoc.exe

Run: protoc --java_out=. CXKTest.proto

This generates the class CXKTestProtos under the package com.cxk.coprocessor.test.generated.

II. Define your own Endpoint class (implementing your own methods)

The code of RowCountEndpoint.java is as follows:

package com.cxk.coprocessor.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.protobuf.ResponseConverter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.util.Bytes;

import com.cxk.coprocessor.test.generated.CXKTestProtos;
import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;

public class RowCountEndpoint extends CXKTestProtos.RowCountService
    implements Coprocessor, CoprocessorService {
  private RegionCoprocessorEnvironment env;

  public RowCountEndpoint() {
  }

  @Override
  public Service getService() {
    return this;
  }

  /**
   * Counts the total number of rows in this region.
   */
  @Override
  public void getRowCount(RpcController controller, CXKTestProtos.CountRequest request,
                          RpcCallback<CXKTestProtos.CountResponse> done) {
    Scan scan = new Scan();
    scan.setFilter(new FirstKeyOnlyFilter());
    CXKTestProtos.CountResponse response = null;
    InternalScanner scanner = null;
    try {
      scanner = env.getRegion().getScanner(scan);
      List<Cell> results = new ArrayList<Cell>();
      boolean hasMore = false;
      byte[] lastRow = null;
      long count = 0;
      do {
        hasMore = scanner.next(results);
        for (Cell kv : results) {
          byte[] currentRow = CellUtil.cloneRow(kv);
          // count a row only when the row key changes
          if (lastRow == null || !Bytes.equals(lastRow, currentRow)) {
            lastRow = currentRow;
            count++;
          }
        }
        results.clear();
      } while (hasMore);
      response = CXKTestProtos.CountResponse.newBuilder()
          .setCount(count).build();
    } catch (IOException ioe) {
      ResponseConverter.setControllerException(controller, ioe);
    } finally {
      if (scanner != null) {
        try {
          scanner.close();
        } catch (IOException ignored) {}
      }
    }
    done.run(response);
  }

  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    if (env instanceof RegionCoprocessorEnvironment) {
      this.env = (RegionCoprocessorEnvironment) env;
    } else {
      throw new CoprocessorException("Must be loaded on a table region!");
    }
  }

  @Override
  public void stop(CoprocessorEnvironment env) throws IOException {
    // nothing to do
  }
}
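The counting loop above increments the count only when the row key changes, so multiple cells from the same row are counted once (the FirstKeyOnlyFilter already limits the scan to one cell per row, but the check keeps the count correct either way). A standalone sketch of just that logic, with hypothetical class and method names:

```java
import java.util.Arrays;
import java.util.List;

public class RowDedupCount {
  // Counts distinct consecutive row keys, mirroring the endpoint's loop:
  // a new row is counted only when its key differs from the previous one.
  static long countRows(List<byte[]> rowKeysInScanOrder) {
    byte[] lastRow = null;
    long count = 0;
    for (byte[] current : rowKeysInScanOrder) {
      if (lastRow == null || !Arrays.equals(lastRow, current)) {
        lastRow = current;
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // two cells of row r1 plus one cell of row r2 -> 2 rows
    List<byte[]> cells = Arrays.asList("r1".getBytes(), "r1".getBytes(), "r2".getBytes());
    System.out.println(countRows(cells)); // prints 2
  }
}
```

This relies on the scanner returning cells in row order, which is what a region scan guarantees.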

III. Implement the client side:

TestEndPoint.java is as follows:

package com.test;

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;

import com.cxk.coprocessor.test.generated.CXKTestProtos;
import com.cxk.coprocessor.test.generated.CXKTestProtos.RowCountService;
import com.google.protobuf.ServiceException;

public class TestEndPoint {
  /**
   * @param args args[0] master ip, args[1] zk ip, args[2] table name
   * @throws ServiceException
   * @throws Throwable
   */
  public static void main(String[] args) throws ServiceException, Throwable {
    System.out.println("begin.....");
    long beginTime = System.currentTimeMillis();
    Configuration config = HBaseConfiguration.create();
    String masterIp = args[0];
    String zkIp = args[1];
    String tableName = args[2];
    config.set("hbase.zookeeper.property.clientPort", "2181");
    config.set("hbase.zookeeper.quorum", zkIp);
    config.set("hbase.master", masterIp + ":60000");
    final CXKTestProtos.CountRequest request =
        CXKTestProtos.CountRequest.getDefaultInstance();
    HTable table = new HTable(config, tableName);
    // invoke the endpoint on every region of the table (null start/end row)
    Map<byte[], Long> results = table.coprocessorService(RowCountService.class,
        null, null,
        new Batch.Call<CXKTestProtos.RowCountService, Long>() {
          public Long call(CXKTestProtos.RowCountService counter) throws IOException {
            ServerRpcController controller = new ServerRpcController();
            BlockingRpcCallback<CXKTestProtos.CountResponse> rpcCallback =
                new BlockingRpcCallback<CXKTestProtos.CountResponse>();
            counter.getRowCount(controller, request, rpcCallback);
            CXKTestProtos.CountResponse response = rpcCallback.get();
            if (controller.failedOnException()) {
              throw controller.getFailedOn();
            }
            return (response != null && response.hasCount()) ? response.getCount() : 0;
          }
        });
    table.close();
    if (results.size() > 0) {
      System.out.println(results.values());
    } else {
      System.out.println("no results returned");
    }
    long endTime = System.currentTimeMillis();
    System.out.println("end: " + (endTime - beginTime) + " ms");
  }
}
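Note that coprocessorService returns one entry per region, so the table's total row count is the sum of the map's values (the client above only prints results.values()). A minimal sketch of the merge step, with hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

public class MergeCounts {
  // Sums the per-region counts that coprocessorService returns,
  // yielding the total row count of the table.
  static long total(Map<byte[], Long> perRegionCounts) {
    long sum = 0;
    for (Long count : perRegionCounts.values()) {
      sum += count;
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<byte[], Long> results = new HashMap<byte[], Long>();
    results.put(new byte[]{0}, 3L); // e.g. first region holds 3 rows
    results.put(new byte[]{1}, 4L); // e.g. second region holds 4 rows
    System.out.println(total(results)); // prints 7
  }
}
```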

IV. Deploy the endpoint

There are two ways to deploy an endpoint: the first is to edit hbase-site.xml, which loads the endpoint for every table; the second is to alter a table in the HBase shell, which loads the endpoint for that table only.

1. Edit hbase-site.xml

Add the following to hbase-site.xml:

<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.cxk.coprocessor.test.RowCountEndpoint</value>
  <description>A comma-separated list of Coprocessors that are loaded by
  default. For any override coprocessor method from RegionObserver or
  Coprocessor, these classes' implementation will be called
  in order. After implementing your own
  Coprocessor, just put it in HBase's classpath and add the fully
  qualified class name here.
  </description>
</property>

2. Alter the table in the HBase shell

A. Package CXKTestProtos.java and RowCountEndpoint.java into a jar and upload it to HDFS;

B.

disable 'test'

C.

alter 'test','coprocessor'=>'hdfs:///user/hadoop/test/coprocessor/cxkcoprocessor.1.01.jar|com.cxk.coprocessor.test.RowCountEndpoint|1001|arg1=1,arg2=2'

D.

enable 'test'
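The value assigned to the 'coprocessor' attribute in step C is a pipe-separated string: the jar path on HDFS, the fully qualified class name, the load priority, and optional key=value arguments. A small helper sketch (hypothetical, not part of the HBase API) that assembles such a string:

```java
// Hypothetical helper: builds the pipe-separated 'coprocessor' attribute
// value used in the alter command:
//   <hdfs jar path>|<fully qualified class>|<priority>|<key=value args>
public class CoprocessorSpec {
  static String spec(String jarPath, String className, int priority, String args) {
    return jarPath + "|" + className + "|" + priority + "|" + args;
  }

  public static void main(String[] args) {
    System.out.println(spec(
        "hdfs:///user/hadoop/test/coprocessor/cxkcoprocessor.1.01.jar",
        "com.cxk.coprocessor.test.RowCountEndpoint",
        1001,
        "arg1=1,arg2=2"));
  }
}
```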

V. Run the client

Package TestEndPoint.java into a jar and run it with:

java -jar test.cxk.endpiont.1.03.jar ip1 ip2 test

PS: if your Eclipse setup can debug Hadoop directly, you can also run the test class straight from the IDE.


References:

http://hbase.apache.org/devapidocs/index.html




