Setting up an Eclipse HBase development environment on Ubuntu Linux in a virtual machine

Source: Internet. Editor: 程序博客网. Time: 2024/05/01 23:58

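Requirement: the Hadoop/HBase cluster runs in a remote data center; the development environment is set up on Ubuntu in a local virtual machine.

1. Install Ubuntu Linux in a virtual machine (VMware, Virtual PC, or similar).

2. Download the Eclipse JEE edition; Helios is a good choice. The JEE edition is strongly recommended here: once you start real development you will find it saves a lot of effort.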

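3. Configure the Hadoop development environment

4. Configure the HBase environment

4.1 Create a new project

4.2 Put the hbase, hadoop, log4j, commons-logging, commons-lang, and ZooKeeper jars on the classpath. The usual way is to create a lib folder in the project, at the same level as src, copy the jars into it, and then add them to the Java Build Path.

For example: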

hadoop-0.20.205-core.jar

log4j-1.2.16.jar

commons-logging-1.1.1.jar

hbase-0.90.4.jar

hbase-0.90.4-tests.jar

zookeeper-3.2.2.jar

commons-lang-2.5.jar
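A quick way to confirm that the jars above really ended up on the build path is to try loading one well-known class from each of them. The following is only a minimal sketch: the class name ClasspathCheck is made up for this example, and the representative class names are assumed to match the library versions listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: reports which expected classes cannot be loaded.
final class ClasspathCheck {

    // One representative class per jar (assumed to match the versions listed above).
    static final List<String> REQUIRED = Arrays.asList(
            "org.apache.hadoop.conf.Configuration",        // hadoop core
            "org.apache.hadoop.hbase.HBaseConfiguration",  // hbase
            "org.apache.zookeeper.ZooKeeper",              // zookeeper
            "org.apache.log4j.Logger",                     // log4j
            "org.apache.commons.logging.LogFactory",       // commons-logging
            "org.apache.commons.lang.StringUtils");        // commons-lang 2.x

    // Returns the names that could not be loaded, in the order given.
    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<String>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run its static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                missing.add(name);
            } catch (LinkageError e) {
                // the class was found, but something it depends on is missing
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(REQUIRED);
        if (missing.isEmpty()) {
            System.out.println("All required classes are on the classpath.");
        } else {
            System.out.println("Missing from the classpath: " + missing);
        }
    }
}
```

Run it as a Java application inside the project; any name it prints points to a jar that still has to be added to the Java Build Path.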

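4.3 Create a conf folder in the project, at the same level as src, to hold the contents of HBase's conf directory, then add it to the Java Build Path with Add Class Folder.

Alternatively, put both the lib from 4.2 and the conf from 4.3 into a user library and add that library to the build path, which keeps the project tidy. To do this, first create a User Library under Preferences, then import it via Project -> Properties -> Java Build Path -> Libraries -> Add Library.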
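Whichever approach is used, hbase-site.xml has to be visible on the runtime classpath, because HBaseConfiguration.create() loads it from there. The sketch below checks that with plain JDK classes and prints the configured ZooKeeper quorum. The class name ConfCheck and the helper readProperty are made up for this example; hbase.zookeeper.quorum is the standard HBase property name, and the file is assumed to use the usual Hadoop property layout.

```java
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of HBase: inspects hbase-site.xml using only the JDK.
final class ConfCheck {

    // Returns the value of the named property in a Hadoop-style XML config, or null if absent.
    static String readProperty(InputStream xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                if (name.equals(text(p, "name"))) {
                    return text(p, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse config XML", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag, or null if there is none.
    private static String text(Element parent, String tag) {
        NodeList nl = parent.getElementsByTagName(tag);
        return nl.getLength() == 0 ? null : nl.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        URL url = ConfCheck.class.getClassLoader().getResource("hbase-site.xml");
        if (url == null) {
            System.out.println("hbase-site.xml is NOT on the classpath; check the conf class folder.");
            return;
        }
        System.out.println("Found " + url);
        InputStream in = url.openStream();
        try {
            System.out.println("hbase.zookeeper.quorum = "
                    + readProperty(in, "hbase.zookeeper.quorum"));
        } finally {
            in.close();
        }
    }
}
```

If the file is found but the quorum prints as null, the client will probably fall back to the default and look for ZooKeeper on localhost rather than on the remote cluster.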

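See the figures below:

The first figure shows both methods at once for illustration; they are distinguished by the width of the boxes.

The second figure comes from a fellow developer's article, in which the author put conf into the library as well: http://www.sujee.net/tech/articles/hbase-map-reduce-freq-counter/ Judge for yourself which is better.

A side note: the order under Order and Export matters but is often overlooked. When Eclipse offers import suggestions for classes with the same name, it follows this order, so when many libraries are imported it is worth spending a minute on their order, especially if, like me, you are fond of Ctrl+Shift+O.

Order is the order in which classes with the same name are picked up; Export means the libraries and projects in use are made available to dependent projects along with this project.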

Say you have junit.jar in the build path of project A. Project B depends on project A.

Now you write a junit test in project B. If project A exports junit.jar, project B can use it at compile time - no more action necessary. If A doesn't export it, B doesn't know about it - you will have to explicitly put it into its build path, too.

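4.4 Create an HBase operations class. If it runs successfully, the corresponding table is created in HBase, and the console output can be used for analysis.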

package com.ibm.bi.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TableOperation {

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        // Reads hbase-default.xml and hbase-site.xml from the classpath
        Configuration config = HBaseConfiguration.create();

        // Create table "test" with a single column family "data"
        HBaseAdmin admin = new HBaseAdmin(config);
        HTableDescriptor htd = new HTableDescriptor("test");
        HColumnDescriptor hcd = new HColumnDescriptor("data");
        htd.addFamily(hcd);
        admin.createTable(htd);
        byte[] tablename = htd.getName();
        // Verify by name: a shared remote cluster may already hold other tables,
        // so checking that exactly one table exists would be wrong
        if (!admin.tableExists(tablename)) {
            throw new IOException("Failed create of table");
        }

        // Run some operations -- a put, a get, and a scan -- against the table.
        HTable table = new HTable(config, tablename);
        byte[] row1 = Bytes.toBytes("row1");
        Put p1 = new Put(row1);
        byte[] databytes = Bytes.toBytes("data");
        p1.add(databytes, Bytes.toBytes("1"), Bytes.toBytes("value1"));
        table.put(p1);

        Get g = new Get(row1);
        Result result = table.get(g);
        System.out.println("Get: " + result);

        Scan scan = new Scan();
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result scannerResult : scanner) {
                System.out.println("Scan: " + scannerResult);
            }
        } finally {
            scanner.close();
        }
        table.close();

        // Drop the table
        // admin.disableTable(tablename);
        // admin.deleteTable(tablename);
    }
}

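4.5 Troubleshooting

(1) An error similar to the following:

ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.

This message is only the symptom. A careful look at the console log shows that the real problem is a connection timeout, so the next step was to check whether the VM could reach the HBase master at all. It turned out to be a hostname problem: the remote server has two network interfaces, so the hostnames in the Hadoop and HBase configuration files cannot be reached from the external network. The changes were: 1) change the remote server's hostname to one that is consistent with what the outside sees. This is troublesome and politically risky inside a company: your boss and the system administrator will certainly challenge why you are doing it, so have your answer ready. 2) Add hostname-to-IP mappings to the local /etc/hosts so that the connection succeeds.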
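Before editing /etc/hosts, it can help to see from inside the VM which cluster hostnames resolve and whether the ZooKeeper port is reachable. The following is a rough sketch using only the JDK. The class name HostCheck is made up for this example, the hostnames are passed on the command line, and 2181 is assumed to be the ZooKeeper client port (its default).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical helper: checks name resolution and TCP reachability from the local machine.
final class HostCheck {

    // Returns the IP address the host name resolves to, or null if it does not resolve.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }

    public static void main(String[] args) {
        int zkPort = 2181; // assumption: ZooKeeper listens on its default client port
        for (String host : args) {
            String ip = resolve(host);
            if (ip == null) {
                // A line such as "<ip-address> <hostname>" in /etc/hosts would fix this case
                System.out.println(host + ": does not resolve");
            } else {
                System.out.println(host + " -> " + ip + ", port " + zkPort + " reachable: "
                        + canConnect(host, zkPort, 3000));
            }
        }
    }
}
```

Pass the hostnames that appear in the cluster's Hadoop and HBase configuration files as arguments. A name that does not resolve, or that resolves to the server's internal address, is a candidate for a local /etc/hosts entry.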

（2）