HBase Study Notes

Source: Internet | Published by: linux 原生镜像 | Editor: 程序博客网 | Time: 2024/05/16 09:10

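A note up front: this is only a personal study memo recording things I consider important and easy to forget. It is very brief. If you want to learn HBase completely and in depth, including the principles behind it, this post is not the right place; please refer to other blogs, published guides, or books.

HBase is a column-oriented distributed database built on top of HDFS.
It is the Hadoop application to use when you need real-time random access to very large datasets.
HBase is not a relational database and does not support SQL, but for certain problems it can do what a relational database cannot: manage very large, sparse tables on clusters built from commodity hardware.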

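Implementation

One Master node coordinates one or more Regionserver workers.
The HBase Master is responsible for bootstrapping a fresh install, assigning regions to registered Regionservers, and recovering from Regionserver failures.

HBase depends on ZooKeeper.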

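Physical model:

HBase is a sparse row/column matrix stored by column. The physical model takes one row of the conceptual model, splits it up, and stores it by column family. Note that empty values are not written to disk.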
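The sparse, per-family layout can be pictured with a small in-memory model. This is only a sketch under simplifying assumptions: the class SparseTableSketch and its methods are hypothetical names, keys and values are Strings rather than the byte arrays HBase really uses, and nothing here reflects the actual on-disk format.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of a sparse table; not the real HBase storage engine. */
final class SparseTableSketch {
    // column family -> (row key -> (qualifier -> value)).
    // Each family is held separately, and a cell exists only if a value was written.
    private final Map<String, TreeMap<String, TreeMap<String, String>>> families = new TreeMap<>();

    void put(String row, String family, String qualifier, String value) {
        families.computeIfAbsent(family, f -> new TreeMap<>())
                .computeIfAbsent(row, r -> new TreeMap<>())
                .put(qualifier, value);
    }

    /** Returns the stored value, or null when the cell was never written. */
    String get(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<String, String>> rows = families.get(family);
        if (rows == null) {
            return null;
        }
        TreeMap<String, String> cells = rows.get(row);
        return cells == null ? null : cells.get(qualifier);
    }

    /** Number of cells actually held for one column family. */
    int storedCells(String family) {
        TreeMap<String, TreeMap<String, String>> rows = families.get(family);
        if (rows == null) {
            return 0;
        }
        int count = 0;
        for (TreeMap<String, String> cells : rows.values()) {
            count += cells.size();
        }
        return count;
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("row001", "data", "x", "oakwood");
        table.put("row002", "data", "y", "maple");
        // Conceptually a 2 x 2 grid, but only the two written cells take space.
        System.out.println(table.storedCells("data"));          // prints 2
        System.out.println(table.get("row002", "data", "x"));   // prints null
    }
}
```

The nested sorted maps mirror the idea that an absent cell costs nothing: there is no placeholder for row002/data:x, the entry simply does not exist.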

HBase Shell commands

Options of the hbase command:

    Commands:
    Some commands take arguments. Pass no args or -h for usage.
      shell           Run the HBase shell
      hbck            Run the hbase 'fsck' tool
      snapshot        Create a new snapshot of a table
      snapshotinfo    Tool for dumping snapshot information
      wal             Write-ahead-log analyzer
      hfile           Store file analyzer
      zkcli           Run the ZooKeeper shell
      upgrade         Upgrade hbase
      master          Run an HBase HMaster node
      regionserver    Run an HBase HRegionServer node
      zookeeper       Run a Zookeeper server
      rest            Run an HBase REST server
      thrift          Run the HBase Thrift server
      thrift2         Run the HBase Thrift2 server
      clean           Run the HBase clean up script
      classpath       Dump hbase CLASSPATH
      mapredcp        Dump CLASSPATH entries required by mapreduce
      pe              Run PerformanceEvaluation
      ltt             Run LoadTestTool
      version         Print the version
      CLASSNAME       Run the class named CLASSNAME

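Shell commands (unlike SQL, they do not end with a semicolon):
Commands inside % hbase shell (basic operations on HBase tables):

Create a table: create 'table','data'
List tables: list
Show table structure: describe 'table'
Insert data: put 'table','row001','data:x','oakwood'
Show table contents: scan 'table'
Drop a table: disable 'table', then drop 'table' (a table must be disabled before it can be dropped)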

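Java client

Several different clients are available for interacting with an HBase cluster. Like Hadoop, HBase is written in Java.

The following uses the Java API to perform the same basic table management and access shown above in the HBase shell.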

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    // Note: this uses the older client API (HBaseAdmin, HTable, Put.add),
    // which is deprecated in later HBase releases.
    public class ExampleClient {
        public static void main(String[] args) throws IOException {
            byte[] tablename = Bytes.toBytes("mytable4");
            byte[] columnfamily = Bytes.toBytes("data");

            // HBaseConfiguration.create() returns a Configuration that has read the
            // HBase settings from hbase-site.xml and hbase-default.xml on the program's
            // classpath. It is then used to create the HBaseAdmin and HTable instances.
            Configuration conf = HBaseConfiguration.create();

            // Create the table with a single column family.
            HBaseAdmin admin = new HBaseAdmin(conf);
            HTableDescriptor htd = new HTableDescriptor(tablename);
            HColumnDescriptor hcd = new HColumnDescriptor(columnfamily);
            htd.addFamily(hcd);
            admin.createTable(htd);

            // Verify that the table now exists. (The original check combined
            // "tables.length != 1" and Bytes.equals(...) with &&, so it could
            // not detect a failed creation.)
            if (!admin.tableExists(tablename)) {
                throw new IOException("Failed create of table");
            }

            // To operate on the table, first create an HTable, then work through it.
            HTable table = new HTable(conf, tablename);

            // Put: write one cell, data:c1 = "value", into row "key".
            byte[] key = Bytes.toBytes("key"); // row key
            Put p = new Put(key);
            byte[] column = Bytes.toBytes("c1");
            byte[] value = Bytes.toBytes("value");
            p.add(columnfamily, column, value);
            table.put(p);

            // Get: read the row back.
            Get g = new Get(key);
            Result result = table.get(g);
            System.out.println(result);

            // Scan: an HBase scanner is similar to a cursor in a traditional
            // database or an iterator in Java.
            Scan scan = new Scan();
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    System.out.println(r);
                }
            } finally {
                scanner.close();
            }

            // Drop the table: it must be disabled first.
            admin.disableTable(tablename);
            admin.deleteTable(tablename);

            // Release resources.
            table.close();
            admin.close();
        }
    }

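Programming against the Java client makes a few HBase characteristics very tangible:
(1) Row keys and cell contents are both byte arrays, and most HBase data structures are built on byte arrays. In practice this is not much trouble: a call such as byte[] tablename = Bytes.toBytes("mytable"); handles most of the conversions.
(2) A table's column families are part of its schema and must be declared up front, but new members of a column family can be added on demand at any time. For example, given a column family station, a member such as station:airtemperature can be defined and given a value at any time, as long as the prefix matches. With this flexibility a table can be very tall (billions of rows) and very wide (millions of columns), is horizontally partitioned, and is replicated automatically across thousands of commodity nodes. From the standpoint of a fixed-schema RDBMS, this is hard to imagine.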
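Point (2) can be illustrated with a toy model in which the set of column families is fixed when the table is created, while qualifiers under a declared family are accepted freely. This is a sketch only: FamilySchemaSketch and its methods are hypothetical names, not part of the HBase API, and Strings stand in for byte arrays.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

/** Toy model: families are fixed at creation, qualifiers are free-form. Not the HBase API. */
final class FamilySchemaSketch {
    private final Set<String> families;
    // "row/family:qualifier" -> value
    private final TreeMap<String, String> cells = new TreeMap<>();

    FamilySchemaSketch(String... declaredFamilies) {
        this.families = new HashSet<>(Arrays.asList(declaredFamilies));
    }

    /** Stores a value under "family:qualifier"; rejects families that were not declared. */
    void put(String row, String column, String value) {
        int sep = column.indexOf(':');
        String family = sep < 0 ? column : column.substring(0, sep);
        if (!families.contains(family)) {
            throw new IllegalArgumentException("undeclared column family: " + family);
        }
        cells.put(row + "/" + column, value);
    }

    int cellCount() {
        return cells.size();
    }

    public static void main(String[] args) {
        FamilySchemaSketch stations = new FamilySchemaSketch("station");
        // New qualifiers under the declared family need no schema change.
        stations.put("row001", "station:airtemperature", "12.5");
        stations.put("row001", "station:humidity", "0.61");
        System.out.println(stations.cellCount()); // prints 2
        try {
            stations.put("row001", "sensor:id", "7"); // family "sensor" was never declared
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "undeclared column family: sensor"
        }
    }
}
```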

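MapReduce

A MapReduce application that counts the rows in an HBase table
(Note: "immutable" means it cannot be changed.)

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class RowCounter {
        static final String NAME = "rowcounter";

        static class RowCounterMapper
                extends TableMapper<ImmutableBytesWritable, Result> {
            // Counter used to count the rows.
            public static enum Counters { ROWS }

            @Override
            public void map(ImmutableBytesWritable row, Result values,
                    Context context) throws IOException {
                // Count the row once if it has at least one non-empty cell.
                for (KeyValue value : values.list()) {
                    if (value.getValue().length > 0) {
                        context.getCounter(Counters.ROWS).increment(1);
                        break;
                    }
                }
            }
        }

        public static Job createSubmittableJob(Configuration conf,
                String[] args) throws IOException {
            String tableName = args[0];
            Job job = new Job(conf, NAME + "_" + tableName);
            job.setJarByClass(RowCounter.class);
            // Remaining arguments are columns, joined with a space delimiter.
            StringBuilder sb = new StringBuilder();
            final int columnoffset = 1;
            for (int i = columnoffset; i < args.length; i++) {
                if (i > columnoffset) {
                    sb.append(" ");
                }
                sb.append(args[i]);
            }
            Scan scan = new Scan();
            // Return only the first key of each row; enough for counting.
            scan.setFilter(new FirstKeyOnlyFilter());
            if (sb.length() > 0) {
                for (String columnName : sb.toString().split(" ")) {
                    String[] fields = columnName.split(":");
                    if (fields.length == 1) {
                        scan.addFamily(Bytes.toBytes(fields[0]));
                    } else {
                        scan.addColumn(Bytes.toBytes(fields[0]),
                                Bytes.toBytes(fields[1]));
                    }
                }
            }
            job.setOutputFormatClass(NullOutputFormat.class);
            TableMapReduceUtil.initTableMapperJob(tableName, scan,
                    RowCounterMapper.class, ImmutableBytesWritable.class,
                    Result.class, job);
            job.setNumReduceTasks(0); // map-only job
            return job;
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            String[] otherArgs =
                    new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length < 1) {
                System.err.println("ERROR: Wrong number of parameters: "
                        + args.length);
                System.err.println(
                        "Usage: RowCounter <tablename> [<column1> <column2>...]");
                System.exit(-1);
            }
            Job job = createSubmittableJob(conf, otherArgs);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }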

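Example: weather data
Creating the tables:
Table 1. Stations: observation station data
Row key: stationid
Column family: info, with members info:name,
info:location,
info:description
The stationid is used as the key.

Table 2. Observations: temperature observation data
Row key: stationid followed by the reversed observation timestamp
Column family: data, with member data:airtemp
A composite key is used (the observation timestamp is placed after the station id), so that data from the same station is grouped together. The timestamp is stored in binary in reverse order (Long.MAX_VALUE - epoch), so that the newest observation for each station is stored first.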
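To see why the reversed timestamp puts the newest observation first, here is a minimal sketch in plain Java with no HBase dependency. The class and method names are hypothetical, the 12-byte station id width is taken from the RowKeyConverter code in these notes, the station id "011990-99999" is a made-up sample value, and the comparison assumes non-negative epoch values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch of a <station_id><reverse_order_epoch> row key; names are illustrative only. */
final class ObservationRowKeySketch {
    static final int STATION_ID_LENGTH = 12; // assumed fixed width of the station id

    static byte[] makeRowKey(String stationId, long epochMillis) {
        byte[] id = stationId.getBytes(StandardCharsets.UTF_8);
        if (id.length != STATION_ID_LENGTH) {
            throw new IllegalArgumentException("station id must be "
                    + STATION_ID_LENGTH + " bytes, got " + id.length);
        }
        // ByteBuffer writes the long big-endian by default, so byte order
        // matches numeric order for non-negative values.
        return ByteBuffer.allocate(STATION_ID_LENGTH + Long.BYTES)
                .put(id)
                .putLong(Long.MAX_VALUE - epochMillis)
                .array();
    }

    /** Unsigned lexicographic comparison of byte arrays, the order row keys are sorted in. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = makeRowKey("011990-99999", 1_000L);
        byte[] newer = makeRowKey("011990-99999", 2_000L);
        // The later observation has the smaller reversed value, so it sorts first.
        System.out.println(compare(newer, older) < 0); // prints true
    }
}
```

Because the station id is the key prefix, all rows for one station stay contiguous, and a scan starting at the station id meets the most recent observation first.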
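In the shell:

    > create 'stations', {NAME => 'info', VERSIONS => 1}
    > create 'observations', {NAME => 'data', VERSIONS => 1}

In both tables only the latest version of each cell is of interest, so VERSIONS is set to 1. The default for this parameter was 3 in older HBase releases (from 0.96 onward the default is 1).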

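Loading data
For table 2, first copy the raw data to HDFS, then run a MapReduce job that writes it into HBase.

    // MapReduce job that imports temperature data from HDFS into an HBase table.
    // NcdcRecordParser and HBaseTemperatureCli are helper classes that are
    // not defined in these notes.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.NullOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class HBaseTemperatureImporter extends Configured implements Tool {

        // Inner class for the map phase
        static class HBaseTemperatureMapper<K, V> extends MapReduceBase
                implements Mapper<LongWritable, Text, K, V> {
            private NcdcRecordParser parser = new NcdcRecordParser();
            private HTable table;

            public void map(LongWritable key, Text value,
                    OutputCollector<K, V> output, Reporter reporter)
                    throws IOException {
                parser.parse(value.toString());
                if (parser.isValidTemperature()) {
                    byte[] rowKey = RowKeyConverter.makeObservationRowKey(
                            parser.getStationId(),
                            parser.getObservationDate().getTime());
                    Put p = new Put(rowKey);
                    p.add(HBaseTemperatureCli.DATA_COLUMNFAMILY,
                            HBaseTemperatureCli.AIRTEMP_QUALIFIER,
                            Bytes.toBytes(parser.getAirTemperature()));
                    table.put(p);
                }
            }

            public void configure(JobConf jc) {
                super.configure(jc);
                // Create the table client once, not on every map() call.
                try {
                    this.table = new HTable(HBaseConfiguration.create(jc),
                            "observations");
                } catch (IOException e) {
                    throw new RuntimeException("Failed HTable construction", e);
                }
            }

            @Override
            public void close() throws IOException {
                super.close();
                table.close();
            }
        }

        public int run(String[] args) throws IOException {
            if (args.length != 1) {
                System.err.println("Usage: HBaseTemperatureImporter <input>");
                return -1;
            }
            JobConf jc = new JobConf(getConf(), getClass());
            FileInputFormat.addInputPath(jc, new Path(args[0]));
            jc.setMapperClass(HBaseTemperatureMapper.class);
            jc.setNumReduceTasks(0); // map-only: the mapper writes to HBase directly
            jc.setOutputFormat(NullOutputFormat.class);
            JobClient.runJob(jc);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(HBaseConfiguration.create(),
                    new HBaseTemperatureImporter(), args);
            System.exit(exitCode);
        }
    }

    public class RowKeyConverter {
        private static final int STATION_ID_LENGTH = 12;

        // Returns a row key whose format is:
        // <station_id><reverse_order_epoch>
        public static byte[] makeObservationRowKey(String stationId,
                long observationTime) {
            byte[] row = new byte[STATION_ID_LENGTH + Bytes.SIZEOF_LONG];
            // Copy the station id from source offset 0 (the original used 1,
            // which would skip the first byte and overrun the source array).
            Bytes.putBytes(row, 0, Bytes.toBytes(stationId), 0, STATION_ID_LENGTH);
            // Append the reversed timestamp so newer observations sort first.
            long reverseOrderEpoch = Long.MAX_VALUE - observationTime;
            Bytes.putLong(row, STATION_ID_LENGTH, reverseOrderEpoch);
            return row;
        }
    }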

Errors encountered and how to fix them

(1) A problem hit when first using the hbase shell:
hbase(main):001:0> create 'table','t1'
2016-09-24 11:09:43,434 ERROR [main] client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.

The HBase log contained this: Could not start ZK at requested port of 2181. ZK was started at port:2182. Aborting as clients(e.g. shell) will not be able to find this ZK quorum.
Find the process bound to port 2181 and kill it, and HBase can then start. It is a Java process; the command lsof -i:2181 finds it.
As for why a Java process was occupying port 2181: probably a ZooKeeper server had been started with zkServer.sh start. I tried this and could reproduce the problem; zkServer.sh stop or kill fixes it.

(2) An error when running an HBase instance on the local filesystem (oops...)
hbase(main):010:0* create 't','t2'
2016-09-24 13:59:13,748 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2016-09-24 13:59:13,749 WARN [main] zookeeper.ZKUtil: hconnection-0x2e8690980x0, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
Fix: the shell can be entered even when HBase itself has not been started, but creating a table then fails with the error above.
If you only want to learn HBase, start a temporary HBase instance that uses a /tmp directory on the local filesystem as its persistent storage.
Always run this first:
% start-hbase.sh
Checking with jps now shows one additional process, 11367 HMaster (the HBase master process).
After that, % hbase shell can create tables normally.
When finished, remember to run % stop-hbase.sh


0 0