Hadoop Self-Study Guide, Part 9: HBase


1. Introduction

HBase is the Apache Hadoop database: it provides random, real-time read/write access to data, and it is open source, distributed, scalable, and column-oriented.

Its main features are: linear and modular scalability, strictly consistent reads and writes, automatic and configurable table sharding (splitting), automatic RegionServer failover, convenient base classes for backing MapReduce jobs, and an easy-to-use Java API for client access.

It also provides a block cache and Bloom filters for real-time queries, query predicate pushdown via server-side filters, a Thrift gateway and a RESTful web service supporting XML, Protobuf, and binary encodings, and an extensible JRuby-based (JIRB) shell.

Metrics can be exported via the Hadoop metrics subsystem to files or Ganglia, or exposed via JMX.

2. HBase Installation

The following focuses on a fully distributed installation.

1. Edit conf/hbase-site.xml.
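A minimal sketch of what conf/hbase-site.xml might contain for a fully distributed cluster (the hostnames master, node1, node2, node3 and the HDFS port 9000 are assumptions; adjust them to your environment):

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3</value>
  </property>
</configuration>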

2. Edit the conf/regionservers file.
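This file simply lists one RegionServer hostname per line, for example (node1 through node3 are assumed hostnames):

node1
node2
node3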


3. ZooKeeper configuration

Edit conf/hbase-env.sh:


export HBASE_MANAGES_ZK=true means that HBase runs ZooKeeper itself as part of its own startup; the corresponding process is "HQuorumPeer".

export HBASE_MANAGES_ZK=false means that you must run the ZooKeeper ensemble listed in hbase.zookeeper.quorum yourself; the corresponding process is "QuorumPeerMain".

Start HBase:

start-hbase.sh
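After start-hbase.sh finishes, jps should show the HBase daemons; a rough sketch of the expected output (the process IDs are made up, and the exact set of processes depends on which node you check and on HBASE_MANAGES_ZK):

$ jps
2731 HMaster
2856 HRegionServer
2678 HQuorumPeer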

Possible issues:

http://blog.csdn.net/ice_grey/article/details/48756893

3. HBase Shell

The main shell commands are the following:

alter, count, describe, delete, deleteall, disable, drop, enable, exists, exit, get, incr, list, put, tools, scan, status, shutdown, truncate, version
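A brief example session using a few of these commands (the table name test and column family cf are chosen only for illustration):

create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
get 'test', 'row1'
scan 'test'
disable 'test'
drop 'test'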

4. HBase Architecture


HBase servers follow a simple master/slave structure consisting of a group of HRegionServers and an HBase Master (HMaster) server. The HMaster manages all the HRegionServers, and all HBase services are coordinated through ZooKeeper. From the user's point of view, each table is a collection of data distinguished by row key. A table is split into a number of pieces, and each piece is a Region; each HRegion is identified by the table name plus its start/end row keys. An HRegionServer consists of two parts: the HLog and the HRegions. Each HRegion in turn consists of multiple Stores; each Store holds one column family and contains multiple StoreFiles, which handle the actual data storage.

ZooKeeper coordination ensures that exactly one HMaster is active.

The HMaster is responsible for managing tables and HRegions.

5. HBase Region

Each HRegion has a "RegionId" that identifies it uniquely; different HRegions are distinguished by tablename + startKey + regionId.

The metadata table (.META.) stores the mapping between HRegion identifiers and the actual HRegion servers that host them.

The root table (-ROOT-) stores the locations of the metadata (.META.) regions.
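On the older HBase versions this guide is based on, these two catalog tables can be inspected directly from the HBase shell; a hedged sketch (output omitted):

scan '-ROOT-'
scan '.META.'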


6. HBase API

Database

HBaseAdmin

eg: HBaseAdmin admin = new HBaseAdmin(config);

  admin.disableTable("tablename");

HBaseConfiguration

eg: Configuration config = HBaseConfiguration.create();

HTable

eg: HTable table = new HTable(conf, Bytes.toBytes(tablename));

      ResultScanner scanner = table.getScanner(Bytes.toBytes("cf"));

HTableDescriptor

eg: HTableDescriptor htd = new HTableDescriptor(name);

      htd.addFamily(new HColumnDescriptor("Family"));

Column family

HColumnDescriptor

eg: HTableDescriptor htd = new HTableDescriptor(tablename);

      HColumnDescriptor col = new HColumnDescriptor("content");

      htd.addFamily(col);

Row and column operations

Put

HTable table = new HTable(conf,Bytes.toBytes(tablename));

Put p = new Put(Bytes.toBytes(row));

p.add(family,qualifier,value);

table.put(p);

Get

HTable table = new HTable(conf, Bytes.toBytes(tablename));

Get g = new Get(Bytes.toBytes(row));

Result result = table.get(g);

Scanner
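The Scanner case follows the same pattern as Put and Get; a minimal sketch (the column family name "cf" is an assumption):

HTable table = new HTable(conf, Bytes.toBytes(tablename));
Scan s = new Scan();
s.addFamily(Bytes.toBytes("cf"));
ResultScanner rs = table.getScanner(s);
for (Result r : rs) {
    // process each row here
}
rs.close();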

1. API example:


package hadoop.v12;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestCase {

    // Declare a static HBaseConfiguration
    static Configuration cfg = HBaseConfiguration.create();

    // Create a table using HBaseAdmin and HTableDescriptor
    public static void creat(String tablename, String columnFamily) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(cfg);
        if (admin.tableExists(tablename)) {
            System.out.println("table Exists!");
            System.exit(0);
        } else {
            HTableDescriptor tableDesc = new HTableDescriptor(tablename);
            tableDesc.addFamily(new HColumnDescriptor(columnFamily));
            admin.createTable(tableDesc);
            System.out.println("create table success!");
        }
    }

    // Add one row of data to an existing table using HTable and Put
    public static void put(String tablename, String row, String columnFamily,
                           String column, String data) throws Exception {
        HTable table = new HTable(cfg, tablename);
        Put p1 = new Put(Bytes.toBytes(row));
        p1.add(Bytes.toBytes(columnFamily), Bytes.toBytes(column), Bytes.toBytes(data));
        table.put(p1);
        System.out.println("put '" + row + "','" + columnFamily + ":" + column + "','" + data + "'");
    }

    // Read one row using HTable and Get
    public static void get(String tablename, String row) throws IOException {
        HTable table = new HTable(cfg, tablename);
        Get g = new Get(Bytes.toBytes(row));
        Result result = table.get(g);
        System.out.println("Get: " + result);
    }

    // Show all data: use HTable and Scan to read the existing table
    public static void scan(String tablename) throws Exception {
        HTable table = new HTable(cfg, tablename);
        Scan s = new Scan();
        ResultScanner rs = table.getScanner(s);
        for (Result r : rs) {
            System.out.println("Scan: " + r);
        }
    }

    // Disable and delete a table
    public static boolean delete(String tablename) throws IOException {
        HBaseAdmin admin = new HBaseAdmin(cfg);
        if (admin.tableExists(tablename)) {
            try {
                admin.disableTable(tablename);
                admin.deleteTable(tablename);
            } catch (Exception ex) {
                ex.printStackTrace();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] agrs) {
        String tablename = "hbase_tb";
        String columnFamily = "cf";
        try {
            HBaseTestCase.creat(tablename, columnFamily);
            HBaseTestCase.put(tablename, "row1", columnFamily, "cl1", "data");
            HBaseTestCase.get(tablename, "row1");
            HBaseTestCase.scan(tablename);
            /* if (true == HBaseTestCase.delete(tablename))
                   System.out.println("Delete table:" + tablename + "success!"); */
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}


2. Combining HBase with WordCount

package hadoop.v12;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WordCountHBase {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private IntWritable i = new IntWritable(1);

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line on spaces
            String s[] = value.toString().trim().split(" ");
            for (String m : s) {
                context.write(new Text(m), i);
            }
        }
    }

    public static class Reduce extends TableReducer<Text, IntWritable, NullWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            // Instantiate a Put: each word is stored as one row
            Put put = new Put(Bytes.toBytes(key.toString()));
            // Column family "content", qualifier "count", value is the count
            put.add(Bytes.toBytes("content"), Bytes.toBytes("count"),
                    Bytes.toBytes(String.valueOf(sum)));
            context.write(NullWritable.get(), put);
        }
    }

    public static void createHBaseTable(String tablename) throws IOException {
        HTableDescriptor htd = new HTableDescriptor(tablename);
        HColumnDescriptor col = new HColumnDescriptor("content");
        htd.addFamily(col);
        HBaseConfiguration config = new HBaseConfiguration();
        HBaseAdmin admin = new HBaseAdmin(config);
        if (admin.tableExists(tablename)) {
            System.out.println("table exists, trying recreate table! ");
            admin.disableTable(tablename);
            admin.deleteTable(tablename);
        }
        System.out.println("create new table: " + tablename);
        admin.createTable(htd);
    }

    public static void main(String args[]) throws Exception {
        String tablename = "wordcount";
        Configuration conf = new Configuration();
        conf.set(TableOutputFormat.OUTPUT_TABLE, tablename);
        createHBaseTable(tablename);
        // Input path from the command line
        String input = args[0];
        Job job = new Job(conf, "WordCount table with " + input);
        job.setJarByClass(WordCountHBase.class);
        job.setNumReduceTasks(3);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(input));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


Reference: Hadoop实战, 2nd edition



