Part 2: HBase Filters

Source: Internet  Editor: 程序博客网  Date: 2024/05/22 10:56

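Overview:

HBase filters are a powerful feature that helps users process table data more efficiently. The two main functions for reading data in HBase are get() and scan(): get() reads a row directly, while scan() reads a range delimited by a start and stop row key. Filters let us add further conditions that reduce the amount of data a query returns and so make the query more efficient.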

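1 Comparison Filters

1.1 Introduction

The first kind of filter implementation HBase provides is the comparison filter. The user supplies a comparison operator and a comparator class for the filter to work.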

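1.1.1 Comparison Operators

The operators are defined in CompareFilter.CompareOp:
LESS: match values less than the provided one
LESS_OR_EQUAL: match values less than or equal to the provided one
EQUAL: match values equal to the provided one
NOT_EQUAL: match values not equal to the provided one
GREATER_OR_EQUAL: match values greater than or equal to the provided one
GREATER: match values greater than the provided one
NO_OP: exclude everything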

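1.1.2 Comparators

BinaryComparator: compares the current value with the provided one using Bytes.compareTo()
BinaryPrefixComparator: like BinaryComparator, but performs a left-hand, prefix-based match
NullComparator: only checks whether the value is null
BitComparator: performs a bitwise AND, OR, or XOR comparison
RegexStringComparator: matches table data against a regular expression given at construction time
SubstringComparator: treats the value and the table data as strings and performs a contains() check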

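Note that the last three comparators (BitComparator, RegexStringComparator, and SubstringComparator) can only be combined with EQUAL and NOT_EQUAL. Their compareTo() returns only 0 (match) or 1 (no match), so pairing them with any other comparison operator produces wrong results.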
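
To see why a match-only comparator pairs badly with ordering operators, here is a minimal plain-Java model. It does not use the HBase API: `ComparatorModel`, `matchOnly`, `ordering`, and `keep` are hypothetical names, and the sign convention (negative when the stored value sorts before the expected one) is an assumption made for this sketch only.

```java
public class ComparatorModel {
    enum Op { LESS, EQUAL, NOT_EQUAL, GREATER }

    // Models a match-only comparator (in the style of SubstringComparator):
    // 0 when the stored value contains the needle, 1 otherwise.
    static int matchOnly(String needle, String stored) {
        return stored.contains(needle) ? 0 : 1;
    }

    // Models an ordering comparator (in the style of BinaryComparator):
    // -1, 0, or 1 depending on how the stored value sorts against the expected one.
    static int ordering(String expected, String stored) {
        return Integer.signum(stored.compareTo(expected));
    }

    // Decides whether a value is kept, given the operator and the comparator result.
    static boolean keep(Op op, int cmp) {
        switch (op) {
            case LESS:      return cmp < 0;
            case EQUAL:     return cmp == 0;
            case NOT_EQUAL: return cmp != 0;
            case GREATER:   return cmp > 0;
            default:        throw new IllegalArgumentException("unknown op " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(keep(Op.EQUAL, matchOnly("S", "JS")));    // true: a match
        System.out.println(keep(Op.LESS, matchOnly("S", "JS")));     // false
        System.out.println(keep(Op.LESS, matchOnly("S", "abc")));    // false: LESS can never pass
        System.out.println(keep(Op.GREATER, matchOnly("S", "abc"))); // true: behaves like NOT_EQUAL
        System.out.println(keep(Op.LESS, ordering("b", "a")));       // true: real ordering
    }
}
```

With a 0/1 result, LESS never passes and GREATER degenerates into NOT_EQUAL, so neither says anything about ordering.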

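1.2 Row Filter (RowFilter)

Filters data by row key. The constructor prototype is shown below. The first argument is the comparison operator, for example CompareOp.EQUAL. The second argument is the comparator, for example: new RegexStringComparator(".*.01").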
/**
 * Constructor.
 * @param rowCompareOp the compare op for row matching
 * @param rowComparator the comparator for row matching
 */
public RowFilter(final CompareOp rowCompareOp,
    final ByteArrayComparable rowComparator) {
  super(rowCompareOp, rowComparator);
}
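
The pattern ".*.01" deserves a closer look, because its dots are unescaped and therefore match any character. The sketch below uses only java.util.regex, not HBase. It assumes the comparator searches for the pattern anywhere in the row key (find semantics); `RowKeyRegexDemo` and `matches` are hypothetical names.

```java
import java.util.regex.Pattern;

public class RowKeyRegexDemo {
    // Returns true when the pattern is found anywhere in the row key.
    static boolean matches(String regex, String rowKey) {
        return Pattern.compile(regex).matcher(rowKey).find();
    }

    public static void main(String[] args) {
        String loose = ".*.01";     // pattern from the text: "." matches any character
        String strict = ".*\\.01";  // escaped dot: requires a literal ".01"

        System.out.println(matches(loose, "rowkey0001"));  // true: "001" is any char + "01"
        System.out.println(matches(loose, "rowkey0002"));  // false: no "01" in the key
        System.out.println(matches(strict, "rowkey0001")); // false: no literal dot
        System.out.println(matches(strict, "log.01"));     // true
    }
}
```

If the intent is to match keys ending in a literal ".01", the dot has to be escaped and the pattern anchored with "$".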

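1.3 Family Filter (FamilyFilter)

Similar to the row filter, except that it selects results by column family. Again the first argument is the comparison operator and the second is the comparator. It lets the user narrow results at the column-family level.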
/**
 * Constructor.
 *
 * @param familyCompareOp  the compare op for column family matching
 * @param familyComparator the comparator for column family matching
 */
public FamilyFilter(final CompareOp familyCompareOp,
                    final ByteArrayComparable familyComparator) {
    super(familyCompareOp, familyComparator);
}

1.4 Qualifier Filter (QualifierFilter)

Selects results by column qualifier (column name). It is called the same way as the filters above.
/**
 * Constructor.
 * @param op the compare op for column qualifier matching
 * @param qualifierComparator the comparator for column qualifier matching
 */
public QualifierFilter(final CompareOp op,
    final ByteArrayComparable qualifierComparator) {
  super(op, qualifierComparator);
}

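1.5 Value Filter (ValueFilter)

Selects cells that hold a particular value. Combined with RegexStringComparator, it can select by regular expression. Keep in mind that certain comparators only work with certain comparison operators; see section 1.1 for the rules. It is called as follows: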
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("S"));
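
The four filters in sections 1.2 to 1.5 differ mainly in which part of a cell they inspect. The following plain-Java sketch models that difference on an in-memory list. It is not HBase code: the `Cell` class, the `Part` enum, the sample data, and the "contains" match are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CellGranularityModel {
    // A hypothetical, simplified cell: row key, column family, qualifier, value.
    static final class Cell {
        final String row, family, qualifier, value;
        Cell(String row, String family, String qualifier, String value) {
            this.row = row; this.family = family;
            this.qualifier = qualifier; this.value = value;
        }
        @Override public String toString() {
            return row + "/" + family + ":" + qualifier + "=" + value;
        }
    }

    // Made-up sample data.
    static final List<Cell> TABLE = Arrays.asList(
        new Cell("rowkey0001", "Info", "name", "JS"),
        new Cell("rowkey0001", "Info", "age", "30"),
        new Cell("rowkey0002", "Info", "name", "Tom"),
        new Cell("rowkey0002", "Info", "age", "32"));

    enum Part { ROW, FAMILY, QUALIFIER, VALUE }

    // Keeps the cells whose chosen part contains the given text.
    static List<String> filter(Part part, String text) {
        List<String> kept = new ArrayList<>();
        for (Cell c : TABLE) {
            String field;
            switch (part) {
                case ROW:       field = c.row; break;
                case FAMILY:    field = c.family; break;
                case QUALIFIER: field = c.qualifier; break;
                default:        field = c.value; break;
            }
            if (field.contains(text)) kept.add(c.toString());
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(Part.ROW, "0001"));      // both cells of rowkey0001
        System.out.println(filter(Part.QUALIFIER, "age")); // the age cell of each row
        System.out.println(filter(Part.VALUE, "S"));       // [rowkey0001/Info:name=JS]
    }
}
```

Note that the value match keeps a single cell, not the whole row it belongs to.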

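1.6 Dependent Column Filter (DependentColumnFilter)

This is a more complex filter that does more than select data by user-supplied criteria. It lets the user name a reference (dependent) column and uses that column to control how the other columns are filtered. The filter takes the timestamp of the reference column and, while filtering, includes every column that carries the same timestamp. It has three constructors; the code later in this article shows what each of them does.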
public DependentColumnFilter(final byte [] family, final byte [] qualifier,
      final boolean dropDependentColumn)
public DependentColumnFilter(final byte [] family, final byte [] qualifier)
public DependentColumnFilter(final byte [] family, final byte[] qualifier,
      final boolean dropDependentColumn, final CompareOp valueCompareOp,
      final ByteArrayComparable valueComparator)
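
The timestamp rule can be modelled in a few lines of plain Java. This is a sketch of the behaviour as described above, not the HBase implementation: `DependentColumnModel`, its `Cell` class, and the sample row are hypothetical, and the optional value comparator of the third constructor is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DependentColumnModel {
    // A hypothetical cell within one row: column name plus timestamp.
    static final class Cell {
        final String column;
        final long timestamp;
        Cell(String column, long timestamp) {
            this.column = column;
            this.timestamp = timestamp;
        }
        @Override public String toString() { return column + "@" + timestamp; }
    }

    // Made-up row: name and age share a timestamp, city does not.
    static final List<Cell> ROW = Arrays.asList(
        new Cell("name", 100L), new Cell("age", 100L), new Cell("city", 200L));

    // Keeps the cells whose timestamp equals a timestamp of the reference column.
    // When dropDependentColumn is true, the reference column itself is left out.
    static List<String> filterRow(List<Cell> row, String refColumn,
                                  boolean dropDependentColumn) {
        Set<Long> refTimestamps = new HashSet<>();
        for (Cell c : row) {
            if (c.column.equals(refColumn)) refTimestamps.add(c.timestamp);
        }
        List<String> kept = new ArrayList<>();
        for (Cell c : row) {
            if (!refTimestamps.contains(c.timestamp)) continue;
            if (dropDependentColumn && c.column.equals(refColumn)) continue;
            kept.add(c.toString());
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filterRow(ROW, "name", false));  // [name@100, age@100]
        System.out.println(filterRow(ROW, "name", true));   // [age@100]
        System.out.println(filterRow(ROW, "email", false)); // []: no reference column
    }
}
```

In this model a row without the reference column yields nothing, because there is no timestamp to match against.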

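2 Dedicated Filters

The second category of filters HBase provides extends FilterBase directly and targets more specific use cases.

2.1 Single Column Value Filter (SingleColumnValueFilter) and Single Column Value Exclude Filter (SingleColumnValueExcludeFilter)

Uses the value of one column to decide whether a row is filtered out. There are two constructors.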
public SingleColumnValueFilter(final byte [] family, final byte [] qualifier,
      final CompareOp compareOp, final byte[] value)
public SingleColumnValueFilter(final byte [] family, final byte [] qualifier,
      final CompareOp compareOp, final ByteArrayComparable comparator)
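It is simple to use. First specify the column to check, then the value to check it against. Every row that does not satisfy the condition is filtered out. For example: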
new SingleColumnValueFilter(Bytes.toBytes("Info"), Bytes.toBytes("age"), CompareOp.LESS, Bytes.toBytes("32"));
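Every row whose age value is greater than or equal to 32 is filtered out (the value is compared byte by byte against the string "32").
The single column value exclude filter is called in exactly the same way as the single column value filter. The only difference is that the results of the exclude filter never contain the column used as the filter condition.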
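
Because the example passes the bytes of the string "32", values are ordered byte by byte rather than numerically. The plain-Java sketch below shows what that ordering does to numbers stored as text. `ByteOrderDemo`, `compareBytes`, and `lessThan` are hypothetical helpers written for this illustration, not HBase calls.

```java
import java.nio.charset.StandardCharsets;

public class ByteOrderDemo {
    // Lexicographic comparison of unsigned bytes, as used for raw byte[] values.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    // True when the stored text sorts before the limit in byte order.
    static boolean lessThan(String stored, String limit) {
        return compareBytes(stored.getBytes(StandardCharsets.UTF_8),
                            limit.getBytes(StandardCharsets.UTF_8)) < 0;
    }

    public static void main(String[] args) {
        System.out.println(lessThan("30", "32"));    // true, as expected
        System.out.println(lessThan("100", "32"));   // true: '1' sorts before '3'
        System.out.println(lessThan("4", "32"));     // false: '4' sorts after '3'
        // Zero-padding to a fixed width makes byte order agree with numeric order.
        System.out.println(lessThan("004", "032"));  // true
        System.out.println(lessThan("100", "032"));  // false
    }
}
```

One common remedy, if ages are stored as text, is to pad them to a fixed width so that byte order and numeric order coincide.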

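2.2 Prefix Filter (PrefixFilter)

Essentially a special case of the row filter: every row whose key matches the prefix is returned to the client. The constructor is: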
public PrefixFilter(final byte [] prefix)
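
A prefix scan is usually cheaper when the scan range is also limited to the prefix, so that rows outside it are never read. The sketch below computes an exclusive stop row for a given prefix in plain Java. `PrefixRange` and `stopRowForPrefix` are hypothetical names, and the convention that an empty array means "scan to the end" is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixRange {
    // Returns the smallest key that sorts after every key starting with the prefix.
    // Returns an empty array when no such key exists (empty or all-0xFF prefix).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xff) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;  // increment the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "rowkey0900009".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        // '9' (0x39) becomes ':' (0x3A)
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // rowkey090000:
    }
}
```

The prefix itself would serve as the start row, and the computed key as the stop row.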

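2.3 Other Dedicated Filters

There are many other dedicated filters, and I have not tested every one of them. They are listed here so you can try them when the need arises:
Page filter (PageFilter), key-only filter (KeyOnlyFilter), first-key-only filter (FirstKeyOnlyFilter), inclusive stop filter (InclusiveStopFilter), timestamps filter (TimestampsFilter), column count filter (ColumnCountGetFilter), column pagination filter (ColumnPaginationFilter), column prefix filter (ColumnPrefixFilter), and random row filter (RandomRowFilter).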
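
Of the filters listed, the page filter is the one whose client-side loop is least obvious: after each page, the next scan has to resume just past the last row already seen. The plain-Java sketch below models that loop over a sorted set of row keys. It is not HBase code: `PagingModel`, `page`, and `nextStart` are hypothetical names, and appending a zero character to form the next start key is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PagingModel {
    // Returns up to pageSize row keys at or after startRow, in sorted order.
    static List<String> page(NavigableSet<String> rows, String startRow, int pageSize) {
        List<String> out = new ArrayList<>();
        for (String row : rows.tailSet(startRow, true)) {
            if (out.size() == pageSize) break;
            out.add(row);
        }
        return out;
    }

    // Smallest key that sorts strictly after lastRow, so lastRow is not returned twice.
    static String nextStart(String lastRow) {
        return lastRow + "\u0000";
    }

    public static void main(String[] args) {
        NavigableSet<String> rows =
            new TreeSet<>(Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        String start = "";
        while (true) {
            List<String> current = page(rows, start, 2);
            if (current.isEmpty()) break;
            System.out.println(current);  // [r1, r2], then [r3, r4], then [r5]
            start = nextStart(current.get(current.size() - 1));
        }
    }
}
```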

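3 Decorating Filters

The filters HBase provides are already very powerful, but it is sometimes useful to modify or extend a filter's behavior to gain more control over the returned results. Some of this extra control does not depend on the filter itself and can be applied on top of any filter. That is what decorating filters are for.

3.1 Skip Filter (SkipFilter)

This filter wraps a filter supplied by the user. When the wrapped filter decides that a value should be filtered out, the skip filter extends that decision to the whole row. In other words, as soon as one column of a row is filtered out, the entire row is dropped. The constructor is shown below; its argument is the filter to wrap.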
public SkipFilter(Filter filter)
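
The difference between applying a cell-level filter alone and wrapping it in a skip filter can be modelled without HBase. In this sketch rows are lists of cell values and the wrapped filter is a java.util.function.Predicate. `SkipModel`, `plain`, `skip`, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class SkipModel {
    // Made-up table: row key mapped to the values of its cells.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("rowkey0001", Arrays.asList("JS", "30"));
        rows.put("rowkey0002", Arrays.asList("Tom", "32"));
        return rows;
    }

    // The cell filter alone: each row keeps only its passing cells.
    static Map<String, List<String>> plain(Map<String, List<String>> rows,
                                           Predicate<String> keepCell) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String value : e.getValue()) {
                if (keepCell.test(value)) kept.add(value);
            }
            if (!kept.isEmpty()) out.put(e.getKey(), kept);
        }
        return out;
    }

    // The same filter wrapped in a skip: one failing cell drops the whole row.
    static Map<String, List<String>> skip(Map<String, List<String>> rows,
                                          Predicate<String> keepCell) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet()) {
            boolean allPass = true;
            for (String value : e.getValue()) {
                if (!keepCell.test(value)) { allPass = false; break; }
            }
            if (allPass) out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Predicate<String> notJs = value -> !value.equals("JS");
        System.out.println(plain(sample(), notJs)); // {rowkey0001=[30], rowkey0002=[Tom, 32]}
        System.out.println(skip(sample(), notJs));  // {rowkey0002=[Tom, 32]}
    }
}
```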

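3.2 While-Match Filter (WhileMatchFilter)

As soon as one piece of data is filtered out, this filter abandons the scan altogether. It is used in the same way as SkipFilter.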
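
The early-abort behaviour is easy to see in a plain-Java model, where a scan is a loop over sorted row keys and the wrapped filter is a Predicate. This is a sketch of the semantics described above, not HBase code; `WhileMatchModel`, `plain`, and `whileMatch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class WhileMatchModel {
    // The row filter alone: failing rows are left out, the scan continues.
    static List<String> plain(List<String> rows, Predicate<String> keepRow) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (keepRow.test(row)) out.add(row);
        }
        return out;
    }

    // The same filter wrapped in while-match: the first failing row ends the scan.
    static List<String> whileMatch(List<String> rows, Predicate<String> keepRow) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (!keepRow.test(row)) break;
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("rowkey00000001", "rowkey00000002",
            "rowkey00000003", "rowkey00000004", "rowkey00000005");
        Predicate<String> notThird = row -> !row.equals("rowkey00000003");
        System.out.println(plain(rows, notThird).size());  // 4: only the third row is missing
        System.out.println(whileMatch(rows, notThird));    // [rowkey00000001, rowkey00000002]
    }
}
```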

4 FilterList

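Existing filters can be added to a list to form a multi-level filter, which makes combined queries possible.
Three constructors are shown below. I mainly use the first one, which takes a list of filters and applies them in combination.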
/**
 * Constructor that takes a set of {@link Filter}s. The default operator
 * MUST_PASS_ALL is assumed.
 *
 * @param rowFilters list of filters
 */
public FilterList(final List<Filter> rowFilters) {
  if (rowFilters instanceof ArrayList) {
    this.filters = rowFilters;
  } else {
    this.filters = new ArrayList<Filter>(rowFilters);
  }
}

/**
 * Constructor that takes a var arg number of {@link Filter}s. The default operator
 * MUST_PASS_ALL is assumed.
 * @param rowFilters
 */
public FilterList(final Filter... rowFilters) {
  this.filters = new ArrayList<Filter>(Arrays.asList(rowFilters));
}

/**
 * Constructor that takes an operator.
 *
 * @param operator Operator to process filter set with.
 */
public FilterList(final Operator operator) {
  this.operator = operator;
}
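
The two operators, MUST_PASS_ALL and MUST_PASS_ONE, behave like logical AND and OR over the member filters. The plain-Java sketch below models that with Predicate objects. `FilterListModel` and `combine` are hypothetical names, and the behaviour for an empty list is an assumption of this sketch, not a statement about HBase.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterListModel {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Combines member filters: ALL acts like AND, ONE acts like OR.
    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        return item -> {
            for (Predicate<T> filter : filters) {
                boolean pass = filter.test(item);
                if (op == Operator.MUST_PASS_ALL && !pass) return false;
                if (op == Operator.MUST_PASS_ONE && pass) return true;
            }
            // Reached when every filter passed (ALL) or none passed (ONE).
            return op == Operator.MUST_PASS_ALL;
        };
    }

    public static void main(String[] args) {
        Predicate<String> fromStart = row -> row.compareTo("rowkey00000011") >= 0;
        Predicate<String> untilEnd = row -> row.compareTo("rowkey00000015") <= 0;
        List<Predicate<String>> filters = Arrays.asList(fromStart, untilEnd);

        Predicate<String> all = combine(Operator.MUST_PASS_ALL, filters);
        Predicate<String> one = combine(Operator.MUST_PASS_ONE, filters);

        System.out.println(all.test("rowkey00000012")); // true: inside the range
        System.out.println(all.test("rowkey00000020")); // false: fails the upper bound
        System.out.println(one.test("rowkey00000020")); // true: passes the lower bound
    }
}
```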

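5 Test Code

Below is a simple implementation that exercises the filters described above. A little debugging is enough to see what each one does. The code can be copied straight into Eclipse.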
package com.cjmkt.hbase;

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.DependentColumnFilter;
import org.apache.hadoop.hbase.filter.FamilyFilter;
import org.apache.hadoop.hbase.filter.Filter;            // added: used by filterList()
import org.apache.hadoop.hbase.filter.FilterList;        // added: used by filterList()
import org.apache.hadoop.hbase.filter.InclusiveStopFilter;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RandomRowFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SkipFilter;        // added: used by skipFilter()
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.TimestampsFilter;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.filter.WhileMatchFilter;  // added: used by whileMatchFilter()
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilter {

    public static void main(String[] args) throws Exception {
        // Set the local Hadoop installation path. Not needed if the MapReduce
        // plugin is installed, because the plugin sets it for us.
        System.setProperty("hadoop.home.dir", "E:\\jdk\\hadoop2.6.0\\");
        // Create the configuration object
        Configuration conf = HBaseConfiguration.create();
        valueFilter(conf);
    }

    // Random row filter
    public static void randomRowFilter(Configuration conf) throws Exception {
        RandomRowFilter randomRowFilter = new RandomRowFilter(0.00001f);
        Scan scan = new Scan();
        scan.setFilter(randomRowFilter);
        scan.setStartRow(Bytes.toBytes("rowkey00000000"));
        scan.setStopRow(Bytes.toBytes("rowkey01000000"));
        HTable table = getTable(conf, "test_info");
        ResultScanner scanner = table.getScanner(scan);
        int count = 0;
        for (Result res : scanner) {
            System.out.println(res);
            count++;
        }
        System.out.println("total row = " + count);
    }

    // Column prefix filter: filters by prefix match on the column name
    public static void columnPrefixFilter(Configuration conf) throws Exception {
        ColumnPrefixFilter columnPrefixFilter = new ColumnPrefixFilter(Bytes.toBytes("na"));
        Scan scan = new Scan();
        scan.setFilter(columnPrefixFilter);
        HTable table = getTable(conf, "HBase_Table");
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
    }

    // Column pagination filter: pages through all the columns of a row
    public static void columnPaginationFilter(Configuration conf) throws Exception {
        ColumnPaginationFilter columnPaginationFilter = new ColumnPaginationFilter(1, 2);
        Scan scan = new Scan();
        scan.setFilter(columnPaginationFilter);
        HTable table = getTable(conf, "HBase_Table");
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
        scanner.close();
    }

    // Timestamps filter
    public static void timestampFilter(Configuration conf) throws Exception {
        ArrayList<Long> arrayList = new ArrayList<Long>();
        arrayList.add(1479193994109L);
        TimestampsFilter timestampsFilter = new TimestampsFilter(arrayList);
        Scan scan = new Scan();
        scan.setFilter(timestampsFilter);
        // Restrict the time range to avoid a full table scan and improve efficiency
        scan.setTimeRange(1479193994109L, 1479194994209L);
        HTable table = getTable(conf, "test_info");
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
    }

    // Inclusive stop filter: feels much like setting start and stop rows
    public static void inclusiveStopFilter(Configuration conf) throws Exception {
        InclusiveStopFilter inclusiveStopFilter = new InclusiveStopFilter(Bytes.toBytes("rowkey00000999"));
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("rowkey00000001"));
        scan.setFilter(inclusiveStopFilter);
        HTable table = getTable(conf, "test_info");
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
    }

    // Page filter
    public static void pageFilter(Configuration conf) throws Exception {
        PageFilter pageFilter = new PageFilter(90);
        int totalRows = 0;
        Scan scan = new Scan();
        scan.setFilter(pageFilter);
        scan.setStartRow(Bytes.toBytes("rowkey09000001"));
        scan.setStopRow(Bytes.toBytes("rowkey10000000"));
        HTable table = getTable(conf, "test_info");
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            totalRows++;
            System.out.println(res);
        }
        System.out.println(totalRows);
        scanner.close();
    }

    // Prefix filter: returns every row matching the prefix.
    // Combining it with a start row makes it much more efficient.
    public static void prefixFilter(Configuration conf) throws Exception {
        PrefixFilter prefixFilter = new PrefixFilter(Bytes.toBytes("rowkey0900009"));
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("rowkey09000001"));
        scan.setStopRow(Bytes.toBytes("rowkey09100000"));
        scan.setFilter(prefixFilter);
        HTable table = getTable(conf, "test_info");
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
        scanner.close();
        System.out.println("--------------------------------");
        Get get = new Get(Bytes.toBytes("rowkey09000091"));
        get.setFilter(prefixFilter);
        Result result = table.get(get);
        System.out.println(result);
    }

    // Single column value filter and single column value exclude filter
    public static void singleColumnValueFilter(Configuration conf) throws Exception {
        HTable table = getTable(conf, "HBase_Table");
        // Single column value exclude filter
        //SingleColumnValueExcludeFilter singleColumnValueExcludeFilter = new SingleColumnValueExcludeFilter(Bytes.toBytes("Info"), Bytes.toBytes("age"), CompareOp.GREATER, Bytes.toBytes("30"));
        // Single column value filter
        SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(
                Bytes.toBytes("Info"), Bytes.toBytes("age"), CompareOp.LESS, Bytes.toBytes("32"));
        singleColumnValueFilter.setFilterIfMissing(false);
        Scan scan = new Scan();
        scan.setFilter(singleColumnValueFilter);
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
        scanner.close();
        System.out.println("-------------------------------------------------");
        Get get = new Get(Bytes.toBytes("rowkey0002"));
        get.setFilter(singleColumnValueFilter);
        Result result = table.get(get);
        for (Cell cell : result.rawCells()) {
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneFamily(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
        }
        table.close();
    }

    // Dependent column filter
    public static void dependentColumnFilter(Configuration conf) throws Exception {
        HTable table = getTable(conf, "HBase_Table");
        DependentColumnFilter dependentColumnFilter = new DependentColumnFilter(
                Bytes.toBytes("Info"), Bytes.toBytes("name"));
        Scan scan = new Scan();
        scan.setFilter(dependentColumnFilter);
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
        scanner.close();
        System.out.println("-------------------------------------------------");
        Get get = new Get(Bytes.toBytes("rowkey0002"));
        get.setFilter(dependentColumnFilter);
        Result result = table.get(get);
        System.out.println(result);
        System.out.println("-------------------------------------------------");
        DependentColumnFilter dependentColumnFilter2 = new DependentColumnFilter(
                Bytes.toBytes("Info"), Bytes.toBytes("name"), true);
        scan.setFilter(dependentColumnFilter2);
        ResultScanner scanner2 = table.getScanner(scan);
        for (Result res : scanner2) {
            System.out.println(res);
        }
        scanner2.close();
        System.out.println("-------------------------------------------------");
        RegexStringComparator regexComparator = new RegexStringComparator(".*JS");
        DependentColumnFilter dependentColumnFilter3 = new DependentColumnFilter(
                Bytes.toBytes("Info"), Bytes.toBytes("name"), true, CompareOp.EQUAL, regexComparator);
        scan.setFilter(dependentColumnFilter3);
        ResultScanner scanner3 = table.getScanner(scan);
        for (Result res : scanner3) {
            System.out.println(res);
        }
        scanner3.close();
        table.close();
    }

    // Value filter: selects cells holding a particular value
    public static void valueFilter(Configuration conf) throws Exception {
        HTable table = getTable(conf, "HBase_Table");
        ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("S"));
        Scan scan = new Scan();
        scan.setFilter(valueFilter);
        ResultScanner scanner = table.getScanner(scan);
        for (Result rs : scanner) {
            System.out.println(rs);
        }
        scanner.close();
        System.out.println("--------------------------------------------");
        Get get = new Get(Bytes.toBytes("rowkey0001"));
        get.setFilter(valueFilter);
        Result result = table.get(get);
        for (Cell cell : result.rawCells()) {
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneFamily(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell)));
            System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
        }
        table.close();
    }

    // Qualifier filter: selects particular columns
    public static void qualifierFilter(Configuration conf) throws Exception {
        HTable table = getTable(conf, "HBase_Table");
        QualifierFilter qualifierFilter = new QualifierFilter(
                CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("age")));
        Scan scan = new Scan();
        scan.setFilter(qualifierFilter);
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
        scanner.close();
        System.out.println("---------------------------------------");
        Get get = new Get(Bytes.toBytes("rowkey0001"));
        get.setFilter(qualifierFilter);
        Result result = table.get(get);
        System.out.println(result);
        table.close();
    }

    // Family filter: returns only the specified column family
    public static void familyFilter(Configuration conf) {
        HTable table = getTable(conf, "HBase_Table");
        Scan scan = new Scan();
        FamilyFilter familyFilter = new FamilyFilter(
                CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("Info")));
        scan.setFilter(familyFilter);
        try {
            ResultScanner scanner = table.getScanner(scan);
            for (Result rs : scanner) {
                System.out.println(rs);
            }
            scanner.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("-------------------------------------------");
        try {
            Get get = new Get(Bytes.toBytes("rowkey0002"));
            get.setFilter(familyFilter);
            Result result = table.get(get);
            System.out.println("result = " + result);
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("-------------------------------------------");
        FamilyFilter familyFilter2 = new FamilyFilter(
                CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("Info")));
        Get get = new Get(Bytes.toBytes("rowkey0001"));
        get.addColumn(Bytes.toBytes("Info"), Bytes.toBytes("age"));
        get.setFilter(familyFilter2);
        try {
            Result result = table.get(get);
            System.out.println(result);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Row filter: uses the row key to select particular data
    public static void rowFilter(Configuration conf) {
        // Get the table
        HTable table = getTable(conf, "HBase_Table");
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("Info"), Bytes.toBytes("age"));
        // Create a row filter with a comparison operator and a comparator (exact byte comparison)
        RowFilter rowFilter1 = new RowFilter(
                CompareFilter.CompareOp.GREATER, new BinaryComparator(Bytes.toBytes("rowkey0001")));
        scan.setFilter(rowFilter1);
        try {
            ResultScanner scanner = table.getScanner(scan);
            for (Result rs : scanner) {
                System.out.println(rs);
            }
            scanner.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("-------------------------------------------");
        // Create another filter that matches the row key with a regular expression
        RowFilter rowFilter2 = new RowFilter(
                CompareFilter.CompareOp.EQUAL, new RegexStringComparator(".*.01"));
        scan.setFilter(rowFilter2);
        try {
            ResultScanner scanner = table.getScanner(scan);
            for (Result rs : scanner) {
                System.out.println(rs);
            }
            scanner.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("-------------------------------------------");
        // Create another filter that matches the row key by substring
        RowFilter rowFilter3 = new RowFilter(
                CompareFilter.CompareOp.EQUAL, new SubstringComparator("rowkey"));
        scan.setFilter(rowFilter3);
        try {
            ResultScanner scanner = table.getScanner(scan);
            for (Result rs : scanner) {
                System.out.println(rs);
            }
            scanner.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /*
     * FilterList implements the Filter interface. By combining several filters it can
     * achieve a given effect and so stand in for a dedicated filter with that effect.
     */
    public static void filterList(Configuration conf) throws Exception {
        HTable table = getTable(conf, "test_info");
        ArrayList<Filter> filters = new ArrayList<Filter>();
        RowFilter rowFilter1 = new RowFilter(
                CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes("rowkey00000001")));
        filters.add(rowFilter1);
        RowFilter rowFilter2 = new RowFilter(
                CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("rowkey10000000")));
        filters.add(rowFilter2);
        QualifierFilter qualifierFilter = new QualifierFilter(
                CompareOp.EQUAL, new RegexStringComparator("gendd"));
        filters.add(qualifierFilter);
        FilterList filterList = new FilterList(filters);
        Scan scan = new Scan();
        scan.setFilter(filterList);
        scan.setStartRow(Bytes.toBytes("rowkey00000011"));
        scan.setStopRow(Bytes.toBytes("rowkey00000020"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result res : scanner) {
            System.out.println(res);
        }
        scanner.close();
        System.out.println("---------------------------------------------------------");
        FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);
        scan.setFilter(filterList2);
        ResultScanner scanner2 = table.getScanner(scan);
        for (Result res : scanner2) {
            System.out.println(res);
        }
        scanner2.close();
    }

    /*
     * While-match filter: as soon as one piece of data is filtered out, it abandons the scan.
     * It uses the wrapped filter to check each KeyValue and determine whether a row is
     * filtered because its row key or one of its columns is skipped.
     */
    public static void whileMatchFilter(Configuration conf) throws Exception {
        RowFilter rowFilter = new RowFilter(
                CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("rowkey00000003")));
        HTable table = getTable(conf, "test_info");
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("rowkey00000001"));
        scan.setStopRow(Bytes.toBytes("rowkey01000000"));
        scan.setFilter(rowFilter);
        ResultScanner scanner = table.getScanner(scan);
        //for (Result res : scanner) {
        //    System.out.println(res);
        //    Thread.sleep(1000);
        //}
        scanner.close();
        System.out.println("---------------------------------------------");
        WhileMatchFilter whileMatchFilter = new WhileMatchFilter(rowFilter);
        scan.setFilter(whileMatchFilter);
        ResultScanner scanner2 = table.getScanner(scan);
        for (Result res : scanner2) {
            System.out.println(res);
            Thread.sleep(1000);
        }
        scanner2.close();
    }

    /*
     * The wrapped filter must implement filterKeyValue(), otherwise SkipFilter cannot work properly.
     * SkipFilter only checks the result of that method to decide how to handle the row.
     */
    public static void skipFilter(Configuration conf) throws Exception {
        ValueFilter valueFilter = new ValueFilter(
                CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("JS")));
        Scan scan = new Scan();
        scan.setFilter(valueFilter);
        HTable table = getTable(conf, "HBase_Table");
        ResultScanner scanner1 = table.getScanner(scan);
        for (Result res : scanner1) {
            System.out.println(res);
        }
        scanner1.close();
        System.out.println("----------------------------------------------");
        SkipFilter skipFilter = new SkipFilter(valueFilter);
        scan.setFilter(skipFilter);
        ResultScanner scanner2 = table.getScanner(scan);
        for (Result res : scanner2) {
            System.out.println(res);
        }
        scanner2.close();
    }

    // Opens a connection and returns the named table, or null if the connection fails
    public static HTable getTable(Configuration conf, String tableName) {
        Connection conn;
        try {
            conn = ConnectionFactory.createConnection(conf);
            HTable table = (HTable) conn.getTable(TableName.valueOf(tableName));
            return table;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}





