HBase Region Split Policies


This article is based on hbase-0.98.6-cdh5.2.0.

Region split policy


HBase provides the following region split policies:

  • IncreasingToUpperBoundRegionSplitPolicy
  • ConstantSizeRegionSplitPolicy
  • DisabledRegionSplitPolicy
  • KeyPrefixRegionSplitPolicy
  • DelimitedKeyPrefixRegionSplitPolicy



IncreasingToUpperBoundRegionSplitPolicy


The class comment at the top of IncreasingToUpperBoundRegionSplitPolicy.java reads:

Split size is the number of regions that are on this server that all are
of the same table, cubed, times 2x the region flush size OR the maximum
region split size, whichever is smaller. For example, if the flush size
is 128M, then after two flushes (256MB) we will split which will make two regions
that will split when their size is 2^3 * 128M * 2 = 2048M. If one of these
regions splits, then there are three regions and now the split size is
3^3 * 128M * 2 = 6912M, and so on until we reach the configured
maximum filesize and then from there on out, we’ll use that.

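In other words, the split size is regioncount^3 * 128M * 2, where regioncount is the number of regions of the same table on this region server. A region is split once it reaches that size.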

The getSizeToCheck method inside the class shows the split size more directly:

  /**
   * @return Region max size or <code>count of regions squared * flushsize, which ever is
   * smaller; guard against there being zero regions on this server.
   */
  protected long getSizeToCheck(final int tableRegionsCount) {
    // safety check for 100 to avoid numerical overflow in extreme cases
    return tableRegionsCount == 0 || tableRegionsCount > 100 ? getDesiredMaxFileSize():
      Math.min(getDesiredMaxFileSize(),
        this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
  }

According to this method, a region is split once it reaches the following size:

  Math.min(getDesiredMaxFileSize(),
    this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount)
  // getDesiredMaxFileSize() returns the value of hbase.hregion.max.filesize, 10GB
  // this.initialSize is 2 * hbase.hregion.memstore.flush.size, i.e. 256MB
  // so the result is Math.min(10GB, 256MB * regioncount^3)

First split: 1^3 * 256 = 256MB
Second split: 2^3 * 256 = 2048MB
Third split: 3^3 * 256 = 6912MB
Fourth split: 4^3 * 256 = 16384MB > 10GB, so the smaller value, 10GB, is used
Every split after that uses 10GB.
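The sequence above can be reproduced with a small standalone program. This is only a sketch that mirrors the arithmetic of getSizeToCheck; it does not use the HBase classes. The class name SplitSizeSketch and its constants are made up for illustration, with the 10GB maximum and the 128MB flush size taken from the values quoted in this article.

```java
// Standalone sketch (hypothetical class, not part of HBase): recomputes the
// split size for a given number of regions using the values from this article.
final class SplitSizeSketch {
    static final long MB = 1024L * 1024L;
    // Assumed value of hbase.hregion.max.filesize: 10GB
    static final long MAX_FILE_SIZE = 10L * 1024L * MB;
    // Assumed initial size: 2 * flush size (128MB) = 256MB
    static final long INITIAL_SIZE = 2L * 128L * MB;

    // Same arithmetic as getSizeToCheck: min(max file size, initialSize * n^3),
    // falling back to the max file size for 0 or more than 100 regions.
    static long sizeToCheck(int tableRegionsCount) {
        if (tableRegionsCount == 0 || tableRegionsCount > 100) {
            return MAX_FILE_SIZE;
        }
        long n = tableRegionsCount;
        return Math.min(MAX_FILE_SIZE, INITIAL_SIZE * n * n * n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " region(s): split at " + sizeToCheck(n) / MB + "MB");
        }
    }
}
```

With these assumed values, the program reports 256MB, 2048MB and 6912MB for one, two and three regions, and 10240MB (10GB) from four regions on.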


ConstantSizeRegionSplitPolicy


This is the default split policy. From 0.94.0 on the default split policy has changed to {@link IncreasingToUpperBoundRegionSplitPolicy}

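Before 0.94.0 this was the default region split policy; from 0.94.0 on, the default is IncreasingToUpperBoundRegionSplitPolicy. Under ConstantSizeRegionSplitPolicy, a region is split once its size reaches the value configured in hbase.hregion.max.filesize (10GB by default).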


DisabledRegionSplitPolicy


The class comment at the top of DisabledRegionSplitPolicy.java reads:

This should be used with care, since it will disable automatic sharding.

This policy simply disables automatic region splits.


KeyPrefixRegionSplitPolicy


The class comment at the top of KeyPrefixRegionSplitPolicy.java reads:

A custom RegionSplitPolicy implementing a SplitPolicy that groups rows by a prefix of the row-key

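Rows are grouped by a prefix of the row key, where the prefix is a configured number of leading characters. For example, if every row key is 16 characters long and the prefix length is set to 5, rows whose first 5 characters are identical end up in the same region when a region splits.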
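One way to picture the grouping is that the candidate split point is cut down to its prefix, so the region boundary always falls between two prefixes and never inside one. The following is a minimal standalone sketch of that idea, not the HBase class itself; the class KeyPrefixSketch, the method truncateToPrefix and the sample row key are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Standalone sketch (hypothetical class, not part of HBase): shows how cutting
// a split point down to a fixed-length prefix keeps equal prefixes together.
final class KeyPrefixSketch {
    // Return the first prefixLength bytes of the candidate split point,
    // or the split point unchanged when there is nothing to truncate.
    static byte[] truncateToPrefix(byte[] splitPoint, int prefixLength) {
        if (prefixLength <= 0 || splitPoint == null || splitPoint.length == 0) {
            return splitPoint;
        }
        return Arrays.copyOf(splitPoint, Math.min(prefixLength, splitPoint.length));
    }

    public static void main(String[] args) {
        // A 16-character row key with a 5-character prefix, as in the example above.
        byte[] candidate = "user1-0000000042".getBytes(StandardCharsets.UTF_8);
        byte[] boundary = truncateToPrefix(candidate, 5);
        System.out.println(new String(boundary, StandardCharsets.UTF_8)); // prints "user1"
    }
}
```

Because every key that starts with user1 sorts at or after the boundary user1, all of those rows land in the same daughter region.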


DelimitedKeyPrefixRegionSplitPolicy


The class comment at the top of DelimitedKeyPrefixRegionSplitPolicy.java reads:

A custom RegionSplitPolicy implementing a SplitPolicy that groups rows by a prefix of the row-key with a delimiter. Only the first delimiter for the row key will define the prefix of the row key that is used for grouping. This ensures that a region is not split “inside” a prefix of a row key.
I.e. rows can be co-located in a region by their prefix.
As an example, if you have row keys delimited with _ , like userid_eventtype_eventid, and use prefix delimiter _, this split policy ensures that all rows starting with the same userid, belongs to the same region.

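This policy keeps rows with the same prefix in the same region. For example, if row keys have the form userid_eventtype_eventid and the configured delimiter is _ , a split ensures that all rows with the same userid stay in the same region.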
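The same idea can be sketched with a delimiter instead of a fixed length: the candidate split point is cut at the first occurrence of the delimiter. This is a standalone illustration under that assumption, not the HBase class itself; DelimitedPrefixSketch, its methods and the sample row key are hypothetical, and leaving the split point unchanged when the delimiter is missing or in first position is a choice made for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Standalone sketch (hypothetical class, not part of HBase): cuts a split
// point at the first delimiter so that a prefix is never split in the middle.
final class DelimitedPrefixSketch {
    // Index of the first occurrence of target in array, or -1 if absent.
    static int indexOf(byte[] array, byte[] target) {
        if (target.length == 0) {
            return -1; // treat an empty delimiter as "not found"
        }
        for (int i = 0; i <= array.length - target.length; i++) {
            boolean match = true;
            for (int j = 0; j < target.length && match; j++) {
                match = array[i + j] == target[j];
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // Keep only the bytes before the first delimiter; if the delimiter is
    // missing or comes first, return the split point unchanged (sketch choice).
    static byte[] truncateAtDelimiter(byte[] splitPoint, byte[] delimiter) {
        int index = indexOf(splitPoint, delimiter);
        return index <= 0 ? splitPoint : Arrays.copyOf(splitPoint, index);
    }

    public static void main(String[] args) {
        byte[] candidate = "user42_click_0007".getBytes(StandardCharsets.UTF_8);
        byte[] delimiter = "_".getBytes(StandardCharsets.UTF_8);
        byte[] boundary = truncateAtDelimiter(candidate, delimiter);
        System.out.println(new String(boundary, StandardCharsets.UTF_8)); // prints "user42"
    }
}
```

Only the first delimiter matters here, which matches the class comment: the part after userid (eventtype and eventid) plays no role in where the boundary falls.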
