HBase Region Split Policies
Source: Internet  Editor: 程序博客网  Date: 2024/05/14 04:28
This article is based on hbase-0.98.6-cdh5.2.0.
Region split policy
HBase provides the following region split policies:
- IncreasingToUpperBoundRegionSplitPolicy
- ConstantSizeRegionSplitPolicy
- DisabledRegionSplitPolicy
- KeyPrefixRegionSplitPolicy
- DelimitedKeyPrefixRegionSplitPolicy
IncreasingToUpperBoundRegionSplitPolicy
The header comment of IncreasingToUpperBoundRegionSplitPolicy.java in the source reads:
Split size is the number of regions that are on this server that all are
of the same table, cubed, times 2x the region flush size OR the maximum
region split size, whichever is smaller. For example, if the flush size
is 128M, then after two flushes (256MB) we will split which will make two regions
that will split when their size is 2^3 * 128M * 2 = 2048M. If one of these
regions splits, then there are three regions and now the split size is
3^3 * 128M * 2 = 6912M, and so on until we reach the configured
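maximum filesize and then from there on out, we’ll use that.
In other words, the split threshold is regioncount^3 * 128M * 2, capped at the configured maximum file size, and a region is split once it reaches that size.
The getSizeToCheck method inside the class shows the size at which a region is split even more directly: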
    /**
     * @return Region max size or <code>count of regions squared * flushsize, which ever is
     * smaller; guard against there being zero regions on this server.
     */
    protected long getSizeToCheck(final int tableRegionsCount) {
      // safety check for 100 to avoid numerical overflow in extreme cases
      return tableRegionsCount == 0 || tableRegionsCount > 100 ? getDesiredMaxFileSize():
        Math.min(getDesiredMaxFileSize(),
          this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
    }
According to this method, a region is ultimately split when it reaches the following size:
    Math.min(getDesiredMaxFileSize(), this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount)
    // getDesiredMaxFileSize() returns the value of hbase.hregion.max.filesize, 10GB
    // this.initialSize is 2 * hbase.hregion.memstore.flush.size, i.e. 256MB
    // so the result is Math.min(10GB, 256MB * regioncount^3)
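First split: 1^3 * 256MB = 256MB
Second split: 2^3 * 256MB = 2048MB
Third split: 3^3 * 256MB = 6912MB
Fourth split: 4^3 * 256MB = 16384MB > 10GB, so the smaller value, 10GB, is used
Every split after that uses a threshold of 10GB. These figures assume that all of the table's regions sit on the same region server, because the region count is taken per server.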
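The arithmetic above can be reproduced with a small standalone program. This is only a sketch of the formula, not the HBase class itself: the class name SplitSizeSketch and its constants are made up for illustration, and the 128MB flush size and 10GB maximum file size are the values assumed in this article.

```java
// Hypothetical standalone sketch of the threshold formula; not HBase code.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;
    // Values assumed in this article: 128MB flush size, 10GB max file size.
    static final long FLUSH_SIZE = 128 * MB;
    static final long MAX_FILE_SIZE = 10L * 1024 * MB;
    // initialSize as described above: 2 * flush size.
    static final long INITIAL_SIZE = 2 * FLUSH_SIZE;

    // Mirrors the shape of getSizeToCheck: min(max file size, initialSize * n^3),
    // falling back to the max file size for 0 regions or more than 100 regions.
    static long sizeToCheck(int tableRegionsCount) {
        if (tableRegionsCount == 0 || tableRegionsCount > 100) {
            return MAX_FILE_SIZE;
        }
        long n = tableRegionsCount;
        return Math.min(MAX_FILE_SIZE, INITIAL_SIZE * n * n * n);
    }

    public static void main(String[] args) {
        // Print the split threshold for 1 to 5 regions of the table on one server.
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " region(s): " + sizeToCheck(n) / MB + " MB");
        }
    }
}
```

With these assumed values the thresholds come out as 256 MB, 2048 MB and 6912 MB for one, two and three regions, and 10240 MB (10GB) from four regions on.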
ConstantSizeRegionSplitPolicy
This is the default split policy. From 0.94.0 on the default split policy has changed to {@link IncreasingToUpperBoundRegionSplitPolicy}
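Before 0.94.0 this was the default region split policy; from 0.94.0 on, the default is IncreasingToUpperBoundRegionSplitPolicy. With ConstantSizeRegionSplitPolicy, a region is split once its size reaches the value configured in hbase.hregion.max.filesize (10G by default).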
DisabledRegionSplitPolicy
The header comment of DisabledRegionSplitPolicy.java in the source reads:
This should be used with care, since it will disable automatic sharding.
This policy simply disables automatic region splits.
KeyPrefixRegionSplitPolicy
The header comment of KeyPrefixRegionSplitPolicy.java in the source reads:
A custom RegionSplitPolicy implementing a SplitPolicy that groups rows by a prefix of the row-key
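Rows are grouped by a prefix of the rowKey, where the prefix is a fixed number of leading characters. For example, if every rowKey is 16 characters long and the prefix length is set to 5, rows whose first 5 characters are identical end up in the same region when a region is split.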
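One way to picture the grouping is to truncate the candidate split point to the prefix length, so that the region boundary falls at the start of a prefix group and never inside it. The following is a minimal sketch under that assumption; KeyPrefixSketch and prefixSplitPoint are hypothetical names, not the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of prefix-based split point selection; not HBase code.
class KeyPrefixSketch {
    // Truncate the candidate split point to its first prefixLength bytes.
    // Rows sharing that prefix all sort at or after the truncated key,
    // so they land on the same side of the split.
    static byte[] prefixSplitPoint(byte[] splitPoint, int prefixLength) {
        if (splitPoint == null || prefixLength <= 0) {
            return splitPoint; // nothing to truncate
        }
        return Arrays.copyOf(splitPoint, Math.min(prefixLength, splitPoint.length));
    }

    public static void main(String[] args) {
        // A 16-character rowKey with a 5-character prefix, as in the example above.
        byte[] candidate = "abcde12345678901".getBytes(StandardCharsets.UTF_8);
        byte[] boundary = prefixSplitPoint(candidate, 5);
        System.out.println(new String(boundary, StandardCharsets.UTF_8));
    }
}
```

Here the boundary becomes abcde, so every row starting with abcde stays together in the region that begins at that key.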
DelimitedKeyPrefixRegionSplitPolicy
The header comment of DelimitedKeyPrefixRegionSplitPolicy.java in the source reads:
A custom RegionSplitPolicy implementing a SplitPolicy that groups rows by a prefix of the row-key with a delimiter. Only the first delimiter for the row key will define the prefix of the row key that is used for grouping. This ensures that a region is not split “inside” a prefix of a row key.
I.e. rows can be co-located in a region by their prefix.
As an example, if you have row keys delimited with _ , like userid_eventtype_eventid, and use prefix delimiter _, this split policy ensures that all rows starting with the same userid, belongs to the same region.
This policy guarantees that rows with the same prefix stay in the same region. For example, if the rowKey format is userid_eventtype_eventid and the configured delimiter is _ , a split keeps all rows with the same userid in the same region.
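The delimiter rule can be sketched the same way: cut the candidate split point at the first occurrence of the delimiter. This is an illustrative simplification that assumes a single-byte delimiter; DelimitedPrefixSketch and delimitedSplitPoint are hypothetical names, not the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of delimiter-based split point selection; not HBase code.
class DelimitedPrefixSketch {
    // Return the part of the candidate split point before the FIRST delimiter.
    // If the key contains no delimiter, the key is returned unchanged.
    static byte[] delimitedSplitPoint(byte[] splitPoint, byte delimiter) {
        if (splitPoint == null) {
            return null;
        }
        for (int i = 0; i < splitPoint.length; i++) {
            if (splitPoint[i] == delimiter) {
                return Arrays.copyOf(splitPoint, i); // prefix without the delimiter
            }
        }
        return splitPoint;
    }

    public static void main(String[] args) {
        // rowKey in the userid_eventtype_eventid format from the example above.
        byte[] candidate = "user42_click_0007".getBytes(StandardCharsets.UTF_8);
        byte[] boundary = delimitedSplitPoint(candidate, (byte) '_');
        System.out.println(new String(boundary, StandardCharsets.UTF_8));
    }
}
```

Only the first _ matters here, so the boundary is user42 and the eventtype and eventid parts never influence where the region is cut.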