HBase Merging Regions

Source: Internet. Published by: 返利网和淘宝客. Editor: 程序博客网. Time: 2024/05/20 03:06

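I admit I did not previously know that HBase could merge regions, nor in which situations the operation is appropriate. The article below draws some conclusions: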

Sometimes having too many regions is not a good thing, so merging regions is a natural step to take.

While it is much more common for regions to split automatically over time as you are adding data to the corresponding table, there might be situations where you need to merge regions, for example, after you have removed a large amount of data and you want to reduce the number of regions hosted by each server.

HBase ships with a tool that allows you to merge two adjacent regions as long as the cluster is not online. You can use the command line tool to get the usage details:

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>

Here is an example of a table that has more than one region, which are then subsequently merged:

$ ./bin/hbase shell
hbase(main):001:0> create 'testtable', 'colfam1', \
  {SPLITS => ['row-10','row-20','row-30','row-40','row-50']}
0 row(s) in 0.2640 seconds
hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do \
  put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
0 row(s) in 1.0450 seconds
hbase(main):003:0> flush 'testtable'
0 row(s) in 0.2000 seconds
hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
ROW                                  COLUMN+CELL
 testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY => 'row-10'
 testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
 fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY => 'row-20'
 testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
 testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
 e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
 testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY => 'row-50'
 testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY => ''
6 row(s) in 0.0440 seconds
hbase(main):005:0> exit
$ ./bin/stop-hbase.sh
$ ./bin/hbase org.apache.hadoop.hbase.util.Merge testtable \
  testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
  testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.

The example creates a table with five split points, resulting in six regions. It then inserts some rows and flushes the data to ensure that there are store files for the subsequent merge. The scan is used to get the names of the regions, but you can also use the web UI of the master: click on the table name in the User Tables section to get the same list of regions.
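To make the relationship between split points and regions concrete, here is a minimal Python sketch of how five split keys yield six key ranges. It is only an illustration and does not talk to HBase. The function names `regions_from_splits` and `region_for_row` are hypothetical, and the sketch assumes plain ASCII row keys, so that Python string ordering matches the byte-wise ordering HBase uses.

```python
from bisect import bisect_right


def regions_from_splits(split_keys):
    """Return (start_key, end_key) pairs for a table pre-split at split_keys.

    An empty string stands for the open start of the first region and the
    open end of the last one, as in the STARTKEY/ENDKEY values of the scan.
    """
    keys = sorted(split_keys)
    bounds = [""] + keys + [""]
    return list(zip(bounds[:-1], bounds[1:]))


def region_for_row(split_keys, row_key):
    """Return the (start_key, end_key) range that would hold row_key.

    Start keys are inclusive and end keys are exclusive.
    """
    keys = sorted(split_keys)
    return regions_from_splits(keys)[bisect_right(keys, row_key)]


splits = ["row-10", "row-20", "row-30", "row-40", "row-50"]
for start, end in regions_from_splits(splits):
    print(f"STARTKEY => {start!r}, ENDKEY => {end!r}")
```

With the split keys from the example, `region_for_row(splits, "row-25")` falls into the range starting at `row-20`, which is one of the two regions merged later.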

Note

Note how the shell wraps the values in each column. Each region name is split over two lines, so you need to copy and paste the two parts separately and join them. The web UI is easier to use in that respect, as it shows the names in one column and on a single line.
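If you capture the scan output as text, the two halves of each wrapped region name can be rejoined programmatically instead of by hand. The following Python sketch is one possible way to do that. The helper name `join_region_names` is hypothetical, and the code assumes the layout shown in the example: each name wraps onto exactly two lines, the first of which carries the `column=info:regioninfo` text, and the name contains no whitespace.

```python
def join_region_names(scan_output):
    """Rebuild full region names from wrapped `scan '.META.'` shell output.

    Assumes every region name spans exactly two consecutive lines and that
    the first of the two contains ' column=info:regioninfo'.
    """
    lines = [ln.strip() for ln in scan_output.splitlines() if ln.strip()]
    names = []
    for i, line in enumerate(lines):
        if " column=info:regioninfo" in line and i + 1 < len(lines):
            head = line.split()[0]          # first part of the region name
            tail = lines[i + 1].split()[0]  # remainder on the wrapped line
            names.append(head + tail)
    return names


sample = """\
ROW                                  COLUMN+CELL
 testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
 testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
 e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
"""

for name in join_region_names(sample):
    print(name)
```

The joined names include the trailing dot, which is part of the region name and must be passed to the merge tool as well.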

The content of the column values is abbreviated to the start and end keys. You can see how the create command using the split keys has created the regions. The example goes on to exit the shell, and stop the HBase cluster. Note that HDFS still needs to run for the merge to work as it needs to read the store files of each region and merge them into a new combined one.
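Since the tool is described as merging two adjacent regions, it can be useful to check adjacency before stopping the cluster. The Python sketch below treats two regions as adjacent when the end key of the first equals the start key of the second, and then assembles the same command line used in the example. The names `are_adjacent` and `build_merge_command` and the dictionary layout are hypothetical. The check is a client-side precaution under that assumption, not a description of what the Merge tool itself verifies.

```python
MERGE_CLASS = "org.apache.hadoop.hbase.util.Merge"  # class name from the example


def are_adjacent(first, second):
    """True if `second` starts exactly where `first` ends.

    Each region is a dict with 'name', 'start' and 'end' entries. An empty
    end key means the region is the last one and has no right neighbour.
    """
    return first["end"] != "" and first["end"] == second["start"]


def build_merge_command(table, first, second):
    """Return the argument list for the offline merge, or raise ValueError."""
    if not are_adjacent(first, second):
        raise ValueError("regions are not adjacent: "
                         f"{first['name']} / {second['name']}")
    return ["./bin/hbase", MERGE_CLASS, table, first["name"], second["name"]]


region_20 = {
    "name": "testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982.",
    "start": "row-20",
    "end": "row-30",
}
region_30 = {
    "name": "testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.",
    "start": "row-30",
    "end": "row-40",
}

print(" ".join(build_merge_command("testtable", region_20, region_30)))
```

The command is returned as a list so that it could be handed to something like `subprocess.run` without shell quoting concerns. Actually running it still requires that HBase is stopped and HDFS is up, as described above.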