HBase rowkey design essentials (from the official documentation)


Official documentation:

http://hbase.apache.org/book.html#rowkey.design

1. Hotspotting


1. HBase stores rows in lexicographic (byte) order of the rowkey. This sorted layout optimizes scans and lets you store related rows near one another, but a poorly designed rowkey can cause severe hotspotting.
2. Hotspotting occurs when a large volume of traffic is directed at a single node: performance degrades and the affected region can become unavailable.
3. It is therefore important to distribute data evenly across regions.
4. To prevent hotspotting, design the rowkey so that writes are spread across different regions.
Techniques to avoid hotspotting:
(1) Salting
1. Prepend a randomly generated prefix to the rowkey, so that rows sort differently and are stored on different regions.
2. Salting spreads the write load across multiple RegionServers.
Example:
Suppose the following rows are to be written. Without salting, a single region receives all four writes, carrying the full write throughput:
foo0001
foo0002
foo0003
foo0004
With four random salts a, b, c, d, the same writes are spread over four regions, each taking roughly a quarter of the throughput:
a-foo0003
b-foo0001
c-foo0004
d-foo0002
If you then add more rows, each one is randomly assigned one of the four salt values and ends up sorting near one of the existing prefixed rows:
a-foo0003
b-foo0001
c-foo0003
c-foo0004
d-foo0002
3. Drawback: the randomness improves write throughput, but it raises the cost of reads, since a given key could sit under any of the salt prefixes.
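The salting scheme above can be sketched in plain Java with no HBase dependency (the `SaltedKeys` class name, the four salt letters, and the `fooNNNN` keys are illustrative, not part of any HBase API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal salting sketch: prepend a random one-letter salt so that
// consecutive keys sort apart and land on different regions.
public class SaltedKeys {
    static final char[] SALTS = {'a', 'b', 'c', 'd'};
    static final Random RND = new Random();

    static String salt(String rowKey) {
        // Random salt spreads writes across SALTS.length buckets, but a
        // reader must now check all four buckets to find a given key
        // (this is the retrieval cost mentioned above).
        return SALTS[RND.nextInt(SALTS.length)] + "-" + rowKey;
    }

    public static void main(String[] args) {
        List<String> salted = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            salted.add(salt(String.format("foo%04d", i)));
        }
        // e.g. [b-foo0001, d-foo0002, a-foo0003, c-foo0004] -- prefixes vary per run
        System.out.println(salted);
    }
}
```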


(2) Hashing
1. Instead of a random prefix, derive the prefix deterministically from the key itself using a one-way hash. This keeps the load-spreading benefit of salting while lowering the read cost, because a reader can always recompute the prefix for a given key.
2. Example: a rowkey such as foo0003 always hashes to the same prefix, so it always lands in a predictable region and can be retrieved with a single Get rather than a multi-bucket scan.
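A minimal sketch of the hashing approach, assuming an MD5-derived bucket prefix (the `HashedKeys` class, the bucket count, and the modulo scheme are illustrative choices, not an HBase API):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Deterministic hash prefix: unlike a random salt, the prefix is a pure
// function of the key, so a client can recompute it for a point Get.
public class HashedKeys {
    static final int BUCKETS = 4; // illustrative bucket count

    static String hashPrefixed(String rowKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            // First digest byte, masked to 0..255, picks the bucket.
            int bucket = (digest[0] & 0xFF) % BUCKETS;
            return bucket + "-" + rowKey;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    public static void main(String[] args) {
        // The same key always maps to the same prefix, so foo0003 is
        // retrievable without scanning every bucket.
        System.out.println(hashPrefixed("foo0003"));
        System.out.println(hashPrefixed("foo0003"));
    }
}
```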

(3) Reversing the rowkey
This approach is generally not recommended: reversing the key (so its fast-changing part comes first) does spread the load, but it sacrifices the ordering properties of the rowkey.
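A one-line illustration of key reversal (the class and key names are made up for the example):

```java
// Reversing a key moves the fast-changing suffix to the front, which
// spreads writes but destroys meaningful lexicographic ordering.
public class ReversedKeys {
    static String reverse(String rowKey) {
        return new StringBuilder(rowKey).reverse().toString();
    }

    public static void main(String[] args) {
        // Sequential IDs share a long common prefix and pile onto one
        // region; their reversals differ in the very first byte.
        System.out.println(reverse("foo0001")); // 1000oof
        System.out.println(reverse("foo0002")); // 2000oof
    }
}
```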

2. Monotonically Increasing Row Keys / Timeseries Data
The pile-up on a single region caused by monotonically increasing keys can be mitigated by randomizing the input so that records do not arrive in sorted order, but in general it is best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the rowkey.


3. Try to minimize row and column sizes

In HBase the rowkey is stored alongside every cell, so an oversized rowkey is repeated in the StoreFiles once per cell (billions of copies in a large table), inflating storage well beyond the payload data. This is something you want to avoid in practice.

(1) Column Families (keep ColumnFamily names as small as possible, and use no more than three column families)

Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).

(2) Attributes (verbose names are more readable, but shorter attribute names are cheaper for HBase to store)

Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via") to store in HBase.

(3) Rowkey Length (keep rowkeys as short as practical while still supporting the required access patterns, e.g. Get vs. Scan)

Keep them as short as is reasonable such that they can still be useful for required data access (e.g. Get vs. Scan). A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs when designing rowkeys.

(4) Byte Patterns (HBase stores everything as raw bytes, so compact binary representations save considerable space over strings)

A long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes. If you stored this number as a String — presuming a byte per character — you need nearly 3x the bytes.

Not convinced? Below is some sample code that you can run on your own.

// long
long l = 1234567890L;
byte[] lb = Bytes.toBytes(l);
System.out.println("long bytes length: " + lb.length);   // returns 8

String s = String.valueOf(l);
byte[] sb = Bytes.toBytes(s);
System.out.println("long as string length: " + sb.length);   // returns 10

// hash
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] digest = md.digest(Bytes.toBytes(s));
System.out.println("md5 digest bytes length: " + digest.length);   // returns 16

String sDigest = new String(digest);
byte[] sbDigest = Bytes.toBytes(sDigest);
System.out.println("md5 digest as string length: " + sbDigest.length);   // returns 26

Unfortunately, using a binary representation of a type will make your data harder to read outside of your code. For example, this is what you will see in the shell when you increment a value:

hbase(main):001:0> incr 't', 'r', 'f:q', 1
COUNTER VALUE = 1

hbase(main):002:0> get 't', 'r'
COLUMN                                        CELL
 f:q                                          timestamp=1369163040570, value=\x00\x00\x00\x00\x00\x00\x00\x01
1 row(s) in 0.0310 seconds

The shell makes a best effort to print a string, and in this case it decided to just print the hex. The same will happen to your row keys inside the region names. It can be okay if you know what’s being stored, but it might also be unreadable if arbitrary data can be put in the same cells. This is the main trade-off.

4. Reverse Timestamps

Reverse Scan API

HBASE-4811 implements an API to scan a table or a range within a table in reverse, reducing the need to optimize your schema for forward or reverse scanning. This feature is available in HBase 0.98 and later. See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed%28boolean for more information.

A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps as a part of the key can help greatly with a special case of this problem. Also found in the HBase chapter of Tom White’s book Hadoop: The Definitive Guide (O’Reilly), the technique involves appending (Long.MAX_VALUE - timestamp) to the end of any key, e.g. [key][reverse_timestamp].

The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys are in sorted order, this key sorts before any older row-keys for [key] and thus is first.

This technique would be used instead of using Number of Versions where the intent is to hold onto all versions "forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
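The [key][reverse_timestamp] technique can be sketched in plain Java with no HBase dependency (the `versionedKey` helper and the zero-padding scheme are illustrative; HBase itself compares raw bytes, which fixed-width keys emulate here):

```java
// Reverse-timestamp sketch: appending (Long.MAX_VALUE - timestamp) to a
// key makes the newest version sort first in lexicographic order.
public class ReverseTimestampKeys {
    static String versionedKey(String key, long timestampMillis) {
        // Zero-padded to 19 digits (the width of Long.MAX_VALUE) so that
        // string comparison matches numeric comparison.
        return key + String.format("%019d", Long.MAX_VALUE - timestampMillis);
    }

    public static void main(String[] args) {
        String older = versionedKey("user123", 1000L);
        String newer = versionedKey("user123", 2000L);
        // The newer version sorts BEFORE the older one, so a Scan
        // starting at the key prefix returns the latest value first.
        System.out.println(newer.compareTo(older) < 0); // true
    }
}
```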

5. Rowkeys and ColumnFamilies

Rowkeys are scoped to ColumnFamilies. Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.

6. Immutability of Rowkeys

Rowkeys cannot be changed. The only way they can be "changed" in a table is if the row is deleted and then re-inserted. This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you’ve inserted a lot of data).

7. Relationship Between RowKeys and Region Splits

If you pre-split your table, it is critical to understand how your rowkey will be distributed across the region boundaries. As an example of why this is important, consider the example of using displayable hex characters as the lead position of the key (e.g., "0000000000000000" to "ffffffffffffffff"). Running those key ranges through Bytes.split (which is the split strategy used when creating regions in Admin.createTable(byte[] startKey, byte[] endKey, numRegions)) for 10 regions will generate the following splits…

48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48                                // 0
54 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10                 // 6
61 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -68                 // =
68 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -126  // D
75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 72                                // K
82 18 18 18 18 18 18 18 18 18 18 18 18 18 18 14                                // R
88 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -44                 // X
95 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -102                // _
102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102                // f

(note: the lead byte is listed to the right as a comment.) Given that the first split is a '0' and the last split is an 'f', everything is great, right? Not so fast.

The problem is that all the data is going to pile up in the first 2 regions and the last region thus creating a "lumpy" (and possibly "hot") region problem. To understand why, refer to an ASCII Table. '0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will never appear in this keyspace because the only values are [0-9] and [a-f]. Thus, the middle regions will never be used. To make pre-splitting work with this example keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) is required.

Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the regions are accessible in the keyspace. While this example demonstrated the problem with a hex-key keyspace, the same problem can happen with any keyspace. Know your data.

Lesson #2: While generally not advisable, using hex-keys (and more generally, displayable data) can still work with pre-split tables as long as all the created regions are accessible in the keyspace.

To conclude this example, the following is an example of how appropriate splits can be pre-created for hex-keys:

public static boolean createTable(Admin admin, HTableDescriptor table, byte[][] splits)
    throws IOException {
  try {
    admin.createTable(table, splits);
    return true;
  } catch (TableExistsException e) {
    logger.info("table " + table.getNameAsString() + " already exists");
    // the table already exists...
    return false;
  }
}

public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
  byte[][] splits = new byte[numRegions - 1][];
  BigInteger lowestKey = new BigInteger(startKey, 16);
  BigInteger highestKey = new BigInteger(endKey, 16);
  BigInteger range = highestKey.subtract(lowestKey);
  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
  lowestKey = lowestKey.add(regionIncrement);
  for (int i = 0; i < numRegions - 1; i++) {
    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
    byte[] b = String.format("%016x", key).getBytes();
    splits[i] = b;
  }
  return splits;
}
