Hadoop HBase-0.20.0 Performance Evaluation


 

HBase-0.20.0 Performance Evaluation

By Cloudeep:  Anty and Schubert, August 18, 2009

 

We have been using HBase in our development and projects for around a year, from 0.17.x to 0.19.x. We, like the rest of the community, know the serious performance/throughput issues of these releases.

 

Now, the great news is that hbase-0.20.0 will be released soon. Jonathan Gray from Streamy, Ryan Rawson from StumbleUpon, and Jean-Daniel Cryans have done a great job rewriting much of the code to enhance performance. The two presentations [1][2] provide more details on this release.

 

The following items are the most important to us:
- Insert performance: our data is generated quickly.
- Scan performance: for data analysis with MapReduce.
- Random-access performance.
- The new HFile format (similar to Bigtable's SSTable).
- Lower memory and I/O overheads.

Below are our evaluations of hbase-0.20.0 RC1:

Cluster:
- 5 slaves + 1 master
- Slaves (1-4): 4 CPU cores (2.0 GHz), 800 GB SATA disk, 8 GB RAM. Slave (5): 8 CPU cores (2.0 GHz), 6 disks in RAID1, 4 GB RAM.
- 1Gbps network, all nodes under the same switch.
- Hadoop-0.20.0 (1GB heap), HBase-0.20.0 (2GB heap), Zookeeper-3.2.0

We modified org.apache.hadoop.hbase.PerformanceEvaluation because the original code has the following problems:
- It does not match the hadoop-0.20.0 API.
- The map-split approach is not rigorous; proper InputSplit and InputFormat classes are needed.

The evaluation programs use MapReduce to run parallel operations against an HBase table.
- Total rows: 5,242,850.
- Row size: 1,000 bytes for the value, plus 10 bytes for the rowkey.
- Sequential ranges: 50 (also the total number of MapTasks in each evaluation).
- Rows per sequential range: 104,857.
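As a sanity check, the range parameters above are consistent with each other. Below is a minimal sketch (our own illustration, not the modified PerformanceEvaluation code) of how the row space divides into sequential ranges, with rowkeys zero-padded to 10 bytes:

```java
public class RangeSplit {
    public static void main(String[] args) {
        long totalRows = 5242850L;
        int ranges = 50;                      // one MapTask per range
        long rowsPerRange = totalRows / ranges;
        System.out.println(rowsPerRange);     // 104857, matching the figure above

        // Start key of each sequential range; rowkeys are zero-padded to 10 bytes.
        for (int i = 0; i < 3; i++) {
            String startKey = String.format("%010d", i * rowsPerRange);
            System.out.println(startKey);
        }
    }
}
```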

The principle is the same as the evaluation programs described in Section 7, Performance Evaluation, of the Google Bigtable paper [3], pages 8-10. Since we have only 5 nodes to run clients, we set mapred.tasktracker.map.tasks.maximum=3 to avoid a client-side bottleneck.

 

| Experiment | Elapsed Time (s) | rows/s | rows/s/node | ms/row | Elapsed Thread Time (ms) | Google (per node in 50-node cluster) |
|---|---|---|---|---|---|---|
| randomRead | 2079 | 2522 | 504 | 0.396 | 25,990,981 | 593 |
| randomWrite (init) | 586 | 8947 | 1789 | 0.112 | 7,422,606 | 3745 |
| randomWrite | 569 | 9214 | 1843 | 0.109 | 6,920,756 | 3745 |
| sequentialRead | 187 | 28037 | 5607 | 0.036 | 2,216,579 | 2463 |
| sequentialWrite (init) | 490 | 10700 | 2140 | 0.093 | 5,762,259 | 3623 |
| sequentialWrite | 296 | 17712 | 3542 | 0.056 | 3,266,534 | 3623 |
| scan | 61 | 85948 | 17190 | 0.011 | 500,896 | 10526 |
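The derived columns follow directly from the elapsed time, the total row count, and the 5 slave nodes; for example, recomputing the randomRead row:

```java
public class DerivedMetrics {
    public static void main(String[] args) {
        long totalRows = 5242850L;   // every evaluation touches all rows
        int nodes = 5;               // slave nodes
        double elapsedSec = 2079;    // randomRead elapsed time

        double rowsPerSec = totalRows / elapsedSec;          // ~2522
        double rowsPerSecPerNode = rowsPerSec / nodes;       // ~504
        double msPerRow = elapsedSec * 1000.0 / totalRows;   // ~0.4 ms per row
        System.out.printf("%.0f %.0f %.3f%n", rowsPerSec, rowsPerSecPerNode, msPerRow);
    }
}
```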

 

(1) randomWrite (init) and sequentialWrite (init) are evaluations against a new table. Since only one RegionServer is accessed at the beginning, the performance is not as good. randomWrite and sequentialWrite are evaluations against an existing table that is already distributed across all 5 nodes.

(2) sequentialWrite and randomWrite: the client-side write buffer accelerates sequentialWrite, but not dramatically, since each write operation still writes to the commit log and the memstore. According to HBASE-1771, sequentialWrite and randomWrite should speed up by a factor of 2-4. We need to retest after HBASE-1771 lands.
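Why the write buffer helps less than one might hope: it only batches client RPC round trips, while every put still goes through the commit log and memstore on the server. A back-of-the-envelope sketch (our own arithmetic; the 2 MB buffer size is an assumed client setting, not a measured one):

```java
public class WriteBufferMath {
    public static void main(String[] args) {
        long totalRows = 5242850L;
        long rowBytes = 1000 + 10;            // value + rowkey, ignoring per-cell overhead
        long bufferBytes = 2L * 1024 * 1024;  // assumed client-side write-buffer size

        long putsPerFlush = bufferBytes / rowBytes;
        long rpcRoundTrips = (totalRows + putsPerFlush - 1) / putsPerFlush;
        // Buffering turns millions of per-row RPCs into a few thousand batched ones,
        // but the server-side commit-log and memstore work per row is unchanged.
        System.out.println(putsPerFlush + " puts/flush, " + rpcRoundTrips + " RPCs");
    }
}
```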

(3) randomRead performance is not good enough, but that's about as good as we can get on our hardware. Bloom filters will only help in the case of a miss, not a hit. Besides that, we're already showing 2-4X better performance than a disk seek (10 ms). Any further improvement will have to come from HDFS optimizations and RPC optimizations, and of course we can always get better performance by loading up with more RAM for the filesystem cache. With 8 GB or 16 GB we might get sub-ms random reads on average per node, but remember, we would then be serving out of memory, not seeking. Adding more memory (and RegionServer heap) should help the numbers across the board. The Bigtable paper shows 1212 random reads per second on a single node; that's sub-ms per random access, so it is clearly not doing disk seeks for most gets. RAID0 across multiple disks might also help.
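The "2-4X better than a disk seek" claim can be checked from the table: dividing the randomRead elapsed thread time by the row count gives the effective per-thread latency, about half of a 10 ms seek:

```java
public class SeekComparison {
    public static void main(String[] args) {
        long totalRows = 5242850L;
        long threadTimeMs = 25990981L;  // randomRead elapsed thread time from the table

        double msPerRowPerThread = (double) threadTimeMs / totalRows; // ~5 ms per random read
        double speedupVsSeek = 10.0 / msPerRowPerThread;              // ~2x a 10 ms disk seek
        System.out.printf("%.2f ms/row, %.1fx vs. seek%n", msPerRowPerThread, speedupVsSeek);
    }
}
```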

(Thanks for good comments from Jonathan Gray and stack.)


Compared to the metrics in the Google paper (Figure 6): our write and randomRead performance is still not great, but this result is much better than any previous HBase release, especially for randomRead. We even beat the paper on the sequentialRead and scan evaluations (though we should bear in mind that the paper was published in 2006). This result gives us confidence.
- The new HFile format is probably the major contributor.
- The BlockCache gives sequentialRead and scan a further boost.
- scan is fast enough that MapReduce analysis over HBase tables will be efficient.

We are looking forward to, and researching, the following features:
- Bloom filters, to accelerate randomRead.
- Bulk load.

We need to do more analysis of this evaluation and read the code in more detail. Here is our PerformanceEvaluation code:
http://dl.getdropbox.com/u/24074/code/PerformanceEvaluation.java

 

References:

[1] Ryan Rawson’s Presentation on NOSQL. http://blog.oskarsson.nu/2009/06/nosql-debrief.html

[2] HBase goes Realtime, http://wiki.apache.org/hadoop-data/attachments/HBase(2f)HBasePresentations/attachments/HBase_Goes_Realtime.pdf

[3] Google, Bigtable: A Distributed Storage System for Structured Data http://labs.google.com/papers/bigtable.html

 
