HBase Configuration Optimization


HBase configuration changes:

(Splits here happen because there are too many HFiles: the region is split, and each half is then compacted.

Some readers may object that too many HFiles should trigger a compaction, not a split. The 0.98.1 code below shows the actual logic: if the region has no blocked compaction (priority >= 1), a split is requested; otherwise a compaction is.)

private boolean flushRegion(final FlushRegionEntry fqe) {
  HRegion region = fqe.region;
  if (!region.getRegionInfo().isMetaRegion() &&
      isTooManyStoreFiles(region)) { // this check is driven by hbase.hstore.blockingStoreFiles
    if (fqe.isMaximumWait(this.blockingWaitTime)) {
      LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime) +
        "ms on a compaction to clean up 'too many store files'; waited " +
        "long enough... proceeding with flush of " +
        region.getRegionNameAsString());
    } else {
      // If this is first time we've been put off, then emit a log message.
      if (fqe.getRequeueCount() <= 0) {
        // Note: We don't impose blockingStoreFiles constraint on meta regions
        LOG.warn("Region " + region.getRegionNameAsString() + " has too many " +
          "store files; delaying flush up to " + this.blockingWaitTime + "ms");
        if (!this.server.compactSplitThread.requestSplit(region)) {
          // Key logic: if the region has no blocked compaction (priority >= 1),
          // a split is requested; otherwise fall through to a compaction.
          try {
            this.server.compactSplitThread.requestSystemCompaction(
                region, Thread.currentThread().getName());
          } catch (IOException e) {
            LOG.error(
              "Cache flush failed for region " + Bytes.toStringBinary(region.getRegionName()),
              RemoteExceptionHandler.checkIOException(e));
          }
        }
      }
      // Put back on the queue.  Have it come back out of the queue
      // after a delay of this.blockingWaitTime / 100 ms.
      this.flushQueue.add(fqe.requeue(this.blockingWaitTime / 100));
      // Tell a lie, it's not flushed but it's ok
      return true;
    }
  }
  return flushRegion(region, false);
}


hbase.hstore.blockingStoreFiles — upper limit on the number of HFiles per store; when exceeded, writes are blocked and a split or compaction is requested.

hbase.hstore.blockingWaitTime — upper limit on how long writes stay blocked; if no split or compaction has started by then (the lock was never taken), the flush proceeds anyway.
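The branch structure of flushRegion above can be modeled in a few lines (a simplified Python sketch, not the HBase API; `on_flush_request` and its arguments are illustrative):

```python
BLOCKING_STORE_FILES = 30   # hbase.hstore.blockingStoreFiles
BLOCKING_WAIT_MS = 90_000   # hbase.hstore.blockingWaitTime

def on_flush_request(num_store_files, waited_ms, can_split):
    """Action taken for a flush request on a non-meta region (simplified)."""
    if num_store_files <= BLOCKING_STORE_FILES:
        return "flush"              # no HFile backlog: flush immediately
    if waited_ms >= BLOCKING_WAIT_MS:
        return "flush"              # waited long enough: flush anyway
    # Too many HFiles: try a split first, otherwise a system compaction;
    # either way the flush is requeued with a short delay.
    return "split+requeue" if can_split else "compact+requeue"
```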


Cap region size at 500 GB, effectively disabling routine splits:

 <property>

 <name>hbase.hregion.max.filesize</name>

 <value>536870912000</value>

 </property>


 Cap of 30 HFiles per store:

 <property>

<name>hbase.hstore.blockingStoreFiles</name>

<value>30</value>

 </property>


Block writes for at most 90 seconds (1.5 minutes):

<property>

<name>hbase.hstore.blockingWaitTime</name>

<value>90000</value>

</property>


hbase.regionserver.regionSplitLimit — the maximum number of regions a regionserver may hold; a split first checks that the current region count does not exceed this limit.


Compaction priority logic: priority is initialized to Integer.MIN_VALUE; user-requested compactions use priority 1 (Store.PRIORITY_USER); a store whose priority falls below 1 counts as blocked.
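In 0.98, a store's compact priority is essentially hbase.hstore.blockingStoreFiles minus its current HFile count (lower is more urgent), and requestSplit only proceeds when no store in the region is blocked. A rough Python sketch (illustrative, not the HBase API):

```python
PRIORITY_USER = 1           # Store.PRIORITY_USER
BLOCKING_STORE_FILES = 30   # hbase.hstore.blockingStoreFiles

def compact_priority(num_store_files):
    # Lower value = more urgent; <= 0 means the store's compaction is blocked.
    return BLOCKING_STORE_FILES - num_store_files

def should_split(store_file_counts):
    # Split only if every store still has priority >= PRIORITY_USER,
    # i.e. no store has a blocked compaction pending.
    return min(compact_priority(n) for n in store_file_counts) >= PRIORITY_USER
```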


 -----------------------------------------------


  DEBUG [LruStats #0] hfile.LruBlockCache: Total=11.78 GB, free=1.01 GB, max=12.79 GB,

 Set the memstore flush size to 256 MB, and use MSLAB for the memstore.


Enable MSLAB:


<property>

<name>hbase.hregion.memstore.mslab.enabled</name>

<value>true</value>

</property>


Memstore flush threshold: 256 MB

<property>

<name>hbase.hregion.memstore.flush.size</name>

<value>268435456</value>

</property>

 

Safety check on the fraction of the regionserver heap used by all memstores; crossing it forces flushes (hbase.regionserver.global.memstore.lowerLimit):

<property> 

    <name>hbase.regionserver.global.memstore.lowerLimit</name> 

    <value>0.36</value> 

    <description> 

        When the total size of all memstores in a regionserver exceeds this 

        fraction of the heap, flushes are forced. Defaults to 35% of heap. 

        Setting this value equal to hbase.regionserver.global.memstore.upperLimit 

        minimizes the chance of reaching the upper limit, whose forced flush 

        blocks write requests. 

    </description> 

</property> 


Safety check on the fraction of the regionserver heap used by all memstores; crossing it forces flushes and blocks write requests:

<property> 

    <name>hbase.regionserver.global.memstore.upperLimit</name> 

    <value>0.4</value> 

    <description> 

        When the total size of all memstores in a regionserver exceeds this 

        fraction of the heap, updates are blocked and flushes are forced. 

        Defaults to 40% of heap. Updates stay blocked and flushing continues 

        until the total memstore size drops below 

        hbase.regionserver.global.memstore.lowerLimit. 

    </description> 

</property> 


When a memstore reaches the specified multiple of the flush size, writes to the region are blocked:

<property> 

    <name>hbase.hregion.memstore.block.multiplier</name> 

    <value>2</value> 

    <description> 

        Block writes once a region's memstore reaches 

        hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size. 

        This is an effective safeguard against memstore runaway during write 

        spikes: without this bound, a memstore that fills up can spend a long 

        time in the resulting flush/compaction/split, or even cause an OOM. 

    </description> 

</property> 
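With the values above, the hard per-region block threshold works out to twice the flush size:

```python
flush_size = 268_435_456   # hbase.hregion.memstore.flush.size (256 MB)
multiplier = 2             # hbase.hregion.memstore.block.multiplier

block_threshold = multiplier * flush_size
print(block_threshold // (1024 * 1024), "MB")   # writes to the region block at 512 MB
```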


----------------------------------------------------


 GC problems

  One regionserver died after a 3.5-minute GC pause; another pause lasted 11 minutes.

 Trigger CMS earlier (at 70% occupancy) to reduce the work, and thus the pause time, of each collection. In hbase-env.sh:

 export HBASE_REGIONSERVER_OPTS="-Xmx16g -Xms16g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"


 ----------------------------------------------------


 Is compaction taking too long? GC pauses during compaction were excessive:

2015-02-25 11:54:50,670 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 565427ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2015-02-25 11:54:50,670 WARN  [DataStreamer for file /hbase/WALs/host64,60020,1422073827259/host64%2C60020%2C1422073827259.1424835178059 block BP-1540478979-192.168.5.117-1409220943611:blk_1097821214_24084365] hdfs.DFSClient: Error Recovery for block BP-1540478979-192.168.5.117-1409220943611:blk_1097821214_24084365 in pipeline 192.168.5.64:50010, 192.168.5.95:50010: bad datanode 192.168.5.64:50010

2015-02-25 11:54:50,670 WARN  [regionserver60020.compactionChecker] util.Sleeper: We slept 565427ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

2015-02-25 11:54:50,670 INFO  [regionserver60020-SendThread(host141:42181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 577669ms for sessionid 0x44add78c8664fdb, closing socket connection and attempting reconnect


 Switch to manual major compaction (disable periodic major compaction, then major-compact tables one at a time, e.g. from the hbase shell):

 <property>

<name>hbase.hregion.majorcompaction</name>

<value>0</value>

</property> 


------------------------------------------------------


Number of RPC handler threads per regionserver: 50

<property>

<name>hbase.regionserver.handler.count</name>

<value>50</value>

<source>hbase-site.xml</source>

</property>


--------------------------------------------------


WAL size, which affects memstore flushing.

Currently hbase.regionserver.hlog.blocksize * hbase.regionserver.maxlogs = 128 MB * 32 = 4 GB,

but hbase.regionserver.global.memstore.lowerLimit * HBASE_HEAPSIZE = 0.38 * 32 = 12.16 GB

and hbase.regionserver.global.memstore.upperLimit * HBASE_HEAPSIZE = 0.4 * 32 = 12.8 GB.

    Note: make sure hbase.regionserver.hlog.blocksize * hbase.regionserver.maxlogs is only slightly higher than hbase.regionserver.global.memstore.lowerLimit * HBASE_HEAPSIZE.

Change to:

<property>

<name>hbase.regionserver.maxlogs</name>

<value>104</value>

</property>


<property>

<name>hbase.regionserver.global.memstore.lowerLimit</name>

<value>0.36</value>

</property>


128 MB * 104 = 13 GB

0.36 * 32 = 11.52 GB

0.4 * 32 = 12.8 GB

The principle is that memstore flushes must never block; in particular, avoid WAL-triggered flushes: when the WAL count limit is hit, several regions are flushed at once and writes are blocked, which is the worst case.
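The before/after arithmetic can be checked directly (assuming hbase.regionserver.hlog.blocksize = 128 MB and a 32 GB regionserver heap, as above):

```python
MB, GB = 1 << 20, 1 << 30
hlog_blocksize = 128 * MB
heap = 32 * GB

wal_cap_before = 32 * hlog_blocksize    # old maxlogs: 4 GB WAL cap
wal_cap_after = 104 * hlog_blocksize    # new maxlogs: 13 GB WAL cap
lower_limit = 0.36 * heap               # 11.52 GB memstore lower limit

# The WAL cap should sit just above the memstore lower limit, so flushes
# are driven by lowerLimit rather than by WAL rolling (which flushes
# several regions at once and blocks writes).
print(wal_cap_before / GB, wal_cap_after / GB, lower_limit / GB)
```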


---------------------------------------------------


The block cache is the memory used on the read path:

<property>

<name>hfile.block.cache.size</name>

<value>0.4</value>

</property>
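Note that the block cache and the memstore upper limit compete for the same heap; HBase sanity-checks their combined fraction at startup (it must not exceed 0.8 of the heap), and 0.4 + 0.4 sits exactly at that bound:

```python
block_cache = 0.4      # hfile.block.cache.size
memstore_upper = 0.4   # hbase.regionserver.global.memstore.upperLimit

total = block_cache + memstore_upper
print(total)   # 0.8 -- at the limit, leaving 20% of heap for everything else
```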


----------------------------------------------------


Timeout settings, still to be validated; some may be set too long:

<property>

       <name>hbase.rowlock.wait.duration</name>

       <value>90000</value>

       <description> 

        Timeout for each row-lock acquisition; default is 30 s.

       </description> 

</property>

<property>

<name>hbase.regionserver.lease.period</name>

<value>180000</value>

<description> 

The lease period a client holds on the regionserver per call (e.g. an open scanner).

</description> 

</property>


<property>

       <name>hbase.rpc.timeout</name>

       <value>180000</value>

<description> 

RPC timeout.

</description> 

</property>


<property>

       <name>hbase.client.scanner.timeout.period</name>

       <value>180000</value>

<description> 

Client-side timeout for each scan or get call.

</description> 

</property>


<property>

        <name>hbase.client.scanner.caching</name>

        <value>100</value>

<description> 

Number of rows returned by each scanner next() call on the client; default is 1.

</description> 

</property>
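Raising scanner caching mainly reduces client-to-regionserver round trips; a quick illustration (the row counts are hypothetical):

```python
import math

def scan_rpcs(total_rows, caching):
    # Each scanner next() RPC returns up to `caching` rows.
    return math.ceil(total_rows / caching)

print(scan_rpcs(100_000, 1))    # 100000 RPCs with caching = 1
print(scan_rpcs(100_000, 100))  # 1000 RPCs with caching = 100
```

The trade-off is memory: each next() call buffers `caching` rows on both the regionserver and the client, so very large values can cause timeouts or memory pressure with wide rows.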

