HBase Metrics


HBase reports its metrics through the Hadoop metrics API. By default, metrics are emitted every 10 seconds. They can be sent to Ganglia, and they can be filtered or extended.

1 Metric Setup

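Since HBase 0.95, HBase ships with a default metrics configuration, or sink. To configure metrics for a region server, edit conf/hadoop-metrics2-hbase.properties, then restart each region server whose configuration changed so that the change takes effect.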

To change the default sampling rate, edit the line beginning with *.period. To filter or extend the metrics framework, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html.
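As a minimal sketch of the edit described above, the following Python snippet rewrites the line beginning with `*.period` in the text of a properties file. The sample file content and the helper name `set_sampling_period` are assumptions made for illustration; compare them with your own conf/hadoop-metrics2-hbase.properties before relying on them.

```python
import re

# Illustrative content only; a real hadoop-metrics2-hbase.properties may differ.
SAMPLE = """\
*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
*.period=10
"""


def set_sampling_period(text: str, seconds: int) -> str:
    """Return `text` with the line beginning with '*.period' set to `seconds`."""
    if seconds <= 0:
        raise ValueError("period must be a positive number of seconds")
    pattern = re.compile(r"^\*\.period\s*=.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(f"*.period={seconds}", text)
    if count == 0:
        # No existing setting was found: append one instead of doing nothing.
        new_text = text.rstrip("\n") + f"\n*.period={seconds}\n"
    return new_text


if __name__ == "__main__":
    print(set_sampling_period(SAMPLE, 30))
```

The helper only transforms text; writing the result back to the file and restarting the affected region servers remain manual steps.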

HBase Metrics and Ganglia

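By default, HBase emits a large number of metrics for each region server, and Ganglia may have difficulty processing all of them. Either increase the capacity of the Ganglia server or reduce the number of metrics that HBase emits; see Metrics Filtering.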

2 Disabling Metrics

To disable metrics for a region server, edit conf/hadoop-metrics2-hbase.properties and comment out the relevant lines, then restart the region server so that the change takes effect.
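Commenting out the relevant lines can be sketched as a small text transformation. The sink instance name `ganglia`, the host `ganglia.example.com`, and the helper `comment_out` are hypothetical and serve only to illustrate the idea; which lines are "relevant" depends on your own configuration.

```python
# Illustrative content only; the sink name and host are made up for this example.
SAMPLE = """\
*.period=10
hbase.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
hbase.sink.ganglia.servers=ganglia.example.com:8649
"""


def comment_out(text: str, prefix: str) -> str:
    """Comment out every active line of `text` that starts with `prefix`."""
    result = []
    for line in text.splitlines():
        if line.startswith(prefix):
            result.append("# " + line)
        else:
            # Lines that are already commented do not start with the prefix,
            # so running the function twice leaves them unchanged.
            result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    print(comment_out(SAMPLE, "hbase.sink.ganglia."))
```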

3 Viewing Available Metrics

  • The Web UI, under Metrics Dump
  • A JMX tool such as jconsole
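If the Metrics Dump is saved as text, it can also be inspected programmatically. The sketch below assumes the dump is a JSON object with a top-level `beans` list, which is the format produced by Hadoop's JMX JSON servlet, and that the region server metrics live in a bean named `Hadoop:service=HBase,name=RegionServer,sub=Server`. The sample values and the helper name `find_bean` are invented for illustration.

```python
import json

# Invented sample in the shape of a JMX JSON dump; the values are not real.
SAMPLE_DUMP = """
{"beans": [
  {"name": "Hadoop:service=HBase,name=RegionServer,sub=Server",
   "regionCount": 42, "storeFileCount": 117},
  {"name": "java.lang:type=Memory", "Verbose": false}
]}
"""

REGION_SERVER_BEAN = "Hadoop:service=HBase,name=RegionServer,sub=Server"


def find_bean(dump_text: str, bean_name: str) -> dict:
    """Return the bean whose "name" equals `bean_name`, or raise KeyError."""
    beans = json.loads(dump_text).get("beans", [])
    for bean in beans:
        if bean.get("name") == bean_name:
            return bean
    raise KeyError(bean_name)


if __name__ == "__main__":
    server = find_bean(SAMPLE_DUMP, REGION_SERVER_BEAN)
    for key in sorted(server):
        if key != "name":
            print(key, "=", server[key])
```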

4 Metric Units

Different metrics use different units of measurement. Common examples:

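  • Points in time are expressed as timestamps
  • Ages (such as ageOfLastShippedOp) are expressed in milliseconds
  • Memory sizes are expressed in bytes
  • Queue sizes (such as sizeOfLogQueue) are expressed as a number of items
  • Counts of an operation (such as logEditsRead) are expressed as integers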
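When displaying these values, it can help to convert the raw units into something easier to read. The helpers below are a hypothetical sketch. In particular, treating timestamps as milliseconds since the Unix epoch is an assumption that the text above does not state.

```python
from datetime import datetime, timezone


def format_timestamp_ms(ts_ms: int) -> str:
    """Format a timestamp, ASSUMED to be milliseconds since the epoch, as UTC."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).isoformat()


def format_age_ms(age_ms: int) -> str:
    """Format an age given in milliseconds as seconds."""
    return f"{age_ms / 1000:.1f} s"


def format_bytes(size_bytes: int) -> str:
    """Format a size given in bytes as mebibytes."""
    return f"{size_bytes / (1024 * 1024):.1f} MiB"


if __name__ == "__main__":
    print(format_age_ms(1500))
    print(format_bytes(64 * 1024 * 1024))
```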

5 Important Master Metrics

hbase.master.numRegionServers
Number of live regionservers

hbase.master.numDeadRegionServers
Number of dead regionservers

hbase.master.ritCount
The number of regions in transition

hbase.master.ritCountOverThreshold
The number of regions that have been in transition longer than a threshold time (default: 60 seconds)

hbase.master.ritOldestAge
The age of the longest region in transition, in milliseconds
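The Master metrics above lend themselves to simple health checks. The following sketch takes a dictionary of metric values keyed by the short metric name and returns human-readable warnings. The function name `master_warnings`, the choice of which conditions to flag, and the default age limit are assumptions for illustration, not recommendations from HBase.

```python
def master_warnings(metrics: dict, max_rit_age_ms: int = 60_000) -> list:
    """Return warnings for a dict of Master metrics; missing keys count as 0."""
    warnings = []
    dead = metrics.get("numDeadRegionServers", 0)
    if dead > 0:
        warnings.append(f"{dead} dead regionserver(s)")
    stuck = metrics.get("ritCountOverThreshold", 0)
    if stuck > 0:
        warnings.append(f"{stuck} region(s) in transition over the threshold")
    # ritOldestAge is reported in milliseconds.
    oldest = metrics.get("ritOldestAge", 0)
    if oldest > max_rit_age_ms:
        warnings.append(f"oldest region in transition is {oldest} ms old")
    return warnings


if __name__ == "__main__":
    sample = {"numRegionServers": 5, "numDeadRegionServers": 1,
              "ritCount": 2, "ritCountOverThreshold": 0, "ritOldestAge": 1200}
    for message in master_warnings(sample):
        print(message)
```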

6 Important RegionServer Metrics

hbase.regionserver.regionCount
The number of regions hosted by the regionserver

hbase.regionserver.storeFileCount
The number of store files on disk currently managed by the regionserver

hbase.regionserver.storeFileSize
Aggregate size of the store files on disk

hbase.regionserver.hlogFileCount
The number of write ahead logs not yet archived

hbase.regionserver.totalRequestCount
The total number of requests received

hbase.regionserver.readRequestCount
The number of read requests received

hbase.regionserver.writeRequestCount
The number of write requests received

hbase.regionserver.numOpenConnections
The number of open connections at the RPC layer

hbase.regionserver.numActiveHandler
The number of RPC handlers actively servicing requests

hbase.regionserver.numCallsInGeneralQueue
The number of currently enqueued user requests

hbase.regionserver.numCallsInReplicationQueue
The number of currently enqueued operations received from replication

hbase.regionserver.numCallsInPriorityQueue
The number of currently enqueued priority (internal housekeeping) requests

hbase.regionserver.flushQueueLength
Current depth of the memstore flush queue. If increasing, we are falling behind with clearing memstores out to HDFS.

hbase.regionserver.updatesBlockedTime
Number of milliseconds updates have been blocked so the memstore can be flushed

hbase.regionserver.compactionQueueLength
Current depth of the compaction request queue. If increasing, we are falling behind with storefile compaction.

hbase.regionserver.blockCacheHitCount
The number of block cache hits

hbase.regionserver.blockCacheMissCount
The number of block cache misses

hbase.regionserver.blockCacheExpressHitPercent
The percent of the time that requests with the cache turned on hit the cache

hbase.regionserver.percentFilesLocal
Percent of store file data that can be read from the local DataNode, 0-100

hbase.regionserver.<op>_<measure>
Operation latencies, where <op> is one of Append, Delete, Mutate, Get, Replay, Increment; and where <measure> is one of min, max, mean, median, 75th_percentile, 95th_percentile, 99th_percentile

hbase.regionserver.slow<op>Count
The number of operations we thought were slow, where <op> is one of the list above

hbase.regionserver.GcTimeMillis
Time spent in garbage collection, in milliseconds

hbase.regionserver.GcTimeMillisParNew
Time spent in garbage collection of the young generation, in milliseconds

hbase.regionserver.GcTimeMillisConcurrentMarkSweep
Time spent in garbage collection of the old generation, in milliseconds

hbase.regionserver.authenticationSuccesses
Number of client connections where authentication succeeded

hbase.regionserver.authenticationFailures
Number of client connection authentication failures

hbase.regionserver.mutationsWithoutWALCount
Count of writes submitted with a flag indicating they should bypass the write ahead log
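Two of the metrics above are easiest to interpret across samples: the block cache counters, and the queue lengths that signal trouble "if increasing". The sketch below computes a hit percentage between two samples and checks whether a series of queue lengths is strictly increasing. It assumes that blockCacheHitCount and blockCacheMissCount are cumulative counters; the function names and sample values are hypothetical.

```python
from typing import Optional, Sequence


def interval_hit_percent(prev: dict, curr: dict) -> Optional[float]:
    """Block cache hit percentage between two samples, or None if undefined."""
    hits = curr["blockCacheHitCount"] - prev["blockCacheHitCount"]
    misses = curr["blockCacheMissCount"] - prev["blockCacheMissCount"]
    if hits < 0 or misses < 0:
        # A counter went backwards, for example after a restart.
        return None
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


def is_growing(samples: Sequence[int]) -> bool:
    """True if there are at least two samples and each exceeds the previous."""
    if len(samples) < 2:
        return False
    return all(b > a for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    before = {"blockCacheHitCount": 100, "blockCacheMissCount": 50}
    after = {"blockCacheHitCount": 190, "blockCacheMissCount": 60}
    print(interval_hit_percent(before, after))
    print(is_growing([1, 3, 7]))
```

Computing the percentage over an interval, rather than from the raw totals, reflects recent behavior instead of the average over the whole lifetime of the counters.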
