What is Hadoop Metrics2?
Source: http://blog.cloudera.com/blog/2012/10/what-is-hadoop-metrics2/
Metrics are collections of information about Hadoop daemons, events and measurements; for example, data nodes collect metrics such as the number of blocks replicated, number of read requests from clients, and so on. For that reason, metrics are an invaluable resource for monitoring Apache Hadoop services and an indispensable tool for debugging system problems.
This blog post focuses on the features and use of the Metrics2 system for Hadoop, which allows multiple metrics output plugins to be used in parallel, supports dynamic reconfiguration of metrics plugins, provides metrics filtering, and allows all metrics to be exported via JMX.
Metrics vs. MapReduce Counters
When speaking about metrics, a question about their relationship to MapReduce counters usually arises. The differences can be described in two ways. First, the scope of metrics is generally Hadoop daemons and services, whereas the scope of MapReduce counters is MapReduce applications (counters are collected for MapReduce tasks and aggregated for the whole job). Second, the main audience for metrics is Hadoop administrators, whereas the audience for MapReduce counters is MapReduce users.
Contexts and Prefixes
For organizational purposes, metrics are grouped into named contexts, e.g., jvm for Java virtual machine metrics or dfs for distributed file system metrics. Hadoop-1 and Hadoop-2 support different sets of contexts; the table below lists the contexts supported by each.
Branch-1: jvm, rpc, rpcdetailed, metricssystem, mapred, dfs, ugi
Branch-2: yarn, jvm, rpc, rpcdetailed, metricssystem, mapred, dfs, ugi
A Hadoop daemon collects metrics in several contexts. For example, data nodes collect metrics for the “dfs”, “rpc” and “jvm” contexts. The daemons that collect different metrics in Hadoop (for Hadoop-1 and Hadoop-2) are listed below:
Hadoop-1: namenode, datanode, jobtracker, tasktracker, maptask, reducetask
Hadoop-2: namenode, secondarynamenode, datanode, resourcemanager, nodemanager, mrappmaster, maptask, reducetask
System Design
The Metrics2 framework is designed to collect and dispatch per-process metrics to monitor the overall status of the Hadoop system. Producers register the metrics sources with the metrics system, while consumers register the sinks. The framework marshals metrics from sources to sinks based on (per source/sink) configuration options. This design is depicted below.
Here is an example class implementing the MetricsSource:
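A rough, self-contained sketch of such a class follows. So that it compiles without Hadoop on the class path, it uses simplified stand-in interfaces instead of Hadoop's own types; SimpleCollector, SimpleSource and the in-memory MapCollector are hypothetical names. In Hadoop itself, a source implements org.apache.hadoop.metrics2.MetricsSource, whose single method is getMetrics(MetricsCollector collector, boolean all), and it builds its records through that collector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for Hadoop's MetricsCollector:
// addRecord starts a named record in a context and returns a map that the
// source fills with metric-name -> value pairs.
interface SimpleCollector {
    Map<String, Number> addRecord(String recordName, String context);
}

// Simplified, hypothetical stand-in for Hadoop's MetricsSource.
interface SimpleSource {
    // "all" mirrors Hadoop's flag: when true, report every metric even if unchanged.
    void getMetrics(SimpleCollector collector, boolean all);
}

// Example source publishing one gauge, "MyMetric", e.g. the number of open
// connections of a hypothetical server component.
class MyComponentSource implements SimpleSource {
    private volatile int openConnections;

    void setOpenConnections(int count) {
        openConnections = count;
    }

    @Override
    public void getMetrics(SimpleCollector collector, boolean all) {
        Map<String, Number> record = collector.addRecord("MyComponentSource", "MyContext");
        // This source has a single gauge, so it reports it regardless of "all".
        // A gauge reports the current value, which can go up or down.
        record.put("MyMetric", openConnections);
    }
}

// Minimal in-memory collector that keys records by "context.recordName";
// handy for exercising a source in a test.
class MapCollector implements SimpleCollector {
    final Map<String, Map<String, Number>> records = new LinkedHashMap<>();

    @Override
    public Map<String, Number> addRecord(String recordName, String context) {
        return records.computeIfAbsent(context + "." + recordName, k -> new LinkedHashMap<>());
    }
}
```

To try it, set a value with setOpenConnections, pass a MapCollector to getMetrics, and read the record stored under "MyContext.MyComponentSource" in the collector's records map.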
The “MyMetric” in the listing above could be, for example, the number of open connections for a specific server.
Here is an example class implementing the MetricsSink:
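Again as a hypothetical, self-contained sketch rather than Hadoop code: in Hadoop, a sink implements org.apache.hadoop.metrics2.MetricsSink, which declares putMetrics(MetricsRecord record) and flush() and inherits init(SubsetConfiguration conf) from MetricsPlugin. The stand-ins below (SimpleRecord and SimpleSink, both hypothetical) keep those three callbacks, with java.util.Properties in place of SubsetConfiguration. The example sink buffers formatted records until flush() is called.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Simplified, hypothetical stand-in for a metrics record.
final class SimpleRecord {
    final String name;
    final String context;
    final long timestamp;
    final Map<String, Number> metrics;

    SimpleRecord(String name, String context, long timestamp, Map<String, Number> metrics) {
        this.name = name;
        this.context = context;
        this.timestamp = timestamp;
        this.metrics = metrics;
    }
}

// Simplified, hypothetical stand-in for MetricsSink; init() plays the role
// of the plugin initialization that receives the sink's configuration.
interface SimpleSink {
    void init(Properties conf);
    void putMetrics(SimpleRecord record);
    void flush();
}

// Example sink: formats each record as one line, keeps it pending, and
// moves pending lines to the "written" output only on flush().
class MyComponentSink implements SimpleSink {
    private final List<String> pending = new ArrayList<>();
    private final List<String> written = new ArrayList<>();

    @Override
    public void init(Properties conf) {
        // A real sink would read its options here (e.g. an output file name).
    }

    @Override
    public void putMetrics(SimpleRecord record) {
        pending.add(record.timestamp + " " + record.context + "." + record.name + " " + record.metrics);
    }

    @Override
    public void flush() {
        written.addAll(pending);
        pending.clear();
    }

    List<String> written() {
        return Collections.unmodifiableList(written);
    }
}
```

Keeping putMetrics cheap and doing the output work in flush() is one reasonable split; it is a choice made for this sketch, not a description of Hadoop's bundled sinks.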
To use the Metrics2 framework, the system needs to be initialized and the sources and sinks registered. Here is an example initialization:
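In Hadoop this usually amounts to calling DefaultMetricsSystem.initialize("test") once per process and then DefaultMetricsSystem.instance().register(name, description, object) for each source and sink. The sketch below does not use Hadoop. It is a hypothetical miniature (MiniMetricsSystem) that mirrors the flow from the System Design section: the system is initialized with a prefix, sources and sinks register under unique names, and each publish() call marshals every source snapshot to every sink. To keep it short, sources are modeled as Supplier<Map<String, Number>> and sinks as Consumer<String>.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical miniature metrics system (not Hadoop's API).
final class MiniMetricsSystem {
    private final String prefix;
    private final Map<String, Supplier<Map<String, Number>>> sources = new LinkedHashMap<>();
    private final Map<String, Consumer<String>> sinks = new LinkedHashMap<>();

    private MiniMetricsSystem(String prefix) {
        this.prefix = prefix;
    }

    // Plays the role of DefaultMetricsSystem.initialize(prefix).
    static MiniMetricsSystem initialize(String prefix) {
        return new MiniMetricsSystem(prefix);
    }

    // The per-prefix file name pattern hadoop-metrics2-[prefix].properties.
    String configFileName() {
        return "hadoop-metrics2-" + prefix + ".properties";
    }

    void registerSource(String name, Supplier<Map<String, Number>> source) {
        if (sources.putIfAbsent(name, source) != null) {
            throw new IllegalArgumentException("source already registered: " + name);
        }
    }

    void registerSink(String name, Consumer<String> sink) {
        if (sinks.putIfAbsent(name, sink) != null) {
            throw new IllegalArgumentException("sink already registered: " + name);
        }
    }

    // One sampling period: snapshot each source once, then hand the same
    // formatted record to every registered sink.
    void publish() {
        for (Map.Entry<String, Supplier<Map<String, Number>>> e : sources.entrySet()) {
            String record = e.getKey() + ": " + e.getValue().get();
            for (Consumer<String> sink : sinks.values()) {
                sink.accept(record);
            }
        }
    }
}
```

In this sketch a slow sink delays every other sink during publish(); a production system would decouple them, for example by giving each sink its own queue.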
Configuration and Filtering
The Metrics2 framework uses PropertiesConfiguration from the Apache Commons Configuration library.
Sinks are specified in a configuration file (e.g., “hadoop-metrics2-test.properties”). The configuration syntax is:
[prefix].[source|sink].[instance].[option]
For example, in the key test.sink.mysink0.class, test is the prefix and mysink0 is an instance name. DefaultMetricsSystem tries to load hadoop-metrics2-[prefix].properties first and, if that file is not found, falls back to the default hadoop-metrics2.properties on the class path. Note that [instance] is an arbitrary name that uniquely identifies a particular sink instance. The asterisk (*) can be used to specify default options.
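To make the key syntax [prefix].[source|sink].[instance].[option] concrete, here is a hypothetical helper (MetricsKey) that splits a key into those four parts. It is a structural sketch only: it does not understand option groups such as filter settings, and keys without all four parts (such as *.period) are rejected.

```java
// Hypothetical parser for keys of the form [prefix].[source|sink].[instance].[option].
// Keys that do not have that four-part shape yield null.
final class MetricsKey {
    final String prefix;   // e.g. "test", or "*" for defaults
    final String type;     // "source" or "sink"
    final String instance; // arbitrary name identifying one plugin instance
    final String option;   // may itself contain dots

    private MetricsKey(String prefix, String type, String instance, String option) {
        this.prefix = prefix;
        this.type = type;
        this.instance = instance;
        this.option = option;
    }

    static MetricsKey parse(String key) {
        // Limit 4: everything after the third dot stays in the option part.
        String[] parts = key.split("\\.", 4);
        if (parts.length != 4) {
            return null;
        }
        if (!parts[1].equals("source") && !parts[1].equals("sink")) {
            return null;
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                return null;
            }
        }
        return new MetricsKey(parts[0], parts[1], parts[2], parts[3]);
    }

    // True when the key uses the "*" prefix, i.e. it supplies a default.
    boolean isDefault() {
        return prefix.equals("*");
    }
}
```

For example, MetricsKey.parse("test.sink.mysink0.class") yields prefix test, type sink, instance mysink0 and option class.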
Here is an example with inline comments to identify the different configuration sections:
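The configuration below is illustrative: its keys follow the patterns in Hadoop's default hadoop-metrics2.properties (FileSink as the default sink class, a default sampling period, and per-daemon output file names). It is embedded as a Java string so it can be loaded with java.util.Properties, which treats lines starting with # as comments. The MetricsConfigLookup class is hypothetical and models only the simplest reading of the asterisk default: look up prefix.key and, if it is missing, fall back to *.key. Hadoop's own resolution of defaults is more involved.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not Hadoop's MetricsConfig: loads a metrics2-style
// properties text and resolves "prefix.key", falling back to "*.key".
final class MetricsConfigLookup {
    // Illustrative configuration with comment lines marking each section.
    static final String EXAMPLE =
          "# Defaults: apply to every prefix unless overridden\n"
        + "*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink\n"
        + "# Default sampling period, in seconds\n"
        + "*.period=10\n"
        + "\n"
        + "# NameNode daemon: its own output file\n"
        + "namenode.sink.file.filename=namenode-metrics.out\n"
        + "\n"
        + "# NodeManager daemon: its own output file\n"
        + "nodemanager.sink.file.filename=nodemanager-metrics.out\n";

    private final Properties props = new Properties();

    MetricsConfigLookup(String text) {
        try {
            props.load(new StringReader(text)); // '#' lines are skipped as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the value of prefix.key, else *.key, else null.
    String get(String prefix, String key) {
        String value = props.getProperty(prefix + "." + key);
        return value != null ? value : props.getProperty("*." + key);
    }
}
```

With this configuration, the nodemanager prefix gets its own output file while inheriting the FileSink class and the 10-second period from the * entries.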
Here is an example set of NodeManager metrics that are dumped into the NodeManager sink file:
Each line starts with a timestamp, followed by the context and metrics record name, and then the name and value of each metric.
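For post-processing such a dump, here is a hypothetical parser (SinkLine) for one line. It assumes the layout "<timestamp> <context>.<record>: name=value, name=value, ...", which is my reading of the description above and of Hadoop's FileSink; check it against the files your version actually writes before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for one line of a file-sink dump, assuming the layout
//   <timestamp> <context>.<record>: name=value, name=value, ...
// Values that themselves contain ", " would be split incorrectly.
final class SinkLine {
    final long timestamp;
    final String context;
    final String record;
    final Map<String, String> values; // tags and metrics, in file order

    private SinkLine(long timestamp, String context, String record, Map<String, String> values) {
        this.timestamp = timestamp;
        this.context = context;
        this.record = record;
        this.values = values;
    }

    static SinkLine parse(String line) {
        int space = line.indexOf(' ');
        if (space < 0) {
            throw new IllegalArgumentException("no timestamp: " + line);
        }
        int colon = line.indexOf(": ", space + 1);
        int dot = line.indexOf('.', space + 1);
        if (colon < 0 || dot < 0 || dot > colon) {
            throw new IllegalArgumentException("no context.record: " + line);
        }
        long timestamp = Long.parseLong(line.substring(0, space));
        String context = line.substring(space + 1, dot);
        String record = line.substring(dot + 1, colon);
        Map<String, String> values = new LinkedHashMap<>();
        for (String pair : line.substring(colon + 2).split(", ")) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // pairs without a name or '=' are skipped
                values.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new SinkLine(timestamp, context, record, values);
    }
}
```

Parsing a made-up line such as "1349725091018 yarn.NodeManagerMetrics: Context=yarn, Hostname=node1, ContainersLaunched=3" would give context yarn, record NodeManagerMetrics and three name/value pairs.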
Filtering
Out of the box, metrics can be filtered by source, context, record and metric. More discussion of the different filtering strategies can be found in the Javadoc and wiki.
Example:
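As a rough illustration of include/exclude filtering, the hypothetical NameFilter class below matches names against glob patterns, supporting only * and ?. The precedence it applies (an include match accepts; otherwise an exclude match rejects; with only an include pattern, everything else is rejected) is how I understand Hadoop's pattern-based filters to behave, so treat it as an assumption and check the Javadoc. In Hadoop, filters are selected and configured through properties (for example a filter class such as org.apache.hadoop.metrics2.filter.GlobFilter plus include/exclude options) rather than in application code.

```java
import java.util.regex.Pattern;

// Hypothetical include/exclude name filter using simple globs.
final class NameFilter {
    private final Pattern include; // null means "no include pattern"
    private final Pattern exclude; // null means "no exclude pattern"

    NameFilter(String includeGlob, String excludeGlob) {
        this.include = includeGlob == null ? null : globToRegex(includeGlob);
        this.exclude = excludeGlob == null ? null : globToRegex(excludeGlob);
    }

    // '*' matches any run of characters, '?' exactly one; everything else is literal.
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    boolean accepts(String name) {
        if (include != null && include.matcher(name).matches()) {
            return true;  // explicitly included
        }
        if (exclude != null && exclude.matcher(name).matches()) {
            return false; // explicitly excluded
        }
        // Include-only mode: anything not explicitly included is rejected.
        return include == null || exclude != null;
    }
}
```

For instance, new NameFilter("Containers*", null) accepts ContainersLaunched but rejects every name that does not start with Containers.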
Conclusion
The Metrics2 system for Hadoop provides a gold mine of real-time and historical data that help monitor and debug problems associated with the Hadoop services and jobs.