MapReduce Counters: What Each Counter Means

Source: Internet | Editor: 程序博客网 | Time: 2024/05/10 02:50

16/03/24 15:13:39 INFO mapreduce.Job: Counters: 49
File System Counters
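FILE: Number of bytes read=278 Number of bytes the job read from the local file system. If all map input comes from HDFS, the map phase contributes 0 here. Before a reducer runs, however, its input has already been fetched and merged during the shuffle and stored on the reducer's local disk, so this value is essentially the total input bytes of all reducers.
FILE: Number of bytes written=415302 Map intermediate results are spilled to local disk, and when a map task finishes its spills are merged into one final spill file, so the map-side part of this value is the total number of bytes the map tasks wrote to local disk. Correspondingly, during the shuffle the reduce side keeps fetching map intermediate results, merging them and spilling them to its own local disk, until they form a single file that becomes the reduce input. The counter adds up the writes from both sides.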
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
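HDFS: Number of bytes read=282 During the job, only the map side reads from HDFS, and what it reads is not limited to the source file contents: it also includes the split metadata of every map. This value should therefore be slightly larger than FileInputFormatCounters.BYTES_READ.
HDFS: Number of bytes written=51 The reducers' final output is written to HDFS, so this is the total size of the job's output.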
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
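As a quick cross-check of the two HDFS byte counts against the rest of this log, here is a small Python sketch. The helper `parse_counters` is hypothetical (not a Hadoop API); it only reads `Name=value` lines from the counter dump. In this particular run, HDFS bytes read (282) is exactly File Input Format Bytes Read (76) plus Input split bytes (206), and HDFS bytes written equals File Output Format Bytes Written (51). Treat these exact equalities as properties of this run, not as general guarantees.

```python
def parse_counters(dump: str) -> dict:
    """Collect 'Name=value' lines from a job counter dump into a dict.

    Hypothetical helper: group headers (lines without '=') are skipped, and any
    explanatory text after the number is ignored. Counter names are assumed to
    be unique within the dump, which holds for the log above.
    """
    counters = {}
    for line in dump.splitlines():
        name, sep, rest = line.partition("=")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            counters[name.strip()] = int(fields[0])
    return counters


# Excerpt of the counter dump above (explanations omitted).
DUMP = """\
File System Counters
HDFS: Number of bytes read=282
HDFS: Number of bytes written=51
Map-Reduce Framework
Input split bytes=206
File Input Format Counters
Bytes Read=76
File Output Format Counters
Bytes Written=51
"""

c = parse_counters(DUMP)
# HDFS reads cover the input data plus the split metadata: 76 + 206 = 282.
print(c["HDFS: Number of bytes read"] == c["Bytes Read"] + c["Input split bytes"])  # True
# Everything the job wrote to HDFS is its final output.
print(c["HDFS: Number of bytes written"] == c["Bytes Written"])  # True
```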
Job Counters
Launched map tasks=2 Number of map tasks this job launched.
Launched reduce tasks=2 Number of reduce tasks this job launched.
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=11967
Total time spent by all reduces in occupied slots (ms)=12179
Total time spent by all map tasks (ms)=11967
Total time spent by all reduce tasks (ms)=12179
Total vcore-seconds taken by all map tasks=11967
Total vcore-seconds taken by all reduce tasks=12179
Total megabyte-seconds taken by all map tasks=12254208
Total megabyte-seconds taken by all reduce tasks=12471296
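The last four Job Counters can be cross-checked against the task times. In this log the "vcore-seconds" values equal the millisecond task times, and each "megabyte-seconds" value is an exact multiple of the matching task time. The sketch below infers the per-task allocation from that, assuming every task received the same allocation; the result (1 vcore and 1024 MB per task) is an inference from these numbers, not something the log states, and the names `job_counters` and `per_task_allocation` are made up for illustration.

```python
# Values copied from the Job Counters section above.
job_counters = {
    "map": {"ms": 11967, "vcore": 11967, "mb": 12254208},
    "reduce": {"ms": 12179, "vcore": 12179, "mb": 12471296},
}


def per_task_allocation(ms: int, vcore: int, mb: int) -> tuple:
    """Infer (vcores, memory MB) per task, assuming counter = task time x allocation."""
    vcores, vcore_rest = divmod(vcore, ms)
    memory_mb, mb_rest = divmod(mb, ms)
    if vcore_rest or mb_rest:
        raise ValueError("counters are not exact multiples of the task time")
    return vcores, memory_mb


for phase, values in job_counters.items():
    print(phase, per_task_allocation(**values))
# map (1, 1024)
# reduce (1, 1024)
```

If tasks had different container sizes, the counters would generally not be exact multiples, and the function raises instead of guessing.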
Map-Reduce Framework
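Map input records=19 Total number of input records read by all map tasks; with text input, this is the number of lines read from HDFS.
Map output records=19 Number of records emitted directly by the map tasks, i.e. the number of context.write calls in map(); this is the raw output count before any Combiner.
Map output bytes=228 Map output keys/values are serialized into an in-memory buffer, so this is the total size in bytes of the serialized output.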
Map output materialized bytes=290
Input split bytes=206
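Combine input records=0 A Combiner exists to reduce the amount of data that has to be fetched and moved. Its input is the map output, so when a Combiner is configured this count normally matches Map output records. It is 0 here because this job has no Combiner.
Combine output records=0 After the Combiner, records with the same key are aggregated, so much of the duplicate data is handled on the map side. This is the number of records left in the map-side intermediate files.
Reduce input groups=2 Total number of key groups the reducers read (distinct keys under the grouping comparator), i.e. the number of times reduce() was called.
Reduce shuffle bytes=290 Total intermediate data the reducers' copy threads fetched from the map side, i.e. the combined size of the map tasks' final intermediate files. Here it equals Map output materialized bytes (290).
Reduce input records=19 With a Combiner, this equals the number of records left after combining on the map side; without one, it should equal Map output records (as here: 19).
Reduce output records=2 Total number of records output by all reducers.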
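Spilled Records=38 Spills happen on both the map and the reduce side; this counts the total number of records spilled from memory to disk.
Shuffled Maps =4 Each reducer usually has to fetch data from every map; each time a copy thread successfully fetches one map's output, this counter goes up by 1. The total is therefore roughly the number of reduces times the number of maps (here 2 × 2 = 4).
Failed Shuffles=0 Number of shuffle errors caused by network connection failures or I/O exceptions while copy threads were fetching map intermediate data.
Merged Map outputs=4 Number of map outputs merged on the reduce side during the shuffle.
GC time elapsed (ms)=330 Total GC time of the child JVMs that ran the map and reduce tasks, obtained via JMX.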
CPU time spent (ms)=8070
Physical memory (bytes) snapshot=880254976
Virtual memory (bytes) snapshot=3633971200
Total committed heap usage (bytes)=709885952
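To see how Map output records, the Combine counters, Reduce input records/groups, Reduce output records and Shuffled Maps relate, here is a toy Python simulation of a sum-by-key (word-count-style) job. Everything in it is an assumption for illustration: the input data is made up (chosen so the no-Combiner run reproduces the record counts in this log), `partition` is a hypothetical stand-in for Hadoop's hash partitioner, and the Combiner is modeled as running exactly once per map task, whereas Hadoop may run it zero or more times.

```python
from collections import Counter, defaultdict
from itertools import groupby


def partition(key: str, num_reduces: int) -> int:
    # Hypothetical deterministic partitioner (Python's str hash is randomized per run).
    return sum(key.encode()) % num_reduces


def simulate(map_outputs, num_reduces, use_combiner):
    """Return (counters, final output) for a sum-by-key job.

    map_outputs: one list of (key, value) pairs per map task.
    """
    c = Counter()
    reducer_inputs = [[] for _ in range(num_reduces)]
    for task_output in map_outputs:
        c["Map output records"] += len(task_output)  # context.write calls in map()
        if use_combiner:
            c["Combine input records"] += len(task_output)
            sums = defaultdict(int)
            for key, value in task_output:
                sums[key] += value
            task_output = list(sums.items())
            c["Combine output records"] += len(task_output)
        for key, value in task_output:
            reducer_inputs[partition(key, num_reduces)].append((key, value))
    # In this model every reducer fetches from every map, even for an empty partition.
    c["Shuffled Maps"] = len(map_outputs) * num_reduces
    output = []
    for records in reducer_inputs:
        c["Reduce input records"] += len(records)
        records.sort(key=lambda kv: kv[0])  # the shuffle delivers keys sorted
        for key, group in groupby(records, key=lambda kv: kv[0]):
            c["Reduce input groups"] += 1  # one reduce() call per key group
            output.append((key, sum(v for _, v in group)))
            c["Reduce output records"] += 1  # one context.write per call
    return c, sorted(output)


# Made-up input: two map tasks emitting 10 + 9 = 19 records over keys "a" and "b".
maps = [[("a", 1)] * 6 + [("b", 1)] * 4, [("a", 1)] * 5 + [("b", 1)] * 4]

no_comb, out = simulate(maps, num_reduces=2, use_combiner=False)
with_comb, _ = simulate(maps, num_reduces=2, use_combiner=True)
print(no_comb["Combine input records"], no_comb["Reduce input records"], no_comb["Reduce input groups"])  # 0 19 2
print(with_comb["Combine input records"], with_comb["Combine output records"], with_comb["Reduce input records"])  # 19 4 4
print(out)  # [('a', 11), ('b', 8)]
```

With the Combiner, Reduce input records drops from 19 to 4 (one record per key per map task in this model), while Reduce input groups and Reduce output records stay at 2.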
Shuffle Errors
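BAD_ID=0 Every map attempt has an ID such as attempt_201109020150_0254_m_000000_0. If the ID in the metadata fetched by a reducer's copy thread is not in this standard format, this counter is incremented.
CONNECTION=0 Errors when a copy thread establishes its connection to the map side.
IO_ERROR=0 Incremented when a reducer's copy thread hits an IOException while fetching map output.
WRONG_LENGTH=0 A map's intermediate output is formatted (possibly compressed) data, so it carries two lengths: the raw size and the compressed size. If either length arrives invalid (negative), this counter is incremented.
WRONG_MAP=0 Each copy thread has a specific goal: fetch the intermediate output of particular maps for a particular reduce. If the map output it received is not from a map it expected, the data was fetched from the wrong map.
WRONG_REDUCE=0 Similarly, if the fetched data says it was not prepared for this reduce, the wrong data was fetched.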
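The BAD_ID, WRONG_LENGTH, WRONG_REDUCE and WRONG_MAP counters can be pictured as sanity checks a copy thread runs on each fetched map-output header. The Python sketch below is a simplified, hypothetical model of those checks, not Hadoop's actual fetcher code: the header fields, the function name `classify_fetch`, the order of the checks, and the attempt-ID pattern (which only accepts map and reduce attempts) are assumptions. CONNECTION and IO_ERROR are network-level failures and are not modeled.

```python
import re

# Simplified pattern for IDs such as attempt_201109020150_0254_m_000000_0
# (assumption: only map "m" and reduce "r" attempt types are accepted).
ATTEMPT_ID = re.compile(r"attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)")


def classify_fetch(map_id, for_reduce, raw_len, compressed_len, my_reduce, expected_maps):
    """Return the Shuffle Errors counter this fetch would bump, or None if it looks sane.

    Hypothetical model of the checks described above, not Hadoop's implementation.
    """
    if not ATTEMPT_ID.fullmatch(map_id):
        return "BAD_ID"        # ID not in the standard attempt format
    if raw_len < 0 or compressed_len < 0:
        return "WRONG_LENGTH"  # a length field arrived negative
    if for_reduce != my_reduce:
        return "WRONG_REDUCE"  # data was prepared for a different reduce
    if map_id not in expected_maps:
        return "WRONG_MAP"     # not one of the maps this thread asked for
    return None


expected = {"attempt_201109020150_0254_m_000000_0", "attempt_201109020150_0254_m_000001_0"}
good = "attempt_201109020150_0254_m_000000_0"
print(classify_fetch(good, 0, 120, 80, my_reduce=0, expected_maps=expected))  # None
print(classify_fetch("attempt_0254_m_0", 0, 120, 80, my_reduce=0, expected_maps=expected))  # BAD_ID
print(classify_fetch(good, 0, -1, 80, my_reduce=0, expected_maps=expected))  # WRONG_LENGTH
print(classify_fetch(good, 1, 120, 80, my_reduce=0, expected_maps=expected))  # WRONG_REDUCE
```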
File Input Format Counters
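Bytes Read=76 Total input data (in bytes) the map tasks read from their input files. This is close to the sum of the value bytes passed to the map() calls, though not necessarily identical: with text input, for example, line terminators are read but are not part of the values.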
File Output Format Counters
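Bytes Written=51 Total output data (in bytes) the reduce tasks wrote through the job's OutputFormat, i.e. the final output written with context.write in reduce(). Here it matches HDFS: Number of bytes written.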

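This article is excerpted from "[Hadoop Source Code Reading][5] - Using Counters and the Meaning of the Default Counters" (《[hadoop源码阅读][5]-counter的使用和默认counter的含义》).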

0 0