Running Giraph fails with "Too many counters"

Source: Internet. Editor: 程序博客网. Time: 2024/04/19 17:20


Even after adding -ca mapreduce.job.counters.limit=1000, the job still failed:

16/10/20 08:56:08 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
16/10/20 08:56:38 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer s3:22181 --zkNode /_hadoopBsp/job_1476868823433_0017/_haltComputation'
16/10/20 08:56:38 INFO mapreduce.Job: Running job: job_1476868823433_0017
16/10/20 08:56:39 INFO mapreduce.Job: Job job_1476868823433_0017 running in uber mode : false
16/10/20 08:56:39 INFO mapreduce.Job:  map 50% reduce 0%
16/10/20 08:56:47 INFO mapreduce.Job:  map 100% reduce 0%
16/10/20 08:56:47 INFO mapreduce.Job: Job job_1476868823433_0017 failed with state FAILED due to: Task failed task_1476868823433_0017_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/10/20 08:56:47 INFO mapreduce.Job: Counters: 34
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=97529
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=76
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters 
        Failed map tasks=1
        Launched map tasks=2
        Other local map tasks=2
        Total time spent by all maps in occupied slots (ms)=33269
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=33269
        Total vcore-seconds taken by all map tasks=33269
        Total megabyte-seconds taken by all map tasks=34067456
    Map-Reduce Framework
        Map input records=1
        Map output records=0
        Input split bytes=44
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=130
        CPU time spent (ms)=7280
        Physical memory (bytes) snapshot=186077184
        Virtual memory (bytes) snapshot=823398400
        Total committed heap usage (bytes)=200802304
    Zookeeper base path
        /_hadoopBsp/job_1476868823433_0017=0
    Zookeeper halt node
        /_hadoopBsp/job_1476868823433_0017/_haltComputation=0
    Zookeeper server:port
        s3:22181=0
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0

The task log shows:

2016-10-20 08:56:38,569 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 199 took 0.016 seconds ended with state ALL_SUPERSTEPS_DONE and is now on superstep 200
2016-10-20 08:56:38,573 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1} on superstep 200
2016-10-20 08:56:38,574 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FINISHED","_applicationAttemptKey":-1,"_superstepKey":-1}
2016-10-20 08:56:38,574 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710000 type:create cxid:0x236f zxid:0x143b txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_cleanedUpDir Error:KeeperErrorCode = NoNode for /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,574 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710001 type:create cxid:0xd8f zxid:0x143c txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1476868823433_0017/_masterJobState
2016-10-20 08:56:38,575 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanup: Notifying master its okay to cleanup with /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir/0_master
2016-10-20 08:56:38,575 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x157df96ea710000 type:create cxid:0x2375 zxid:0x143f txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_1476868823433_0017/_cleanedUpDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,575 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Node /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir already exists, no need to create.
2016-10-20 08:56:38,576 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Got 1 of 2 desired children from /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:38,576 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanedUpZooKeeper: Waiting for the children of /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir to change since only got 1 nodes.
2016-10-20 08:56:40,710 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: Too many counters: 121 max=120
    at org.apache.hadoop.ipc.Client.call(Client.java:1411)
    at org.apache.hadoop.ipc.Client.call(Client.java:1364)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
    at com.sun.proxy.$Proxy7.statusUpdate(Unknown Source)
    at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:737)
    at java.lang.Thread.run(Thread.java:745)
2016-10-20 08:56:40,879 INFO [main-EventThread] org.apache.giraph.bsp.BspService: process: cleanedUpChildrenChanged signaled
2016-10-20 08:56:40,880 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanUpZooKeeper: Got 2 of 2 desired children from /_hadoopBsp/job_1476868823433_0017/_cleanedUpDir
2016-10-20 08:56:40,880 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x157df96ea710001
2016-10-20 08:56:40,882 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:22181] org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /219.223.239.57:49390 which had sessionid 0x157df96ea710001
2016-10-20 08:56:40,888 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: cleanup: Removed HDFS checkpoint directory (_bsp/_checkpoints//job_1476868823433_0017) with return = false since the job Giraph: cost.Test succeeded 
2016-10-20 08:56:40,888 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: stop: Halting netty client
2016-10-20 08:56:40,890 INFO [netty-client-worker-0] org.apache.giraph.comm.netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing resources now.
2016-10-20 08:56:43,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: stop: Netty client halted
2016-10-20 08:56:43,095 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: stop: Halting netty server
2016-10-20 08:56:43,106 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: stop: Start releasing resources
2016-10-20 08:56:43,780 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: Too many counters: 121 max=120
    at org.apache.hadoop.ipc.Client.call(Client.java:1411)
    at org.apache.hadoop.ipc.Client.call(Client.java:1364)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
    at com.sun.proxy.$Proxy7.statusUpdate(Unknown Source)
    at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:737)
    at java.lang.Thread.run(Thread.java:745)
2016-10-20 08:56:43,793 INFO [communication thread] org.apache.hadoop.mapred.Task: Process Thread Dump: Communication exception
46 active threads
Thread 56 (netty-server-worker-15):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 1
  Stack:
    sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:596)
    io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:306)
    io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
    java.lang.Thread.run(Thread.java:745)
Thread 55 (netty-server-worker-14):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 1

So it is still the counters problem.
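The relevant line is easy to miss in a task log this long. As a rough sketch (not part of Giraph or Hadoop), the following Python snippet pulls the reported counter count and the limit in effect out of log text. The function name find_counter_limit_errors is made up for illustration, and the pattern assumes the message format "Too many counters: N max=M" seen in the log.

```python
import re

# Assumed message format, taken from the task log:
#   "Too many counters: 121 max=120"
COUNTER_LIMIT_RE = re.compile(r"Too many counters: (\d+) max=(\d+)")


def find_counter_limit_errors(log_text):
    """Return (reported, limit) pairs for every matching message in log_text.

    Hypothetical helper for illustration only.
    """
    return [(int(n), int(m)) for n, m in COUNTER_LIMIT_RE.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "IPC server unable to read call parameters: "
        "Too many counters: 121 max=120"
    )
    for reported, limit in find_counter_limit_errors(sample):
        print(f"task reported {reported} counters, limit in effect is {limit}")
```

If the limit printed here is not the value passed on the command line, the setting did not reach the process that enforces it.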

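From what I found online, setting the mapreduce.job.counters.limit property on the command line apparently has no effect. The log is consistent with that: the exception still reports max=120 rather than 1000.
So I set the property in $HADOOP_HOME/conf/mapred-site.xml instead: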

<property>
    <name>mapreduce.job.counters.limit</name>
    <value>20000</value>
    <description>Limit on the number of counters allowed per job. The default value is 200.</description>
</property>

After that, everything worked.
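To double-check that the edited file really contains the new value, a small Python sketch like the one below can read a property out of the XML. The helper name read_property is hypothetical, and the sample assumes the usual layout in which <property> elements sit inside a <configuration> root element.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the <value> of the <property> with the given <name>, or None.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    # Sample content; the <configuration> wrapper is an assumption about
    # the rest of the file, which the post does not show.
    sample = """<configuration>
  <property>
    <name>mapreduce.job.counters.limit</name>
    <value>20000</value>
  </property>
</configuration>"""
    print(read_property(sample, "mapreduce.job.counters.limit"))
```

For a real file, pass in the text of $HADOOP_HOME/conf/mapred-site.xml. This only confirms what is written in the file, not which value the running cluster has actually loaded.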

0 0