Spark job submission fails with java.nio.channels.ClosedChannelException
1. Submitting the job
./spark-submit --master "yarn" --driver-memory 1g --executor-memory 1g --class KeyCount /root/IdeaProjects/SparkApp/out/artifacts/SparkApp_jar/SparkApp.jar
The submission fails with the following error:
17/08/25 14:47:03 ERROR client.TransportClient: Failed to send RPC 6159851572252707613 to /192.168.2.6:39986: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
17/08/25 14:47:03 ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 6159851572252707613 to /192.168.2.6:39986: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249)
	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
	at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
	at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
2. Since this is Spark on YARN, check the ResourceManager log:
2017-08-25 14:45:19,990 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:806)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:803)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:784)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
	at java.lang.Thread.run(Thread.java:745)
The NodeManager log on the same host shows why the container disappeared:

2017-08-25 14:47:03,147 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1503641152441_0014_02_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2540118016
2017-08-25 14:47:03,147 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=14043,containerID=container_1503641152441_0014_02_000001] is running beyond virtual memory limits. Current usage: 360.4 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1503641152441_0014_02_000001 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 14043 14041 14043 14043 (bash) 0 0 115847168 730 /bin/bash -c /usr/java/jdk1.8.0_73/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001/tmp -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1503641152441_0014/container_1503641152441_0014_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '192.168.2.6:45439' --properties-file /tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001/__spark_conf__/__spark_conf__.properties 1> /usr/local/hadoop/logs/userlogs/application_1503641152441_0014/container_1503641152441_0014_02_000001/stdout 2> /usr/local/hadoop/logs/userlogs/application_1503641152441_0014/container_1503641152441_0014_02_000001/stderr
	|- 14047 14043 14043 14043 (java) 628 24 2424270848 91542 /usr/java/jdk1.8.0_73/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001/tmp -Dspark.yarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1503641152441_0014/container_1503641152441_0014_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 192.168.2.6:45439 --properties-file /tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001/__spark_conf__/__spark_conf__.properties
2017-08-25 14:47:03,148 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 14043
2017-08-25 14:47:03,148 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1503641152441_0014_02_000001 transitioned from RUNNING to KILLING
2017-08-25 14:47:03,148 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1503641152441_0014_02_000001
2017-08-25 14:47:03,152 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1503641152441_0014_02_000001 is : 143
2017-08-25 14:47:03,163 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1503641152441_0014_02_000001 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1503641152441_0014 CONTAINERID=container_1503641152441_0014_02_000001
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1503641152441_0014_02_000001 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1503641152441_0014_02_000001 from application application_1503641152441_0014
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1503641152441_0014
2017-08-25 14:47:03,164 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop/nm-local-dir/usercache/root/appcache/application_1503641152441_0014/container_1503641152441_0014_02_000001
2017-08-25 14:47:03,196 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1503641152441_0014_000002 (auth:SIMPLE)
2017-08-25 14:47:03,207 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1503641152441_0014_02_000001
2017-08-25 14:47:03,207 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=192.168.2.6 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1503641152441_0014 CONTAINERID=container_1503641152441_0014_02_000001
2017-08-25 14:47:04,172 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1503641152441_0014_02_000003 is : 1
2017-08-25 14:47:04,172 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1503641152441_0014_02_000003 and exit code: 1
ExitCodeException exitCode=1:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
	at org.apache.hadoop.util.Shell.run(Shell.java:479)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
The key line is obvious:

Current usage: 360.4 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.

In other words, the container's virtual memory usage exceeded the configured limit of 2.1 GB.
So the question is: where does this virtual memory limit come from?
It is computed from the yarn-site.xml configuration: yarn.scheduler.minimum-allocation-mb * yarn.nodemanager.vmem-pmem-ratio = the total virtual memory a container may use. (Strictly, the limit is the container's allocated physical memory times the ratio; here the ApplicationMaster container got the minimum allocation, so the two coincide.) If a container needs more virtual memory than this product, YARN kills it.
In my case, yarn.scheduler.minimum-allocation-mb was not set (default: 1 GB) and yarn.nodemanager.vmem-pmem-ratio was not set either (default: 2.1). Hence the log above: the container used 360.4 MB of its 1 GB of physical memory, but 2.4 GB of virtual memory against a limit of only 2.1 GB.
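The arithmetic behind those log numbers can be sketched in a few lines of Python. The variable names are mine, not YARN's; the sketch also assumes YARN parses the ratio as a 32-bit float, which is what makes the computed limit match the logged Limit=2254857728 exactly:

```python
import struct

MB = 1024 * 1024

# Both settings were left at their defaults in my cluster:
min_alloc_mb = 1024   # yarn.scheduler.minimum-allocation-mb (default: 1024 MB)
ratio = 2.1           # yarn.nodemanager.vmem-pmem-ratio (default: 2.1)

# Emulate a 32-bit float for the ratio, as YARN reads it from the config
ratio_f32 = struct.unpack('f', struct.pack('f', ratio))[0]

vmem_limit = int(min_alloc_mb * MB * ratio_f32)
print(vmem_limit)              # 2254857728 -> "Limit=2254857728" in the log

vmem_used = 2540118016         # "current usage = 2540118016" (~2.4 GB)
print(vmem_used > vmem_limit)  # True -> the ContainersMonitor kills the container
```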
Then modify the following settings in yarn-site.xml:
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>9216</value>
  <description>Maximum memory per container, in MB; default is 8192 MB</description>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>4000</value>
  <description>Minimum memory per container, in MB</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4.1</value>
</property>
After this change the error above disappears, and the log now prints:
2017-08-25 15:53:27,670 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26478 for container-id container_1503646903552_0001_01_000001: 334.4 MB of 3.9 GB physical memory used; 2.4 GB of 16.0 GB virtual memory used
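A quick sanity check shows that the new numbers in this log line are simply the new settings expressed in GB (plain arithmetic, nothing YARN-specific):

```python
min_alloc_mb = 4000   # new yarn.scheduler.minimum-allocation-mb
ratio = 4.1           # new yarn.nodemanager.vmem-pmem-ratio

# Physical limit: 4000 MB ~= 3.9 GB -> "of 3.9 GB physical memory used"
print(round(min_alloc_mb / 1024, 1))           # 3.9

# Virtual limit: 4000 MB * 4.1 ~= 16.0 GB -> "of 16.0 GB virtual memory used"
print(round(min_alloc_mb * ratio / 1024, 1))   # 16.0
```

The 2.4 GB of virtual memory the ExecutorLauncher actually needs now sits comfortably under the 16 GB limit, so the container is no longer killed.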
Many other posts solve this by simply disabling the virtual-memory check; personally I don't recommend doing so.
If you really want to disable it anyway, configure the following in yarn-site.xml:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
================
迷途小运维随笔
Please credit the source when reposting.