Spark Issue 10: Job fails because a node runs out of disk space at runtime
Source: Internet · Editor: 程序博客网 · Time: 2024/06/05 11:04
For more code, see: https://github.com/xubo245/SparkLearning
Spark ecosystem: learning Alluxio. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0
1. Problem description
1.1 Summary
After writing a script to run multiple applications in sequence, the jobs started failing after a dozen or so runs:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 1.0 failed 4 times, most recent failure: Lost task 8.3 in stage 1.0 (TID 25, Mcnode4): org.apache.spark.SparkException: File ./DSA.jar exists and does not match contents of http://Master:41701/jars/DSA.jar
Checking the history server showed that a node had run out of disk space: http://master:18080/history/app-20170209152626-0632/stages/stage/?id=1&attempt=0
The real cause only shows up after drilling down to the task-level records; the error in section 1.2 does not point to it directly.
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.spark-project.guava.io.ByteStreams.copy(ByteStreams.java:211)
    at org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:204)
    at org.spark-project.guava.io.Files.copy(Files.java:436)
    at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:514)
    at org.apache.spark.util.Utils$.copyFile(Utils.scala:485)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:362)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
Checking HDFS and Mcnode4 confirmed that the node really was out of space: only 2.47 MB were available, while DSA.jar is 2.5 MB. The main cause was the gradual accumulation of application records under the worker's work directory.
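A quick way to confirm this kind of failure on a worker node is to compare the free space on the partition with the size of the accumulated per-application directories. A minimal sketch — the work-directory path is an assumption based on this cluster's layout, so adjust it to your deployment:

```shell
# Assumed location of the standalone worker's work directory; adjust as needed.
WORK_DIR="${SPARK_WORK_DIR:-$HOME/cloud/spark-1.5.2/work}"

# Free space on the partition holding the work directory
# (fall back to $HOME if the assumed path does not exist).
df -h "$WORK_DIR" 2>/dev/null || df -h "$HOME"

# Per-application directories under work/, largest first.
du -sh "$WORK_DIR"/app-* 2>/dev/null | sort -rh | head
```

If the free space reported by `df` is smaller than the application jar, the "File ./DSA.jar exists and does not match contents of ..." error above is exactly what you would expect: the executor's copy of the jar is truncated mid-write.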
1.2 Error log
hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/SparkSW20161114/alluxio-1.3.0$ ./cloudSWatmtimequerystandalone.sh > cloudSWatmtimequerystandalonetime201702072344.txt
[Stage 1:> (0 + 16) / 128]
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 1.0 failed 4 times, most recent failure: Lost task 8.3 in stage 1.0 (TID 25, Mcnode4): org.apache.spark.SparkException: File ./DSA.jar exists and does not match contents of http://Master:41701/jars/DSA.jar
    at org.apache.spark.util.Utils$.copyFile(Utils.scala:464)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:362)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:397)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1007)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1370)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1357)
    at org.apache.spark.rdd.RDD$$anonfun$top$1.apply(RDD.scala:1338)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
    at org.apache.spark.rdd.RDD.top(RDD.scala:1337)
    at org.dsa.core.DSW.align(DSW.scala:39)
    at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:33)
    at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:32)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.dsa.core.SequenceAlignment.run(SequenceAlignment.scala:32)
    at org.dsa.core.DSW$.main(DSW.scala:138)
    at org.dsa.time.CloudSWATMQueryTime$$anonfun$main$1$$anonfun$apply$mcVI$sp$3$$anonfun$apply$mcVI$sp$4.apply$mcVI$sp(CloudSWATMQueryTime.scala:93)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.dsa.time.CloudSWATMQueryTime$$anonfun$main$1$$anonfun$apply$mcVI$sp$3.apply$mcVI$sp(CloudSWATMQueryTime.scala:92)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.dsa.time.CloudSWATMQueryTime$$anonfun$main$1.apply$mcVI$sp(CloudSWATMQueryTime.scala:85)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.dsa.time.CloudSWATMQueryTime$.main(CloudSWATMQueryTime.scala:13)
    at org.dsa.time.CloudSWATMQueryTime.main(CloudSWATMQueryTime.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: File ./DSA.jar exists and does not match contents of http://Master:41701/jars/DSA.jar
    at org.apache.spark.util.Utils$.copyFile(Utils.scala:464)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:362)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:397)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2. Solutions
2.1 Add disk space
As the heading says: add more disk capacity to the node.
2.2 Delete/move files
Move the accumulated records under the work directory to disk2:
cd ~/disk2/backup
mv time20161102/spark-1.5.2/ .
mv time20161212/spark/work/* spark-1.5.2/work/
mv ~/cloud/spark-1.5.2/work/app-201* spark-1.5.2/work/
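Instead of moving directories by hand after every batch of runs, the standalone worker can also clean up finished applications' work directories on its own via the spark.worker.cleanup.* properties (documented for Spark standalone mode, including 1.5.2). A sketch for conf/spark-env.sh — the interval and TTL values below are example choices, not recommendations:

```shell
# conf/spark-env.sh on every worker node:
# enable periodic cleanup of stopped applications' work directories.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=604800"
# interval: check every 1800 s (30 min);
# appDataTtl: remove application data older than 604800 s (7 days).
```

Restart the workers for this to take effect; the cleanup only removes directories of applications that have already stopped.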
3. Result
After moving the files, the jobs run normally again.