Spark error IDs and their meanings


The approach below generalizes to other cases: system error IDs in the logs can be used to track down the root cause of a problem.



1 command exit code 137

1. Insufficient system resources: Spark memory usage on a worker node exceeded the machine's physical memory, so the operating system killed the executor process.

2. Run ulimit -a to see which system limits can be adjusted, e.g. ulimit -v unlimited.
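Exit code 137 is 128 + 9, i.e. the process was killed with SIGKILL, which on Linux typically points to the kernel OOM killer. A minimal sketch of how to confirm and work around this (the 4g figure is an assumption, size it for your nodes; spark-submit only exists on newer Spark versions):

# Look for OOM-killer activity in the kernel log
dmesg | grep -i 'killed process'

# Inspect and raise the limits for the current session
ulimit -a
ulimit -v unlimited

# Or request less memory per executor so it fits in physical RAM
spark-submit --executor-memory 4g ...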



2 Too many open files

Raise the maximum open-files limit. For the current session:

ulimit -n 124000

To make the change persistent, edit the soft and hard nofile limits in /etc/security/limits.conf.
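A sketch of the persistent configuration, assuming the executors run under a user named spark (the user name is an assumption; use the account that actually runs the Spark processes):

# /etc/security/limits.conf
spark  soft  nofile  124000
spark  hard  nofile  124000

Note that limits.conf changes apply to new login sessions, not to already-running processes.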


3 exit code 255 

Error:
java.io.IOException: Task process exit with nonzero status of 255.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

Cause:
One suggested fix is to set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher values; both default to 24 hours. These settings might be the reason for the failure, though this is not certain.
See: http://grepalex.com/2012/11/12/hadoop-logging/

http://218.245.3.161/2013/03/31/5965
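Both are Hadoop settings that normally live in mapred-site.xml. A sketch with illustrative values (48 hours is an arbitrary example; mapred.jobtracker.retirejob.interval is in milliseconds, and changing it typically requires a JobTracker restart):

<!-- mapred-site.xml -->
<property>
  <name>mapred.jobtracker.retirejob.interval</name>
  <value>172800000</value>  <!-- 48 hours, in milliseconds -->
</property>
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>48</value>
</property>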




4 Error communicating with MapOutputTracker

13/11/07 08:06:17 INFO cluster.ClusterTaskSetManager: Loss was due to org.apache.spark.SparkException
org.apache.spark.SparkException: Error communicating with MapOutputTracker
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:84)
        at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:170)
        at org.apache.spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:39)
        at org.apache.spark.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:125)
        at org.apache.spark.CoGroupedRDD$$anonfun$compute$2.apply(CoGroupedRDD.scala:116)
        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
        at scala.collection.immutable.List.foreach(List.scala:76)


The relevant source is in MapOutputTracker.scala:

private[spark] class MapOutputTracker extends Logging {

  // Ask timeout, read from spark.akka.askTimeout (default: 10 seconds)
  private val timeout = Duration.create(System.getProperty("spark.akka.askTimeout", "10").toLong, "seconds")

  // Set to the MapOutputTrackerActor living on the driver
  var trackerActor: ActorRef = _

  private var mapStatuses = new TimeStampedHashMap[Int, Array[MapStatus]]

  // Incremented every time a fetch fails so that client nodes know to clear
  // their cache of map output locations if this happens.
  private var epoch: Long = 0
  private val epochLock = new java.lang.Object

  // Cache a serialized version of the output statuses for each shuffle to send them out faster
  var cacheEpoch = epoch
  private val cachedSerializedStatuses = new TimeStampedHashMap[Int, Array[Byte]]

  val metadataCleaner = new MetadataCleaner("MapOutputTracker", this.cleanup)

  // Send a message to the trackerActor and get its result within a default timeout, or
  // throw a SparkException if this fails.
  def askTracker(message: Any): Any = {
    try {
      val future = trackerActor.ask(message)(timeout)
      return Await.result(future, timeout)
    } catch {
      case e: Exception =>
        throw new SparkException("Error communicating with MapOutputTracker", e)
    }
  }

  // ... rest of the class omitted



Try increasing -Dspark.akka.askTimeout (the default, as shown above, is 10 seconds).
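A minimal sketch of raising it, assuming a Spark 0.8-era deployment (60 seconds is an arbitrary example, not a tuned recommendation):

# Spark 0.8-era: pass the system property through SPARK_JAVA_OPTS
export SPARK_JAVA_OPTS="-Dspark.akka.askTimeout=60"

# Newer versions: set it as a conf at submit time
spark-submit --conf spark.akka.askTimeout=60 ...

A longer timeout gives slow shuffle-status fetches more room, but it only hides the symptom if the driver-side tracker is genuinely overloaded.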




