How ShuffleMapTask Execution Results Are Returned to the Driver: Principles and Source Code


The CoarseGrainedSchedulerBackend in the Driver sends a launchTasks message to the CoarseGrainedExecutorBackend. On receiving it, the CoarseGrainedExecutorBackend calls executor.launchTask to run the task. From its parameters (the task ID, the attempt number, the task name, and the serialized task), launchTask creates a TaskRunner and executes it on the threadPool. Inside TaskRunner, some preparation is done first, such as deserializing the task's dependencies and fetching the required files and JARs over the network; then the deserialized Task's run method is called to execute the task and obtain its result:

Executor.scala source:

override def run(): Unit = {
  ......
  val value = try {
    val res = task.run(
      taskAttemptId = taskId,
      attemptNumber = attemptNumber,
      metricsSystem = env.metricsSystem)
    threwException = false
    res
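
The launchTask step that leads into this excerpt (wrapping the task description in a TaskRunner and handing it to the executor's thread pool) can be pictured with the following self-contained sketch. TaskDesc and TaskRunnerSketch are simplified stand-ins of our own, not Spark's TaskDescription and TaskRunner classes:

import java.util.concurrent.{ConcurrentHashMap, Executors}

object LaunchTaskSketch {
  // Simplified stand-in for the information carried in a LaunchTask message.
  final case class TaskDesc(taskId: Long, attemptNumber: Int, name: String,
                            serializedTask: Array[Byte])

  // Simplified stand-in for Executor.TaskRunner: deserialize, run, report back.
  final class TaskRunnerSketch(desc: TaskDesc) extends Runnable {
    override def run(): Unit =
      println(s"running ${desc.name} (TID ${desc.taskId}, attempt ${desc.attemptNumber})")
  }

  private val runningTasks = new ConcurrentHashMap[Long, TaskRunnerSketch]()
  private val threadPool = Executors.newCachedThreadPool()

  // Mirrors the shape of Executor.launchTask: wrap the task in a runner,
  // remember it so it can be killed or reported on, and execute it on the pool.
  def launchTask(desc: TaskDesc): Unit = {
    val tr = new TaskRunnerSketch(desc)
    runningTasks.put(desc.taskId, tr)
    threadPool.execute(tr)
  }

  def main(args: Array[String]): Unit = {
    launchTask(TaskDesc(0L, 0, "task 0.0 in stage 0.0", Array.emptyByteArray))
    threadPool.shutdown()
  }
}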

Calling Task.run in turn invokes the abstract runTask method; runTask in Task.scala is abstract. There are two kinds of Task, ResultTask and ShuffleMapTask, and the concrete runTask behavior is provided by each subclass. When ShuffleMapTask.runTask actually runs, it calls the RDD's iterator and then computes its partition.
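
The run/runTask relationship is a template-method pattern: the concrete run method owns the bookkeeping, and each subclass supplies the partition-level work. A minimal sketch under that assumption, using our own simplified types rather than Spark's Task, TaskContext and MapStatus:

object TaskTemplateSketch {
  // Simplified stand-ins for Spark's TaskContext and MapStatus.
  final case class TaskContextSketch(partitionId: Int)
  final case class MapStatusSketch(bytesPerReducer: Seq[Long])

  abstract class TaskSketch[T](val partitionId: Int) {
    // run() owns the bookkeeping (context setup, metrics, cleanup in the real code)
    // and delegates the per-partition work to the subclass via abstract runTask().
    final def run(taskAttemptId: Long): T = {
      val context = TaskContextSketch(partitionId)
      runTask(context)
    }
    def runTask(context: TaskContextSketch): T
  }

  // ShuffleMapTask-like: writes shuffle output and reports what was written.
  final class ShuffleMapTaskSketch(pid: Int) extends TaskSketch[MapStatusSketch](pid) {
    override def runTask(context: TaskContextSketch): MapStatusSketch =
      MapStatusSketch(Seq(0L, 0L))
  }

  // ResultTask-like: applies the user function to the partition and returns its value.
  final class ResultTaskSketch(pid: Int, func: Iterator[Int] => Int) extends TaskSketch[Int](pid) {
    override def runTask(context: TaskContextSketch): Int = func(Iterator(1, 2, 3))
  }

  def main(args: Array[String]): Unit = {
    println(new ShuffleMapTaskSketch(0).run(taskAttemptId = 0L))
    println(new ResultTaskSketch(0, _.sum).run(taskAttemptId = 1L))
  }
}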

ShuffleMapTask.scala source:

override def runTask(context: TaskContext): MapStatus = {
  ......
  val ser = SparkEnv.get.closureSerializer.newInstance()
  val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
  ......
  val manager = SparkEnv.get.shuffleManager
  writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
  writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
  writer.stop(success = true).get
  ......
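
The rdd.iterator(partition, context) call above is what eventually reaches compute: iterator first tries to serve the partition from a cached (or checkpointed) copy and only falls back to compute when nothing is stored. The following self-contained sketch models that dispatch with a plain in-memory map standing in for BlockManager-backed caching; it is an illustration, not Spark's RDD code:

object IteratorComputeSketch {
  final case class PartitionSketch(index: Int)

  abstract class RddSketch[T] {
    // Hypothetical in-memory "cache" standing in for BlockManager-backed storage.
    private val cache = scala.collection.mutable.Map.empty[Int, Seq[T]]

    // Mirrors the shape of RDD.iterator: serve from cache if present,
    // otherwise compute the partition (and, here, always cache it).
    final def iterator(split: PartitionSketch): Iterator[T] =
      cache.get(split.index) match {
        case Some(cached) => cached.iterator
        case None =>
          val computed = compute(split).toSeq
          cache(split.index) = computed
          computed.iterator
      }

    // The per-partition computation; subclasses implement it.
    def compute(split: PartitionSketch): Iterator[T]
  }

  def main(args: Array[String]): Unit = {
    val rdd = new RddSketch[Int] {
      def compute(split: PartitionSketch): Iterator[Int] = Iterator(1, 2, 3).map(_ + split.index)
    }
    println(rdd.iterator(PartitionSketch(0)).sum) // computed
    println(rdd.iterator(PartitionSketch(0)).sum) // served from the sketch's cache
  }
}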

 

In runTask, ShuffleMapTask obtains a writer from the shuffleManager and calls its write method; during write the computation ultimately reaches the RDD's compute method. writer.stop(success = true).get then returns the MapStatus value if the write succeeded.

SortShuffleWriter.scala source:

override def write(records: Iterator[Product2[K, V]]): Unit = {
  val blockId = ShuffleBlockId(dep.shuffleId, mapId, IndexShuffleBlockResolver.NOOP_REDUCE_ID)
  val partitionLengths = sorter.writePartitionedFile(blockId, tmp)
  shuffleBlockResolver.writeIndexFileAndCommit(dep.shuffleId, mapId, partitionLengths, tmp)
  mapStatus = MapStatus(blockManager.shuffleServerId, partitionLengths)
  ......

override def stop(success: Boolean): Option[MapStatus] = {
  ......
  if (success) {
    return Option(mapStatus)
  } else {
    return None
  }
  ......
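
The write/commit pattern above, one data file per map task plus an index file of cumulative offsets built from the per-partition lengths, can be illustrated with this self-contained sketch. The file names and helper functions are hypothetical; in Spark the real work is done by the sorter and IndexShuffleBlockResolver:

import java.io.{DataOutputStream, FileOutputStream}
import java.nio.file.Files

object SortShuffleFileSketch {
  // Write every partition's records into a single data file, remembering how many
  // bytes each partition occupies (the role of sorter.writePartitionedFile).
  def writePartitionedFile(path: String, partitions: Seq[Array[Byte]]): Array[Long] = {
    val out = new FileOutputStream(path)
    try partitions.map { bytes => out.write(bytes); bytes.length.toLong }.toArray
    finally out.close()
  }

  // Write an index file of cumulative offsets so a reducer can seek straight to its
  // slice of the data file (the role of writeIndexFileAndCommit).
  def writeIndexFile(path: String, lengths: Array[Long]): Unit = {
    val out = new DataOutputStream(new FileOutputStream(path))
    try {
      var offset = 0L
      out.writeLong(offset)
      lengths.foreach { len => offset += len; out.writeLong(offset) }
    } finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("shuffle-sketch")
    val partitions = Seq("aa".getBytes, "bbbb".getBytes, "c".getBytes)
    val lengths = writePartitionedFile(dir.resolve("shuffle_0_0_0.data").toString, partitions)
    writeIndexFile(dir.resolve("shuffle_0_0_0.index").toString, lengths)
    // These per-partition lengths are exactly what MapStatus carries back to the driver.
    println(lengths.mkString(", "))
  }
}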

 

Back in TaskRunner.run, the value returned by task.run is serialized with resultSer.serialize(value) to produce a directResult. Depending on the result's size, one of three values is assigned to serializedResult and sent back to the Driver:

1) If the result exceeds maxResultSize (1 GB by default), a warning is logged that the result is larger than maxResultSize and the result itself is dropped; only the metadata ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize)) is returned.

Executor.scala source:

if (maxResultSize > 0 && resultSize > maxResultSize) {
  logWarning(s"Finished $taskName (TID $taskId). Result is larger than maxResultSize " +
    s"(${Utils.bytesToString(resultSize)} > ${Utils.bytesToString(maxResultSize)}), " +
    s"dropping it.")
  ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))

 

2) If the result is smaller than maxResultSize but larger than maxDirectResultSize (the cutoff for results sent directly over RPC, bounded by the 128 MB maximum RPC message size), the serialized result is put into the blockManager and only the metadata ser.serialize(new IndirectTaskResult[Any](blockId, resultSize)) is returned.

Executor.scala source:

......
} else if (resultSize > maxDirectResultSize) {
  val blockId = TaskResultBlockId(taskId)
  env.blockManager.putBytes(
    blockId,
    new ChunkedByteBuffer(serializedDirectResult.duplicate()),
    StorageLevel.MEMORY_AND_DISK_SER)
  logInfo(
    s"Finished $taskName (TID $taskId). $resultSize bytes result sent via BlockManager)")
  ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))

            

3) If the result is no larger than maxDirectResultSize, serializedDirectResult itself is returned to the Driver directly.

Executor.scala source:

......
} else {
  logInfo(s"Finished $taskName (TID $taskId). $resultSize bytes result sent to driver")
  serializedDirectResult
......
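
Taken together, the three branches amount to the following routing decision. This is a self-contained summary sketch; the Routed hierarchy and function names are ours, while the two thresholds correspond to spark.driver.maxResultSize and the executor's maxDirectResultSize:

object ResultRoutingSketch {
  sealed trait Routed
  // Result too big: only its size is reported, the data itself is dropped.
  final case class Dropped(resultSize: Long) extends Routed
  // Result stored in the BlockManager: the driver must fetch it by blockId.
  final case class ViaBlockManager(blockId: String, resultSize: Long) extends Routed
  // Small result: the serialized bytes travel inside the status-update message itself.
  final case class Direct(bytes: Array[Byte]) extends Routed

  def route(taskId: Long,
            serializedDirectResult: Array[Byte],
            maxResultSize: Long,
            maxDirectResultSize: Long): Routed = {
    val resultSize = serializedDirectResult.length.toLong
    if (maxResultSize > 0 && resultSize > maxResultSize)
      Dropped(resultSize)
    else if (resultSize > maxDirectResultSize)
      ViaBlockManager(s"taskresult_$taskId", resultSize)
    else
      Direct(serializedDirectResult)
  }

  def main(args: Array[String]): Unit = {
    println(route(1L, Array.fill(16)(0: Byte), maxResultSize = 1L << 30, maxDirectResultSize = 1L << 20))
  }
}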

            

Next, TaskRunner.run calls execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult), sending the Driver a message that carries the taskId, TaskState.FINISHED, and serializedResult. Here execBackend is the CoarseGrainedExecutorBackend.

Executor.scala source:

override def run(): Unit = {
  ......
  execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
  ......

 

The statusUpdate method of CoarseGrainedExecutorBackend is as follows:

override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
  val msg = StatusUpdate(executorId, taskId, state, data)
  driver match {
    case Some(driverRef) => driverRef.send(msg)
    case None => logWarning(s"Drop $msg because has not yet connected to driver")
  }
}

 

CoarseGrainedExecutorBackend sends a StatusUpdate message to the DriverEndpoint to deliver the execution result. DriverEndpoint is a ThreadSafeRpcEndpoint message loop; when it pattern-matches a StatusUpdate message, it calls scheduler.statusUpdate(taskId, state, data.value). Here scheduler is the TaskSchedulerImpl.

The DriverEndpoint source in CoarseGrainedSchedulerBackend.scala:

override def receive: PartialFunction[Any, Unit] = {
  case StatusUpdate(executorId, taskId, state, data) =>
    scheduler.statusUpdate(taskId, state, data.value)

DriverEndpoint passes the execution result to TaskSchedulerImpl, which hands it to TaskResultGetter. TaskResultGetter processes successful and failed tasks separately on its own threads and then informs the DAGScheduler of how the task ended.

The statusUpdate source in TaskSchedulerImpl.scala:

def statusUpdate(tid: Long, state: TaskState, serializedData: ByteBuffer) {
  ......
  if (TaskState.isFinished(state)) {
    cleanupTaskState(tid)
    taskSet.removeRunningTask(tid)
    if (state == TaskState.FINISHED) {
      taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
    } else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {
      taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
    }
  }

 

In TaskResultGetter.scala, enqueueSuccessfulTask handles the successful task on a separate thread: it processes the serialized result and then calls scheduler.handleSuccessfulTask.
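
enqueueSuccessfulTask also has to undo the routing shown earlier: a DirectTaskResult can be deserialized in place, whereas an IndirectTaskResult only carries a block id, so the bytes must first be fetched from the remote BlockManager (and the task is treated as failed if that block has been lost). A simplified self-contained model of that branch, using our own types rather than Spark's:

object ResultFetchSketch {
  sealed trait SerializedResult
  final case class DirectResult(value: String) extends SerializedResult
  final case class IndirectResult(blockId: String, resultSize: Long) extends SerializedResult

  // Hypothetical stand-in for fetching task-result bytes from a remote BlockManager.
  def fetchFromBlockManager(blockId: String): Option[String] = Some(s"value stored under $blockId")

  // Mirrors the shape of enqueueSuccessfulTask's handling:
  // resolve the result, then hand it to the scheduler (printed here).
  def handleSuccessfulTask(tid: Long, result: SerializedResult): Unit = result match {
    case DirectResult(value) =>
      println(s"task $tid finished, direct result: $value")
    case IndirectResult(blockId, _) =>
      fetchFromBlockManager(blockId) match {
        case Some(value) => println(s"task $tid finished, fetched result: $value")
        case None => println(s"task $tid: result block $blockId was lost, marking task failed")
      }
  }

  def main(args: Array[String]): Unit = {
    handleSuccessfulTask(1L, DirectResult("42"))
    handleSuccessfulTask(2L, IndirectResult("taskresult_2", 2048L))
  }
}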

The handleSuccessfulTask source in TaskSchedulerImpl:

def handleSuccessfulTask(
    taskSetManager: TaskSetManager,
    tid: Long,
    taskResult: DirectTaskResult[_]): Unit = synchronized {
  taskSetManager.handleSuccessfulTask(tid, taskResult)
}

TaskSchedulerImpl.handleSuccessfulTask delegates to TaskSetManager.handleSuccessfulTask.

The handleSuccessfulTask source in TaskSetManager:

def handleSuccessfulTask(tid: Long, result: DirectTaskResult[_]): Unit = {
  ......
  sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
  ......

handleSuccessfulTask calls sched.dagScheduler.taskEnded. taskEnded is called by the TaskSetManager to report that a task has finished or failed, and it posts a CompletionEvent into the eventProcessLoop event-processing loop.

DAGScheduler.scala source:

def taskEnded(
    task: Task[_],
    reason: TaskEndReason,
    result: Any,
    accumUpdates: Seq[AccumulatorV2[_, _]],
    taskInfo: TaskInfo): Unit = {
  eventProcessLoop.post(
    CompletionEvent(task, reason, result, accumUpdates, taskInfo))
}

 

The event loop thread reads the message and calls DAGSchedulerEventProcessLoop.onReceive to handle it.
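
The event-processing loop itself is essentially a blocking queue drained by a single daemon thread that invokes onReceive for every posted event. A minimal self-contained model of that mechanism (our own EventLoop class, not Spark's org.apache.spark.util.EventLoop):

import java.util.concurrent.LinkedBlockingQueue

object EventLoopSketch {
  // Simplified stand-in for DAGSchedulerEvent.
  sealed trait Event
  final case class CompletionEvent(taskId: Long) extends Event

  final class EventLoop(handle: Event => Unit) {
    private val queue = new LinkedBlockingQueue[Event]()
    private val thread = new Thread("sketch-event-loop") {
      override def run(): Unit =
        try { while (true) handle(queue.take()) }      // take() blocks until an event arrives
        catch { case _: InterruptedException => () }   // stop() interrupts to end the loop
    }
    thread.setDaemon(true)

    def start(): Unit = thread.start()
    def post(event: Event): Unit = queue.put(event)    // this is all taskEnded's post amounts to
    def stop(): Unit = thread.interrupt()
  }

  def main(args: Array[String]): Unit = {
    val loop = new EventLoop({ case CompletionEvent(tid) => println(s"handling completion of task $tid") })
    loop.start()
    loop.post(CompletionEvent(1L))
    Thread.sleep(100)   // give the event-loop thread time to drain the queue
    loop.stop()
  }
}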

DAGScheduler.scala source:

override def onReceive(event: DAGSchedulerEvent): Unit = {
  val timerContext = timer.time()
  try {
    doOnReceive(event)
  } finally {
    timerContext.stop()
  }
}

 

onReceive calls doOnReceive(event), which matches the CompletionEvent case and calls dagScheduler.handleTaskCompletion.

DAGScheduler.scala source:

private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
  case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)
  ......
  case completion: CompletionEvent =>
    dagScheduler.handleTaskCompletion(completion)
  ......

 

When a task finishes successfully, DAGScheduler.handleTaskCompletion handles the ShuffleMapTask and ResultTask cases separately. For a ShuffleMapTask, the MapStatus is reported to the MapOutputTracker.

The handleTaskCompletion source in DAGScheduler:

private[scheduler] def handleTaskCompletion(event: CompletionEvent) {
  ......
  val stage = stageIdToStage(task.stageId)
  event.reason match {
    case Success =>
      stage.pendingPartitions -= task.partitionId
      task match {
        ......
        case smt: ShuffleMapTask =>
          val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
          updateAccumulators(event)
          val status = event.result.asInstanceOf[MapStatus]
          val execId = status.location.executorId
          logDebug("ShuffleMapTask finished on " + execId)
          if (failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)) {
            logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")
          } else {
            shuffleStage.addOutputLoc(smt.partitionId, status)
          }

          if (runningStages.contains(shuffleStage) && shuffleStage.pendingPartitions.isEmpty) {
            markStageAsFinished(shuffleStage)
            logInfo("looking for newly runnable stages")
            logInfo("running: " + runningStages)
            logInfo("waiting: " + waitingStages)
            logInfo("failed: " + failedStages)

            // We supply true to increment the epoch number here in case this is a
            // recomputation of the map outputs. In that case, some nodes may have cached
            // locations with holes (from when we detected the error) and will need the
            // epoch incremented to refetch them.
            // TODO: Only increment the epoch number if this is not the first time
            //       we registered these map outputs.
            mapOutputTracker.registerMapOutputs(
              shuffleStage.shuffleDep.shuffleId,
              shuffleStage.outputLocInMapOutputTrackerFormat(),
              changeEpoch = true)

            clearCacheLocs()

            if (!shuffleStage.isAvailable) {
              // Some tasks had failed; let's resubmit this shuffleStage
              // TODO: Lower-level scheduler should also deal with this
              logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
                ") because some of its tasks had failed: " +
                shuffleStage.findMissingPartitions().mkString(", "))
              submitStage(shuffleStage)
            } else {
              // Mark any map-stage jobs waiting on this stage as finished
              if (shuffleStage.mapStageJobs.nonEmpty) {
                val stats = mapOutputTracker.getStatistics(shuffleStage.shuffleDep)
                for (job <- shuffleStage.mapStageJobs) {
                  markMapStageJobAsFinished(job, stats)
                }
              }
              submitWaitingChildStages(shuffleStage)
            }
          }
      }
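
Registering the MapStatus values with the MapOutputTracker is what makes the shuffle output visible to the next stage: when the child stage's tasks run, they ask the tracker where each map output lives and how large it is before fetching the blocks. A minimal self-contained model of that register-and-look-up cycle, with our own types rather than Spark's MapOutputTracker API:

object MapOutputTrackerSketch {
  // Where one map task wrote its shuffle output, and how many bytes it wrote
  // for each reduce partition (the essence of Spark's MapStatus).
  final case class MapStatusSketch(location: String, sizesByReduce: Array[Long])

  final class TrackerSketch {
    private val outputs = scala.collection.mutable.Map.empty[Int, Array[MapStatusSketch]]

    // Called by the DAGScheduler once a ShuffleMapStage has all of its outputs.
    def registerMapOutputs(shuffleId: Int, statuses: Array[MapStatusSketch]): Unit =
      outputs(shuffleId) = statuses

    // Called (indirectly) by reduce tasks: which executors hold data for my partition?
    def getLocationsAndSizes(shuffleId: Int, reduceId: Int): Seq[(String, Long)] =
      outputs.getOrElse(shuffleId, Array.empty[MapStatusSketch]).toSeq
        .map(s => (s.location, s.sizesByReduce(reduceId)))
        .filter(_._2 > 0)
  }

  def main(args: Array[String]): Unit = {
    val tracker = new TrackerSketch
    tracker.registerMapOutputs(0, Array(
      MapStatusSketch("executor-1", Array(10L, 0L)),
      MapStatusSketch("executor-2", Array(5L, 7L))))
    // Reduce partition 1 only needs to fetch from executor-2.
    println(tracker.getLocationsAndSizes(shuffleId = 0, reduceId = 1))
  }
}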

 

 
