Spark DAGScheduler Module Source Code Analysis (reposted from: http://guozhongxin.com/tag/spark.html)
Source: Internet  Editor: 程序博客网  Time: 2024/06/15 20:04
Background on the Spark DAGScheduler
When a Spark application reaches an action operator, SparkContext creates a Job and passes the DAG formed by that job to the DAGScheduler, which parses it into Stages.
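Stage
A Stage is Spark's partition of the DAG. It is the basis on which a job is divided into tasks and scheduled.
One way to understand it: within a Stage no shuffle is needed, so the work can run with arbitrary parallelism. The boundaries between Stages are therefore the places where a shuffle is required.
The figure below shows an example of stages.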
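There are two kinds of Stage:
ShuffleMapStage
This kind of Stage has a shuffle as its output boundary. Its input can come from external data or from the output of another ShuffleMapStage, and its output can be the start of another Stage. The final tasks of a ShuffleMapStage are ShuffleMapTasks. A Job may or may not contain Stages of this type.
In the figure above, Stage 1 and Stage 2 are both ShuffleMapStages.
ResultStage
This kind of Stage outputs the result directly. Its input can come from external data or from the output of another ShuffleMapStage. The final tasks of a ResultStage are ResultTasks. A Job contains one or more Stages and always includes a ResultStage as its final Stage.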
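The boundary rule above can be illustrated with a minimal sketch in plain Scala. This is not Spark code: `Node`, `Narrow`, `Shuffle` and `sameStage` are hypothetical names for a toy lineage model. Starting from the last RDD of a stage, the sketch follows narrow dependencies and stops at every shuffle dependency.

```scala
object StageSketch {
  // Hypothetical toy model of an RDD lineage; none of these types exist in Spark.
  sealed trait Dep
  final case class Narrow(parent: Node) extends Dep
  final case class Shuffle(parent: Node) extends Dep
  final case class Node(name: String, deps: List[Dep])

  // Names of all nodes that end up in the same stage as `last`:
  // narrow dependencies are followed, shuffle dependencies mark a stage boundary.
  def sameStage(last: Node): Set[String] = {
    var seen = Set.empty[String]
    var stack = List(last)
    while (stack.nonEmpty) {
      val n = stack.head
      stack = stack.tail
      if (!seen(n.name)) {
        seen += n.name
        n.deps.foreach {
          case Narrow(p)  => stack = p :: stack
          case Shuffle(_) => () // boundary: the parent belongs to another stage
        }
      }
    }
    seen
  }

  // Toy lineage: textFile -> map -> (shuffle) -> reduceByKey -> filter
  val text     = Node("textFile", Nil)
  val mapped   = Node("map", List(Narrow(text)))
  val reduced  = Node("reduceByKey", List(Shuffle(mapped)))
  val filtered = Node("filter", List(Narrow(reduced)))
}
```

In this toy lineage, `sameStage(filtered)` groups `filter` with `reduceByKey`, while `map` and `textFile` fall into the stage on the other side of the shuffle.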
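DAGScheduler
The main responsibilities of the DAGScheduler are:
accept jobs submitted by the user;
split each job into stages by type, keep track of which RDDs and Stage outputs have been materialized, generate the tasks of each stage, and wrap them in a TaskSet;
determine the preferred location for each Task (run the task on the node that holds its data), taking the current cache state into account, and submit the TaskSet to the TaskScheduler;
resubmit to the TaskScheduler any Stage whose shuffle output has been lost;
Note: failures inside a Stage that are not caused by lost shuffle output are not handled by the DAGScheduler. The TaskScheduler is responsible for retrying those tasks.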
Spark DAGScheduler source code walkthrough
The DAGScheduler is created when the user constructs a new SparkContext. (Note that in SparkContext the TaskScheduler is created before the DAGScheduler: by the time dagScheduler = new DAGScheduler(this) runs, this.taskScheduler already exists, and this taskScheduler is also a member variable of dagScheduler.)
    @volatile private[spark] var dagScheduler: DAGScheduler = _
    try {
      dagScheduler = new DAGScheduler(this)
    } catch {
      case e: Exception => throw
        new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage))
    }
When an action (output operator) is executed, Spark calls sc.runJob(). For example, count() as defined in RDD.scala:
    def count(): Long = sc.runJob(this, Utils.getIteratorSize _).sum
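The shape of count() (run one function per partition, then combine the per-partition results on the driver) can be mimicked without Spark. The sketch below is only an illustration: `iteratorSize` and `runJob` here are hypothetical stand-ins that work on in-memory sequences, not the real `Utils.getIteratorSize` or `SparkContext.runJob`.

```scala
object CountSketch {
  // Hypothetical stand-in for Utils.getIteratorSize: walks the iterator and counts elements.
  def iteratorSize[T](it: Iterator[T]): Long = {
    var n = 0L
    while (it.hasNext) {
      it.next()
      n += 1
    }
    n
  }

  // Hypothetical stand-in for sc.runJob: applies `func` to every partition
  // and returns one result per partition.
  def runJob[T, U](partitions: Seq[Seq[T]], func: Iterator[T] => U): Seq[U] =
    partitions.map(p => func(p.iterator))

  // count() is then the sum of the per-partition sizes.
  def count[T](partitions: Seq[Seq[T]]): Long =
    runJob(partitions, (it: Iterator[T]) => iteratorSize(it)).sum
}
```

For instance, `CountSketch.count(Seq(Seq(1, 2, 3), Seq(4), Seq.empty[Int]))` adds up the sizes 3, 1 and 0.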
Following this into the runJob() method in SparkContext.scala, we see:
    def runJob[T, U: ClassTag](
        rdd: RDD[T],
        func: (TaskContext, Iterator[T]) => U,
        partitions: Seq[Int],
        allowLocal: Boolean,
        resultHandler: (Int, U) => Unit) {
      if (dagScheduler == null) {
        throw new SparkException("SparkContext has been shutdown")
      }
      val callSite = getCallSite
      val cleanedFunc = clean(func)
      logInfo("Starting job: " + callSite.shortForm)
      val start = System.nanoTime
      dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, allowLocal,
        resultHandler, localProperties.get)
      logInfo(
        "Job finished: " + callSite.shortForm + ", took " + (System.nanoTime - start) / 1e9 + " s")
      rdd.doCheckpoint()
    }
sc.runJob() in turn calls dagScheduler.runJob(). Following into DAGScheduler.runJob():
    def runJob[T, U: ClassTag](
        rdd: RDD[T],
        func: (TaskContext, Iterator[T]) => U,
        partitions: Seq[Int],
        callSite: CallSite,
        allowLocal: Boolean,
        resultHandler: (Int, U) => Unit,
        properties: Properties = null)
    {
      val start = System.nanoTime
      val waiter = submitJob(rdd, func, partitions, callSite, allowLocal, resultHandler, properties)
      waiter.awaitResult() match {
        case JobSucceeded => {
          logInfo("Job %d finished: %s, took %f s".format
            (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
        }
        case JobFailed(exception: Exception) =>
          logInfo("Job %d failed: %s, took %f s".format
            (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
          throw exception
      }
    }
When the job is submitted normally, submitJob() returns a JobWaiter and produces a JobSubmitted event:
    val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
    eventProcessActor ! JobSubmitted(
      jobId, rdd, func2, partitions.toArray, allowLocal, callSite, waiter, properties)
    waiter
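The role of the waiter returned by submitJob() is to let the submitting thread block until every task of the job has reported its result. The following is a minimal sketch of that idea, assuming a much simpler interface than Spark's real JobWaiter: the class `Waiter` and its methods are hypothetical, and failure handling is left out.

```scala
object JobWaiterSketch {
  // Hypothetical simplification of a job waiter; this is not Spark's JobWaiter API.
  final class Waiter[U](totalTasks: Int, resultHandler: (Int, U) => Unit) {
    private var finished = 0
    private var done = totalTasks == 0

    // Called by the scheduler side once per successfully finished task.
    def taskSucceeded(index: Int, result: U): Unit = synchronized {
      resultHandler(index, result)
      finished += 1
      if (finished == totalTasks) {
        done = true
        notifyAll() // wake up the thread blocked in awaitResult()
      }
    }

    // Called by the submitting thread; blocks until all tasks have reported.
    def awaitResult(): Unit = synchronized {
      while (!done) wait()
    }
  }
}
```

The `resultHandler` is invoked once per partition result, which mirrors how runJob() passes a `(Int, U) => Unit` callback down to the scheduler.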
The DAGScheduler follows a producer-consumer model. When the dagScheduler instance is created in SparkContext, it initializes a background event-processing actor that responds to the various events of the DAGScheduler.
    private def initializeEventProcessActor() {
      // blocking the thread until supervisor is started, which ensures eventProcessActor is
      // not null before any job is submitted
      implicit val timeout = Timeout(30 seconds)
      val initEventActorReply =
        dagSchedulerActorSupervisor ? Props(new DAGSchedulerEventProcessActor(this))
      eventProcessActor = Await.result(initEventActorReply, timeout.duration).
        asInstanceOf[ActorRef]
    }
The class DAGSchedulerEventProcessActor is defined in DAGScheduler.scala. It receives and handles the events produced while the DAGScheduler is working, and it handles them by calling methods on the dagScheduler passed to it. The events it handles are:
JobSubmitted
StageCancelled
JobCancelled
JobGroupCancelled
AllJobsCancelled
ExecutorAdded
ExecutorLost
BeginEvent
GettingResultEvent
CompletionEvent
ResubmitFailedStages
Taking the JobSubmitted event as an example:
    case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
      dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite,
        listener, properties)
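The producer-consumer pattern described above can be sketched without Akka. The example below deliberately swaps the actor for a plain `LinkedBlockingQueue` and a daemon thread, so it shows the pattern rather than the code Spark uses. The event classes are reduced to a job id, and `Loop`, `Stop` and `stopAndJoin` are hypothetical names.

```scala
import java.util.concurrent.LinkedBlockingQueue

object EventLoopSketch {
  // Simplified events; the real ones carry many more fields.
  sealed trait Event
  final case class JobSubmitted(jobId: Int) extends Event
  final case class JobCancelled(jobId: Int) extends Event
  case object Stop extends Event // hypothetical shutdown marker

  final class Loop(handle: Event => Unit) {
    private val queue = new LinkedBlockingQueue[Event]()

    // Consumer: a single daemon thread takes events off the queue in order.
    private val worker = new Thread(() => {
      var running = true
      while (running) {
        queue.take() match {
          case Stop => running = false
          case e    => handle(e)
        }
      }
    })
    worker.setDaemon(true)
    worker.start()

    // Producer side: any thread may post an event.
    def post(e: Event): Unit = queue.put(e)

    // Ask the consumer to stop after the events already queued, then wait for it.
    def stopAndJoin(): Unit = {
      post(Stop)
      worker.join()
    }
  }
}
```

Because a single thread consumes the queue, handlers run one at a time and in submission order, which is the same property the actor mailbox gives the scheduler.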
dagScheduler.handleJobSubmitted resolves the dependencies of the finalRDD it receives and generates the stages, that is, the structure of the whole DAG. It then calls functions that package the tasks of each stage into a TaskSet and hand them to the taskScheduler. Following this method, handleJobSubmitted, is therefore a good way to understand the main functions and implementation of the DAGScheduler.
01 private[scheduler] def handleJobSubmitted(jobId: Int,
02     finalRDD: RDD[_],
03     func: (TaskContext, Iterator[_]) => _,
04     partitions: Array[Int],
05     allowLocal: Boolean,
06     callSite: CallSite,
07     listener: JobListener,
08     properties: Properties = null)
09 {
10   var finalStage: Stage = null
11   try {
12     // New stage creation may throw an exception if, for example, jobs are run on a
13     // HadoopRDD whose underlying HDFS files have been deleted.
14     finalStage = newStage(finalRDD, partitions.size, None, jobId, callSite)
15   } catch {
16     case e: Exception =>
17       logWarning("Creating new stage failed due to exception - job: " + jobId, e)
18       listener.jobFailed(e)
19       return
20   }
21   if (finalStage != null) {
22     val job = new ActiveJob(jobId, finalStage, func, partitions, callSite, listener, properties)
23     clearCacheLocs()
24     logInfo("Got job %s (%s) with %d output partitions (allowLocal=%s)".format(
25       job.jobId, callSite.shortForm, partitions.length, allowLocal))
26     logInfo("Final stage: " + finalStage + "(" + finalStage.name + ")")
27     logInfo("Parents of final stage: " + finalStage.parents)
28     logInfo("Missing parents: " + getMissingParentStages(finalStage))
29     val shouldRunLocally =
30       localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.length == 1
31     if (shouldRunLocally) {
32       // Compute very short actions like first() or take() with no parent stages locally.
33       listenerBus.post(SparkListenerJobStart(job.jobId, Seq.empty, properties))
34       runLocally(job)
35     } else {
36       jobIdToActiveJob(jobId) = job
37       activeJobs += job
38       finalStage.resultOfJob = Some(job)
39       val stageIds = jobIdToStageIds(jobId).toArray
40       val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo))
41       listenerBus.post(SparkListenerJobStart(job.jobId, stageInfos, properties))
42       submitStage(finalStage)
43     }
44   }
45   submitWaitingStages()
46 }
As we can see, the DAGScheduler derives the stages from the last RDD. (This RDD is passed in through the call chain sc.runJob() -> dagScheduler.runJob() -> dagScheduler.submitJob() -> JobSubmitted() -> dagScheduler.handleJobSubmitted().)
This line of code,
    finalStage = newStage(finalRDD, partitions.size, None, jobId, callSite)
creates finalStage by calling newStage(). In fact, newStage() calls getParentStages(), which traces backwards from finalRDD and generates the parent stages.
    private def getParentStages(rdd: RDD[_], jobId: Int): List[Stage] = {
      val parents = new HashSet[Stage]
      val visited = new HashSet[RDD[_]]
      // We are manually maintaining a stack here to prevent StackOverflowError
      // caused by recursively visiting
      val waitingForVisit = new Stack[RDD[_]]
      def visit(r: RDD[_]) {
        if (!visited(r)) {
          visited += r
          // Kind of ugly: need to register RDDs with the cache here since
          // we can't do it in its constructor because # of partitions is unknown
          for (dep <- r.dependencies) {
            dep match {
              case shufDep: ShuffleDependency[_, _, _] =>
                parents += getShuffleMapStage(shufDep, jobId)
              case _ =>
                waitingForVisit.push(dep.rdd)
            }
          }
        }
      }
      waitingForVisit.push(rdd)
      while (!waitingForVisit.isEmpty) {
        visit(waitingForVisit.pop())
      }
      parents.toList
    }
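Back in handleJobSubmitted(), look at lines 27 and 28. "Parents of final stage: " is obtained through getParentStages(), while "Missing parents: " is obtained through getMissingParentStages(). Here, in handleJobSubmitted(), the two make no difference, but in other places calling the two functions can give different results.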
01 private def getMissingParentStages(stage: Stage): List[Stage] = {
02   val missing = new HashSet[Stage]
03   val visited = new HashSet[RDD[_]]
04   // We are manually maintaining a stack here to prevent StackOverflowError
05   // caused by recursively visiting
06   val waitingForVisit = new Stack[RDD[_]]
07   def visit(rdd: RDD[_]) {
08     if (!visited(rdd)) {
09       visited += rdd
10       if (getCacheLocs(rdd).contains(Nil)) {
11         for (dep <- rdd.dependencies) {
12           dep match {
13             case shufDep: ShuffleDependency[_, _, _] =>
14               val mapStage = getShuffleMapStage(shufDep, stage.jobId)
15               if (!mapStage.isAvailable) {
16                 missing += mapStage
17               }
18             case narrowDep: NarrowDependency[_] =>
19               waitingForVisit.push(narrowDep.rdd)
20           }
21         }
22       }
23     }
24   }
25   waitingForVisit.push(stage.rdd)
26   while (!waitingForVisit.isEmpty) {
27     visit(waitingForVisit.pop())
28   }
29   missing.toList
30 }
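From the code above we can see that getMissingParentStages() differs from getParentStages() at lines 15 and 16: a parent ShuffleMapStage is added to the result only if its output is not yet available (!mapStage.isAvailable). In addition, line 10 follows the dependencies of an RDD only when at least one of its partitions is not cached.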
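The difference between the two traversals can be reproduced on a toy model. The sketch below is an illustration, not Spark code: `Rdd`, `MapStage`, `fullyCached` and the `onlyMissing` flag are hypothetical, and the two Spark methods are folded into one function so the two extra checks stand out.

```scala
object MissingParentsSketch {
  // Hypothetical toy model; these are not Spark's classes.
  final case class MapStage(id: Int, isAvailable: Boolean)
  sealed trait Dep
  final case class Narrow(parent: Rdd) extends Dep
  final case class Shuffle(stage: MapStage) extends Dep
  final case class Rdd(name: String, fullyCached: Boolean, deps: List[Dep])

  // onlyMissing = false behaves like getParentStages,
  // onlyMissing = true behaves like getMissingParentStages.
  def parentStages(root: Rdd, onlyMissing: Boolean): Set[Int] = {
    var found = Set.empty[Int]
    var visited = Set.empty[String]
    var stack = List(root) // explicit stack instead of recursion
    while (stack.nonEmpty) {
      val r = stack.head
      stack = stack.tail
      if (!visited(r.name)) {
        visited += r.name
        // Extra check 1: a fully cached RDD needs none of its ancestors.
        val follow = !onlyMissing || !r.fullyCached
        if (follow) r.deps.foreach {
          case Narrow(p) => stack = p :: stack
          // Extra check 2: skip parent stages whose output already exists.
          case Shuffle(s) => if (!onlyMissing || !s.isAvailable) found += s.id
        }
      }
    }
    found
  }

  // A join-like RDD with two shuffle parents; stage 1 has already been computed.
  val joined = Rdd("joined", fullyCached = false,
    List(Shuffle(MapStage(1, isAvailable = true)), Shuffle(MapStage(2, isAvailable = false))))
}
```

With this input, the plain traversal reports both parent stages, while the missing-parents traversal reports only stage 2, the one that still has to run.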
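Back at lines 41 and 42 of handleJobSubmitted(): the DAGScheduler posts a job-start event (SparkListenerJobStart) to the listener bus and then calls submitStage() to submit the generated Stage.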
    /** Submits stage, but first recursively submits any missing parents. */
    private def submitStage(stage: Stage) {
      val jobId = activeJobForStage(stage)
      if (jobId.isDefined) {
        logDebug("submitStage(" + stage + ")")
        if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
          val missing = getMissingParentStages(stage).sortBy(_.id)
          logDebug("missing: " + missing)
          if (missing == Nil) {
            logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
            submitMissingTasks(stage, jobId.get)
          } else {
            for (parent <- missing) {
              submitStage(parent)
            }
            waitingStages += stage
          }
        }
      } else {
        abortStage(stage, "No active job for stage " + stage.id)
      }
    }
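The recursion in submitStage() means that a stage's tasks are launched only once it has no missing parents, and every other stage is parked as waiting. A minimal sketch of that ordering follows. `Stage` here is a hypothetical case class that already knows its missing parents, and the `submit` helper returns the two lists instead of mutating scheduler state.

```scala
import scala.collection.mutable

object SubmitStageSketch {
  // Hypothetical stage: an id plus the ids of parent stages whose output is still missing.
  final case class Stage(id: Int, missingParents: List[Int])

  // Returns (stages whose tasks are launched right away, stages parked as waiting).
  def submit(finalId: Int, stages: Map[Int, Stage]): (List[Int], List[Int]) = {
    val launched = mutable.ListBuffer.empty[Int]
    val waiting = mutable.ListBuffer.empty[Int]

    def submitStage(id: Int): Unit = {
      if (!launched.contains(id) && !waiting.contains(id)) {
        val missing = stages(id).missingParents.sorted
        if (missing.isEmpty) {
          launched += id // no missing parents: tasks can be submitted now
        } else {
          missing.foreach(submitStage) // first recurse into the parents
          waiting += id                // then wait for them to finish
        }
      }
    }

    submitStage(finalId)
    (launched.toList, waiting.toList)
  }

  // Result stage 3 depends on stages 1 and 2; stage 2 depends on stage 0.
  val example = Map(
    0 -> Stage(0, Nil),
    1 -> Stage(1, Nil),
    2 -> Stage(2, List(0)),
    3 -> Stage(3, List(1, 2)))
}
```

In the example, stages 1 and 0 are launched immediately, while stages 2 and 3 wait until their parents have produced output.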
In submitMissingTasks(), the DAGScheduler creates the tasks of the stage, packages them into a TaskSet, and hands the TaskSet to the TaskScheduler.
    /** Called when stage's parents are available and we can now do its task. */
    private def submitMissingTasks(stage: Stage, jobId: Int) {
      logDebug("submitMissingTasks(" + stage + ")")
      // Get our pending tasks and remember them in our pendingTasks entry
      stage.pendingTasks.clear()

      ····

      val tasks: Seq[Task[_]] = if (stage.isShuffleMap) {
        partitionsToCompute.map { id =>
          val locs = getPreferredLocs(stage.rdd, id)
          val part = stage.rdd.partitions(id)
          new ShuffleMapTask(stage.id, taskBinary, part, locs)
        }
      } else {
        val job = stage.resultOfJob.get
        partitionsToCompute.map { id =>
          val p: Int = job.partitions(id)
          val part = stage.rdd.partitions(p)
          val locs = getPreferredLocs(stage.rdd, p)
          new ResultTask(stage.id, taskBinary, part, locs, id)
        }
      }

      if (tasks.size > 0) {
        // Preemptively serialize a task to make sure it can be serialized.
        try {
          closureSerializer.serialize(tasks.head)
        } catch {
          case e: NotSerializableException =>
            abortStage(stage, "Task not serializable: " + e.toString)
            runningStages -= stage
            return
          case NonFatal(e) => // Other exceptions, such as IllegalArgumentException from Kryo.
            abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}")
            runningStages -= stage
            return
        }

        logInfo("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")")
        stage.pendingTasks ++= tasks
        logDebug("New pending tasks: " + stage.pendingTasks)
        taskScheduler.submitTasks(
          new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))
        stage.latestInfo.submissionTime = Some(clock.getTime())
      } else {
        // Because we posted SparkListenerStageSubmitted earlier, we should post
        // SparkListenerStageCompleted here in case there are no tasks to run.
        listenerBus.post(SparkListenerStageCompleted(stage.latestInfo))
        logDebug("Stage " + stage + " is actually done; %b %d %d".format(
          stage.isAvailable, stage.numAvailableOutputs,
          stage.numPartitions))
        runningStages -= stage
      }
    }
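The core of submitMissingTasks() is "one task per partition that still needs computing, all wrapped in one TaskSet". The sketch below models only that step. The case classes are hypothetical and much smaller than Spark's ShuffleMapTask, ResultTask and TaskSet: they omit the task binary, the preferred locations and the serialization check.

```scala
object TaskSetSketch {
  // Hypothetical, stripped-down task model; not Spark's classes.
  sealed trait Task { def stageId: Int; def partition: Int }
  final case class ShuffleMapTask(stageId: Int, partition: Int) extends Task
  final case class ResultTask(stageId: Int, partition: Int, outputId: Int) extends Task
  final case class TaskSet(tasks: Vector[Task], stageId: Int, attempt: Int)

  // partitionsToCompute: for a shuffle map stage these are RDD partition ids;
  // for a result stage they are indexes into jobPartitions (the partitions the job asked for).
  def buildTaskSet(
      stageId: Int,
      isShuffleMap: Boolean,
      partitionsToCompute: Seq[Int],
      jobPartitions: Vector[Int],
      attempt: Int): TaskSet = {
    val tasks: Vector[Task] =
      if (isShuffleMap) {
        partitionsToCompute.map(id => ShuffleMapTask(stageId, id): Task).toVector
      } else {
        partitionsToCompute.map { id =>
          val p = jobPartitions(id) // translate output index to RDD partition
          ResultTask(stageId, p, id): Task
        }.toVector
      }
    TaskSet(tasks, stageId, attempt)
  }
}
```

For a result stage of a job that only asked for partitions 2 and 5, `buildTaskSet(7, isShuffleMap = false, Seq(0, 1), Vector(2, 5), attempt = 0)` yields two ResultTasks, one for partition 2 with output index 0 and one for partition 5 with output index 1.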
From here on, the work is handed over to the TaskScheduler.
I will tidy this up further when I have time.