Spark 1.3 from Creation to Submission: 6) Source Code Analysis of Executor and Driver Interaction
Source: Internet. Editor: 程序博客网. Time: 2024/05/17 02:38
The previous section described how the worker launches a process named CoarseGrainedExecutorBackend. Let's start with the main method of the CoarseGrainedExecutorBackend class:

    def main(args: Array[String]) {
      // Variables that receive the command-line arguments
      var driverUrl: String = null
      var executorId: String = null
      var hostname: String = null
      var cores: Int = 0
      var appId: String = null
      var workerUrl: Option[String] = None
      val userClassPath = new mutable.ListBuffer[URL]()
      // Argument parsing and assignment ... omitted
      run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
    }

The method to follow from here is run:

    private def run(
        driverUrl: String,
        executorId: String,
        hostname: String,
        cores: Int,
        appId: String,
        workerUrl: Option[String],
        userClassPath: Seq[URL]) {
      // Only the core code is kept ...
      SparkHadoopUtil.get.runAsSparkUser { () =>
        // Debug code
        Utils.checkHost(hostname)

        // Bootstrap to fetch the driver's Spark properties.
        val executorConf = new SparkConf
        val port = executorConf.getInt("spark.executor.port", 0)
        // Create an ActorSystem, used to obtain a proxy to the DriverActor
        val (fetcher, _) = AkkaUtils.createActorSystem("driverPropsFetcher",
          hostname, port, executorConf, new SecurityManager(executorConf))
        // Obtain the DriverActor proxy through that ActorSystem
        val driver = fetcher.actorSelection(driverUrl)
        // ... (driverConf is defined in the omitted code)
        // Create the executor's SparkEnv, which in turn creates another ActorSystem
        val env = SparkEnv.createExecutorEnv(
          driverConf, executorId, hostname, port, cores, isLocal = false)
        // ... (sparkHostPort is defined in the omitted code)
        // Instantiate the CoarseGrainedExecutorBackend actor in that ActorSystem
        // (its lifecycle methods are invoked from this point on)
        env.actorSystem.actorOf(
          Props(classOf[CoarseGrainedExecutorBackend],
            driverUrl, executorId, sparkHostPort, cores, userClassPath, env),
          name = "Executor")
        // ...
        env.actorSystem.awaitTermination()
      }
    }

Two things are set up here: a proxy to the DriverActor (an ActorSelection, used to communicate with the driver) and the actual executor actor (CoarseGrainedExecutorBackend). Next, look at preStart, the lifecycle method of CoarseGrainedExecutorBackend:

    override def preStart() {
      logInfo("Connecting to driver: " + driverUrl)
      driver = context.actorSelection(driverUrl)
      driver ! RegisterExecutor(executorId, hostPort, cores, extractLogUrls)
      context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
    }

It obtains a proxy to the DriverActor from driverUrl and sends the driver a message to register this executor (section 2 already mentioned DriverActor, which had not been used up to that point). Now look at the receiveWithLogging method of the DriverActor class (org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.DriverActor):

    case RegisterExecutor(executorId, hostPort, cores, logUrls) =>
      // Only the core code is kept
      Utils.checkHostPort(hostPort, "Host port expected " + hostPort)
      if (executorDataMap.contains(executorId)) {
        sender ! RegisterExecutorFailed("Duplicate executor ID: " + executorId)
      } else {
        logInfo("Registered executor: " + sender + " with ID " + executorId)
        // Tell the executor that registration succeeded
        sender ! RegisteredExecutor
        // (data is defined in the omitted code)
        listenerBus.post(
          SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
        // Check whether there are tasks waiting to be submitted (driver -> executor)
        makeOffers()
      }

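The registration exchange above can be sketched without Akka. This is a minimal model, not Spark's implementation: DriverRegistry, handle and registeredCores are hypothetical names introduced for illustration, the message payloads are simplified, and a plain method call returning the reply stands in for the actor message send.

```scala
import scala.collection.mutable

// Messages named after the ones in the article; payloads are simplified.
sealed trait Message
case class RegisterExecutor(executorId: String, hostPort: String, cores: Int) extends Message
case object RegisteredExecutor extends Message
case class RegisterExecutorFailed(reason: String) extends Message

// Hypothetical stand-in for the driver side (not Spark's DriverActor).
class DriverRegistry {
  // executorId -> cores, playing the role of executorDataMap
  private val executorDataMap = mutable.HashMap.empty[String, Int]

  // Returns the reply the driver would send back to the sender.
  def handle(msg: RegisterExecutor): Message = {
    if (executorDataMap.contains(msg.executorId)) {
      // An ID may register only once; a second attempt is rejected.
      RegisterExecutorFailed("Duplicate executor ID: " + msg.executorId)
    } else {
      executorDataMap(msg.executorId) = msg.cores
      RegisteredExecutor
    }
  }

  // Total cores across all registered executors.
  def registeredCores: Int = executorDataMap.values.sum
}

object RegistrationDemo {
  def main(args: Array[String]): Unit = {
    val driver = new DriverRegistry
    // First registration is accepted, the repeated one is rejected.
    println(driver.handle(RegisterExecutor("0", "host-a:4000", 2)))
    println(driver.handle(RegisterExecutor("0", "host-a:4000", 2)))
  }
}
```

The point of the sketch is the branch on executorDataMap.contains: the driver keeps a single entry per executor ID, so an executor that registers again under an existing ID gets a failure reply instead of overwriting the earlier entry.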
The makeOffers method is shown below (it is analyzed in a later section):

    // Make fake resource offers on all executors
    def makeOffers() {
      launchTasks(scheduler.resourceOffers(executorDataMap.map {
        case (id, executorData) =>
          new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
      }.toSeq))
    }

Next, look at how the executor handles the registration-success message sent by the driver, in CoarseGrainedExecutorBackend.receiveWithLogging:

    case RegisteredExecutor =>
      logInfo("Successfully registered with driver")
      val (hostname, _) = Utils.parseHostPort(hostPort)
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

The code is simple: once the executor has registered successfully, it creates an Executor instance. The Executor class looks like this:

    private[spark] class Executor(
        executorId: String,
        executorHostname: String,
        env: SparkEnv,
        userClassPath: Seq[URL] = Nil,
        isLocal: Boolean = false)
      extends Logging {
      // Only the code we care about is kept

      // Start worker thread pool
      val threadPool = Utils.newDaemonCachedThreadPool("Executor task launch worker")

      // Create an actor for receiving RPCs from the driver
      private val executorActor = env.actorSystem.actorOf(
        Props(new ExecutorActor(executorId)), "ExecutorActor")

      // Send heartbeats to the driver (executor -> driver)
      startDriverHeartbeater()

      // Launch a task
      def launchTask(
          context: ExecutorBackend,
          taskId: Long,
          attemptNumber: Int,
          taskName: String,
          serializedTask: ByteBuffer) {
        // Wrap the task in a TaskRunner
        val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber,
          taskName, serializedTask)
        // (runningTasks is defined in the omitted code)
        runningTasks.put(taskId, tr)
        // Run the task on the thread pool
        threadPool.execute(tr)
      }
    }

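The Executor constructor creates a cached (growable) daemon thread pool to run tasks, and it also starts sending heartbeats to the driver, which report the status of the running tasks.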
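The task-launching path can be illustrated with plain java.util.concurrent classes. This is a minimal sketch under stated assumptions, not Spark's Executor: MiniExecutor and shutdownAndWait are hypothetical names, a task is reduced to a () => Unit function instead of a serialized ByteBuffer, and the pool is built directly rather than through Utils.newDaemonCachedThreadPool.

```scala
import java.util.concurrent.{ConcurrentHashMap, ExecutorService, Executors, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical miniature of the executor's task-launching path (not Spark code).
class MiniExecutor {
  // Produces named daemon threads, so idle workers never keep the JVM alive.
  private val threadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "Executor task launch worker-" + counter.getAndIncrement())
      t.setDaemon(true)
      t
    }
  }

  // Cached pool: creates threads on demand and reuses idle ones.
  private val threadPool: ExecutorService = Executors.newCachedThreadPool(threadFactory)

  // taskId -> runner, for tasks that have been launched and not yet finished
  val runningTasks = new ConcurrentHashMap[Long, Runnable]()

  def launchTask(taskId: Long, body: () => Unit): Unit = {
    // Wrap the task so that it removes itself from runningTasks when done.
    val tr = new Runnable {
      override def run(): Unit = {
        try body() finally runningTasks.remove(taskId)
      }
    }
    runningTasks.put(taskId, tr)
    threadPool.execute(tr)
  }

  // Stops accepting tasks and waits for in-flight ones; true if all finished in time.
  def shutdownAndWait(timeoutSeconds: Long): Boolean = {
    threadPool.shutdown()
    threadPool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)
  }
}

object MiniExecutorDemo {
  def main(args: Array[String]): Unit = {
    val executor = new MiniExecutor
    (1 to 3).foreach { i =>
      executor.launchTask(i.toLong, () => println("running task " + i))
    }
    executor.shutdownAndWait(10)
  }
}
```

A cached pool suits this pattern because the number of concurrently running tasks is decided elsewhere (by how many tasks the driver hands out), so the pool only needs to supply a thread for each task it receives. Registering the runner in runningTasks before calling execute guarantees the entry exists by the time the task body finishes and removes it.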