Why the driverUrl that CoarseGrainedExecutorBackend communicates with is the DriverEndpoint, not the ClientEndpoint


Background: when the cluster starts up, the Master and the Workers are started.

When a user submits an application:

1. First a SparkContext is created, which in turn creates the DAGScheduler, TaskSchedulerImpl and SparkDeploySchedulerBackend, and then starts the TaskSchedulerImpl.

2. When TaskSchedulerImpl starts, it starts its SchedulerBackend; in standalone mode that is the SparkDeploySchedulerBackend.

3. When SparkDeploySchedulerBackend starts, it first calls start() on its parent class CoarseGrainedSchedulerBackend. That start() instantiates the inner class DriverEndpoint, which is an RpcEndpoint; when its onStart() method runs it schedules Option(self).foreach(_.send(ReviveOffers)) to execute periodically, so the endpoint keeps sending the ReviveOffers message to itself. ReviveOffers is an empty object; receiving it triggers makeOffers(), which "Make(s) fake resource offers on all executors".
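
To make the self-messaging concrete, here is a minimal, self-contained sketch of the pattern (a toy model for illustration only, not the actual Spark source; ToyDriverEndpoint and its method signatures are invented): onStart() schedules a periodic ReviveOffers message to itself, and receive() turns that message into a makeOffers() call.

import java.util.concurrent.{Executors, TimeUnit}

// Toy stand-in for Spark's ReviveOffers case object.
case object ReviveOffers

class ToyDriverEndpoint {
  private val reviveThread = Executors.newSingleThreadScheduledExecutor()

  // Mirrors the idea of DriverEndpoint.onStart: periodically send ReviveOffers to ourselves.
  def onStart(reviveIntervalMs: Long = 1000L): Unit = {
    reviveThread.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = receive(ReviveOffers)
    }, 0L, reviveIntervalMs, TimeUnit.MILLISECONDS)
  }

  // Mirrors the idea of DriverEndpoint.receive: ReviveOffers triggers makeOffers().
  def receive(msg: Any): Unit = msg match {
    case ReviveOffers => makeOffers()
    case other        => println(s"unhandled message: $other")
  }

  // In Spark this is where the "fake resource offers" are made on all executors.
  private def makeOffers(): Unit = println("offering resources to executors")
}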

4. After SparkDeploySchedulerBackend.start() has started the DriverEndpoint by calling its parent CoarseGrainedSchedulerBackend's start(), it creates an AppClient and starts it. AppClient.start() instantiates the inner class ClientEndpoint(rpcEnv), which is itself an RPC endpoint; once instantiated, its onStart() method runs automatically and registers the application with the Master (in reality it sends the message to every Master and, as soon as it connects successfully to one of them, cancels the attempts against the others): masterRef.send(RegisterApplication(appDescription, self)). Note that appDescription here carries the application's concrete details, including the command, and that self is the ClientEndpoint itself.
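
As a rough, self-contained sketch of the client side of this handshake (a toy model, not the AppClient source; the types and the send callback below are invented for illustration), the essential point is that the ClientEndpoint puts a reference to itself into RegisterApplication, so whoever answers will answer the ClientEndpoint:

// Toy stand-ins; the real types live in org.apache.spark.deploy.
case class ToyAppDescription(name: String, command: String)
case class ToyRegisterApplication(appDescription: ToyAppDescription, driver: String)

class ToyClientEndpoint(masterUrls: Seq[String], appDescription: ToyAppDescription) {
  @volatile private var registered = false

  // Mirrors the idea of ClientEndpoint.onStart -> registerWithMaster:
  // send RegisterApplication (carrying `self`) to every configured Master.
  def onStart(send: (String, ToyRegisterApplication) => Unit): Unit = {
    for (masterUrl <- masterUrls if !registered) {
      send(masterUrl, ToyRegisterApplication(appDescription, driver = "ClientEndpoint (self)"))
    }
  }

  // In the real code, once one Master acknowledges, the attempts against the others are cancelled.
  def markRegistered(): Unit = { registered = true }
}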

5. The Master is itself a ThreadSafeRpcEndpoint. After receiving the RegisterApplication(description, driver) message from the ClientEndpoint, it calls createApplication(description, driver) and registerApplication(app) to create and register the Application, sends the registration-succeeded message back to the driver via driver.send(RegisteredApplication(app.id, self)) (note: this driver is actually the ClientEndpoint!), and then calls schedule().
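
A matching toy sketch of the Master side (again invented for illustration, not the real Master code; the driverRef send-handle stands in for the RpcEndpointRef carried in the message) shows why the acknowledgement lands on the ClientEndpoint:

// Toy messages; the real ones are defined in org.apache.spark.deploy.DeployMessages.
case class RegisterApp(appName: String, driverRef: Any => Unit) // driverRef: the ClientEndpoint's ref
case class RegisteredApp(appId: String)

class ToyMaster {
  private var nextAppNumber = 0

  // Mirrors the idea of Master.receive for RegisterApplication: create and register the app,
  // acknowledge to the endpoint ref carried in the message, then call schedule().
  def receive(msg: Any): Unit = msg match {
    case RegisterApp(appName, driverRef) =>
      val appId = s"app-$nextAppNumber-$appName"
      nextAppNumber += 1
      driverRef(RegisteredApp(appId)) // this "driver" ref is the ClientEndpoint, not the DriverEndpoint
      schedule()
    case other =>
      println(s"unhandled message: $other")
  }

  private def schedule(): Unit = println("scheduling executors on workers")
}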

6. When the ClientEndpoint receives RegisteredApplication(appId_, masterRef), it sets master = Some(masterRef) and calls listener.connected(appId.get) (the latter in effect invokes the AppClientListener implementation SparkDeploySchedulerBackend.connected(appId.get)). At this point the ClientEndpoint knows both the ID of the successfully registered Application and the Master's address.
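
Paraphrased from the ClientEndpoint.receive handler (abridged and possibly not verbatim; check the Spark source for the exact version), the case clause looks roughly like this:

case RegisteredApplication(appId_, masterRef) =>
  appId.set(appId_)
  registered.set(true)
  master = Some(masterRef)
  listener.connected(appId.get)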

7. After registering the Application, the Master goes on to call schedule(). schedule() calls startExecutorsOnWorkers(), which calls scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps) and allocateWorkerResourceToExecutors(app, assignedCores(pos), coresPerExecutor, usableWorkers(pos)); allocateWorkerResourceToExecutors() then calls launchExecutor(worker, exec). Look carefully at launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): it sends the following two messages (note that exec.application.driver below is the ClientEndpoint reference the Master captured in step 5):

 worker.endpoint.send(LaunchExecutor(masterUrl, exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))

exec.application.driver.send(ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))

8. The first message sent by the Master goes to the Worker and tells it to launch an executor. The Worker is itself an RPC endpoint; after receiving LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) it creates and starts an ExecutorRunner ("Manages the execution of one executor process."), and then notifies the Master of the executor's state change by calling sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None)).
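
A compact, self-contained sketch of the Worker side (a toy model only; ToyWorker, the simplified message types, and the plain Thread standing in for ExecutorRunner are all invented for illustration):

// Toy stand-ins for the deploy messages; the real ones carry more fields.
case class ToyLaunchExecutor(appId: String, execId: Int, command: Seq[String], cores: Int, memory: Int)
case class ToyExecutorStateChanged(appId: String, execId: Int, state: String)

class ToyWorker(sendToMaster: Any => Unit) {
  def receive(msg: Any): Unit = msg match {
    case ToyLaunchExecutor(appId, execId, command, cores, memory) =>
      // The real Worker builds an ExecutorRunner here; its start() eventually calls
      // fetchAndRunExecutor() and forks the executor JVM (see step 10 below).
      val runner = new Thread(new Runnable {
        override def run(): Unit = println(s"would exec: ${command.mkString(" ")}")
      })
      runner.start()
      sendToMaster(ToyExecutorStateChanged(appId, execId, state = "RUNNING"))
    case other =>
      println(s"unhandled message: $other")
  }
}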

9. The second message sent by the Master goes to the ClientEndpoint and tells it that an executor has been obtained. After receiving ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: Int, memory: Int), the ClientEndpoint calls listener.executorAdded(fullId, workerId, hostPort, cores, memory), which in effect invokes the AppClientListener implementation SparkDeploySchedulerBackend.executorAdded(fullId: String, workerId: String, hostPort: String, cores: Int, memory: Int). At this point the executor has been registered successfully.
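
The corresponding ClientEndpoint.receive case, paraphrased (abridged, may differ slightly from the exact source), is roughly:

case ExecutorAdded(id, workerId, hostPort, cores, memory) =>
  val fullId = appId + "/" + id
  listener.executorAdded(fullId, workerId, hostPort, cores, memory)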

 

10. Let's look more closely at how the Worker launches the executor. The Worker creates an ExecutorRunner and calls its start() method; start() calls fetchAndRunExecutor(), which contains the following code:

val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf), memory, sparkHome.getAbsolutePath, substituteVariables)

process = builder.start()

This is exactly where the new process gets built and started: everything about the process to be launched is carried inside this builder. Let's look at what information it holds and where it comes from.

The appDesc.command used by this ExecutorRunner comes from the appDesc field of the LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) case class message the Worker received from the Master. The Master's appDesc in turn comes from the appDescription field of the RegisterApplication(appDescription: ApplicationDescription, driver: RpcEndpointRef) case class message it received from the ClientEndpoint, and the ClientEndpoint's appDescription comes from the appDesc that SparkDeploySchedulerBackend passed in when the AppClient was instantiated. That appDesc contains the command:

val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)

val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend", args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)

We can see that command contains both the name of the class to launch, CoarseGrainedExecutorBackend, and the args parameters; args looks like this:

   val args = Seq(
      "--driver-url", driverUrl,
      "--executor-id", "{{EXECUTOR_ID}}",
      "--hostname", "{{HOSTNAME}}",
      "--cores", "{{CORES}}",
      "--app-id", "{{APP_ID}}",
      "--worker-url", "{{WORKER_URL}}")

The driverUrl used here is built as follows:

   // The endpoint for executors to talk to us
    val driverUrl = RpcEndpointAddress(
      sc.conf.get("spark.driver.host"),
      sc.conf.get("spark.driver.port").toInt,
      CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString

We can see that the endpoint name embedded in driverUrl is CoarseGrainedSchedulerBackend.ENDPOINT_NAME, whose value is "CoarseGrainedScheduler". Now everything falls into place: the driverUrl that the CoarseGrainedExecutorBackend process receives at startup, and will later communicate with, is set right here by SparkDeploySchedulerBackend, and it points at the endpoint named "CoarseGrainedScheduler". Because the DriverEndpoint is registered in the rpcEnv under the name "CoarseGrainedScheduler" while the ClientEndpoint is registered under the name "AppClient", the object that CoarseGrainedExecutorBackend communicates with is the DriverEndpoint, not the ClientEndpoint.
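
To see what that driverUrl actually looks like, here is a minimal sketch assuming the "spark://<name>@<host>:<port>" format produced by RpcEndpointAddress.toString (ToyRpcEndpointAddress, the host name and the port below are made up for illustration):

case class ToyRpcEndpointAddress(host: String, port: Int, name: String) {
  // Assumed to mirror RpcEndpointAddress.toString: spark://<endpoint-name>@<host>:<port>
  override def toString: String = s"spark://$name@$host:$port"
}

object DriverUrlDemo extends App {
  // "driver-host" and 50001 stand in for spark.driver.host / spark.driver.port.
  val driverUrl = ToyRpcEndpointAddress("driver-host", 50001, "CoarseGrainedScheduler").toString
  println(driverUrl) // spark://CoarseGrainedScheduler@driver-host:50001
}

When CoarseGrainedExecutorBackend starts, it resolves this URL to an endpoint reference and registers itself with it; since the name inside the URL is "CoarseGrainedScheduler", the endpoint it reaches is the DriverEndpoint.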

Note: the ClientEndpoint is registered in the rpcEnv under the name 'AppClient', as the following source shows:

  def start() {
    // Just launch an rpcEndpoint; it will call back into the listener.
    endpoint.set(rpcEnv.setupEndpoint("AppClient", new ClientEndpoint(rpcEnv)))
  }

Note: the DriverEndpoint is registered in the rpcEnv under the name 'CoarseGrainedScheduler', as the following source shows:

driverEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME, createDriverEndpoint(properties))

The ENDPOINT_NAME here comes from:

private[spark] object CoarseGrainedSchedulerBackend {
  val ENDPOINT_NAME = "CoarseGrainedScheduler"
}

 

 
