When Exactly Is the Executor Started?
6.4.1 When Exactly Is the Executor Started?
After SparkContext starts, StandaloneSchedulerBackend creates a StandaloneAppClient. StandaloneAppClient contains an inner class named ClientEndpoint; when the ClientEndpoint is created, it is passed a Command that specifies org.apache.spark.executor.CoarseGrainedExecutorBackend as the entry class of the Executors to be launched for the current application (a sketch of how this Command is assembled follows the code walkthrough below). ClientEndpoint extends ThreadSafeRpcEndpoint and communicates with the Master through the RPC mechanism. In ClientEndpoint's start method, registerWithMaster sends a RegisterApplication request to the Master. On receiving this request, the Master first records the application via registerApplication and then calls schedule to launch Executors on the Workers. The Master's handling of the RegisterApplication request is shown below.
Master.scala source:
case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  // If the Master is in STANDBY (backup) state, do nothing
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    // Build an ApplicationInfo from the description
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    // Add the application to the persistence engine
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    // Call schedule to launch Executors on the Worker nodes
    schedule()
  }
In the code above, the Master matches the RegisterApplication request and first checks whether it is in the STANDBY (backup) state. If not, the Master is ALIVE, and it calls createApplication(description, driver) to create an ApplicationInfo. It then calls persistenceEngine.addApplication(app) to persist the newly created ApplicationInfo for failure recovery. After these two steps, driver.send(RegisteredApplication(app.id, self)) returns to StandaloneAppClient the id of the successfully registered application together with a reference to the Master (self).
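As promised above, here is where the CoarseGrainedExecutorBackend entry-class name comes from: it is fixed on the Driver side when StandaloneSchedulerBackend builds the Command for the application. The following is only a minimal, self-contained sketch; the case classes are simplified stand-ins for Spark's real org.apache.spark.deploy.Command and ApplicationDescription, and the argument list is abridged.

// Simplified stand-ins; the real classes carry more fields (classpath entries,
// library paths, java options, event log settings, etc.)
case class Command(mainClass: String, arguments: Seq[String], environment: Map[String, String])
case class ApplicationDescription(name: String, maxCores: Option[Int],
    memoryPerExecutorMB: Int, command: Command)

// StandaloneSchedulerBackend names the Executor entry class here; the real argument
// list uses {{...}} placeholders that the Worker substitutes at launch time
val command = Command(
  "org.apache.spark.executor.CoarseGrainedExecutorBackend",
  Seq("--executor-id", "{{EXECUTOR_ID}}", "--hostname", "{{HOSTNAME}}"),
  Map.empty)
val appDesc = ApplicationDescription("demo-app", Some(8), 1024, command)
println(appDesc.command.mainClass)  // org.apache.spark.executor.CoarseGrainedExecutorBackend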
The ApplicationInfo object describes the application. Let's first look at the source code of the createApplication method, shown below.
Master.scala source:
private def createApplication(desc: ApplicationDescription, driver: RpcEndpointRef)
    : ApplicationInfo = {
  // Creation time of the ApplicationInfo
  val now = System.currentTimeMillis()
  val date = new Date(now)
  // Generate the application id from the date
  val appId = newApplicationId(date)
  // Create the ApplicationInfo
  new ApplicationInfo(now, appId, desc, date, driver, defaultCores)
}
In the code above, createApplication takes two parameters, an ApplicationDescription and an RpcEndpointRef, and calls newApplicationId to generate the appId. The key line is shown below.
val appId = "app-%s-%04d".format(createDateFormat.format(submitDate), nextAppNumber)
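This id scheme is easy to reproduce in a self-contained sketch; the date pattern "yyyyMMddHHmmss" is an inference that matches the app-20160429101010-0001 format discussed next.

import java.text.SimpleDateFormat
import java.util.{Date, Locale}

// Minimal reproduction of the application id scheme
val createDateFormat = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)
var nextAppNumber = 1
def newApplicationId(submitDate: Date): String = {
  val appId = "app-%s-%04d".format(createDateFormat.format(submitDate), nextAppNumber)
  nextAppNumber += 1
  appId
}
println(newApplicationId(new Date()))  // e.g. app-20160429101010-0001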
As the code dictates, appIds take the form app-20160429101010-0001. The desc object holds the application's basic configuration, including settings passed in from the system such as the application name, maxCores, and memoryPerExecutorMB. Finally, an ApplicationInfo object is constructed from desc, date, driver, defaultCores, and so on, and returned. After the method returns, registerApplication is called to register the application. How does that method complete the registration? Its code is shown below.
Master.scala source:
private def registerApplication(app: ApplicationInfo): Unit = {
  // The Driver's address, used for communication between Master and Driver
  val appAddress = app.driver.address
  // If addressToApp already contains this Driver address, the Driver has
  // already registered, so return directly
  if (addressToApp.contains(appAddress)) {
    logInfo("Attempted to re-register application at same address: " + appAddress)
    return
  }
  // Register with the metrics system
  applicationMetricsSystem.registerSource(app.appSource)
  // apps is a HashSet, so it holds no duplicates; add the app to it
  apps += app
  // idToApp is a HashMap that maps app ids to apps
  idToApp(app.id) = app
  // endpointToApp is a HashMap that maps drivers to apps
  endpointToApp(app.driver) = app
  // addressToApp is a HashMap that maps the app Driver's address to the app
  addressToApp(appAddress) = app
  // waitingApps is an array that records the apps waiting to be scheduled
  waitingApps += app
  if (reverseProxy) {
    webUi.addProxyTargets(app.id, app.desc.appUiUrl)
  }
}
In the code above, the Driver's address is first obtained from app.driver.address, and the addressToApp map is checked for that address. If it is already present, the application has registered before and the method returns directly. If not, the application is added to the waitingApps array, and the corresponding mappings are added to idToApp, endpointToApp, and addressToApp. Applications placed in waitingApps wait to be scheduled by the schedule method.
The schedule method does two things. First, it schedules Drivers: Drivers in the waitingDrivers array are dispatched to Workers that satisfy their running conditions. Second, it launches Executors for applications on qualifying Worker nodes. The source code of schedule is shown below.
Master.scala schedule method source:
private def schedule(): Unit = {
  ……
  launchDriver(worker, driver)
  ……
  startExecutorsOnWorkers()
}
schedule is a pivotal method in the Master. It is called whenever a new Driver registers, an application registers, or the available resources change. It allocates available resources to the applications currently waiting to be scheduled, launching Executors on qualifying Worker nodes. It has a second role as well: when a Driver is submitted, it dispatches the Driver to a Worker whose free resources meet the Driver's requirements, which launchDriver(worker, driver) carries out.
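To make that order of operations concrete, here is a toy, self-contained model (an assumption based on the description above; WorkerStub and DriverStub are stand-ins, not Spark's classes): waiting drivers are placed on alive workers first, and only then are Executors started for waiting applications.

import scala.util.Random

case class WorkerStub(id: String, var coresFree: Int, var memoryFree: Int)
case class DriverStub(id: String, cores: Int, memory: Int)

val workers = Seq(WorkerStub("worker-1", 8, 4096), WorkerStub("worker-2", 8, 4096))
val waitingDrivers = Seq(DriverStub("driver-1", 1, 1024))

def schedule(): Unit = {
  // Shuffle so drivers spread across workers instead of piling onto the first one
  val aliveWorkers = Random.shuffle(workers)
  for (driver <- waitingDrivers) {
    aliveWorkers.find(w => w.coresFree >= driver.cores && w.memoryFree >= driver.memory)
      .foreach { w =>
        println(s"launchDriver(${w.id}, ${driver.id})")
        w.coresFree -= driver.cores
        w.memoryFree -= driver.memory
      }
  }
  // Only after drivers are placed do waiting applications get their Executors
  println("startExecutorsOnWorkers()")
}

schedule()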
Once an application has been scheduled successfully, the Master launches Executors for it on the Worker nodes by calling startExecutorsOnWorkers, whose source code is shown below.
Master.scala source:
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Filter out workers that don't have enough resources to launch an executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
scheduleExecutorsOnWorkers supports two strategies for launching Executors. The first is round-robin spreading: cores are allocated across the usable Workers in turn, going around until the resource requirement is met. Spreading usually achieves better data locality, so it is the default strategy. The second is to fill Workers one by one: the available resources of each Worker in usableWorkers are consumed in turn until the requirement is met.
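The core of the two strategies can be captured in a small self-contained helper. This is a hypothetical simplification, not the real scheduleExecutorsOnWorkers, which also checks memory and the application's executor limit.

// Given each usable worker's free cores, hand out coresPerExecutor-sized chunks
// until coresToAssign is exhausted or no worker can fit another executor
def allocateCores(
    coresToAssign: Int,
    freeCores: Array[Int],
    coresPerExecutor: Int,
    spreadOut: Boolean): Array[Int] = {
  val assigned = Array.fill(freeCores.length)(0)
  var remaining = coresToAssign
  var pos = 0
  def canLaunch(i: Int): Boolean = freeCores(i) - assigned(i) >= coresPerExecutor
  while (remaining >= coresPerExecutor && freeCores.indices.exists(canLaunch)) {
    if (canLaunch(pos)) {
      assigned(pos) += coresPerExecutor
      remaining -= coresPerExecutor
      if (spreadOut) pos = (pos + 1) % freeCores.length  // round-robin: move to the next worker
    } else {
      pos = (pos + 1) % freeCores.length                 // this worker is full, try the next
    }
  }
  assigned
}

// Three workers with 4 free cores each, 8 cores requested, 2 cores per executor:
allocateCores(8, Array(4, 4, 4), 2, spreadOut = true)   // Array(4, 2, 2): spread across workers
allocateCores(8, Array(4, 4, 4), 2, spreadOut = false)  // Array(4, 4, 0): fill each worker first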
After scheduleExecutorsOnWorkers has assigned resources to the application in this logical sense, nothing has actually been allocated on the Worker nodes yet; an action function must be invoked to carry the allocation out. That is the call to allocateWorkerResourceToExecutors, which actually allocates the resources on the Worker nodes. Its source code is shown below.
Master.scala source:
private def allocateWorkerResourceToExecutors(
  ……
  launchExecutor(worker, exec)
  ……
The code above calls launchExecutor(worker, exec). The method takes two parameters: the WorkerInfo of a qualifying Worker, and an ExecutorDesc object describing the Executor. It sends a LaunchExecutor request to the Worker node, which is responsible for starting the Executor once it receives the request. The launchExecutor code is listed below.
Master.scala source:
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  // Add exec, the ExecutorDesc describing the Executor, to the WorkerInfo
  worker.addExecutor(exec)
  // Send a LaunchExecutor message to the worker. The message carries the masterUrl,
  // the application id, the Executor id, the Executor description desc, the number
  // of cores for the Executor, and the amount of memory allocated to the Executor
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  // Send an ExecutorAdded message back to the Driver. The message carries the worker's
  // id, the worker's host and port, and the allocated cores and memory
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
launchExecutor takes two parameters. The first, worker: WorkerInfo, carries the Worker's basic information; the second, exec: ExecutorDesc, holds the Executor's basic configuration, such as memory and cores. Inside the method, worker.endpoint.send(LaunchExecutor(...)) sends a LaunchExecutor request to the Worker, which starts the Executor upon receiving it.
At the same time as the LaunchExecutor message goes to the Worker, exec.application.driver.send(ExecutorAdded(...)) sends an ExecutorAdded message to the Driver. This message reports back to the Driver on which Worker an Executor was started, what the Executor's number is, how many cores and how much memory were allocated to it, and the Worker's contact hostPort.
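For reference, that message has roughly the following shape (reproduced from memory of Spark 2.x's DeployMessages.scala, so treat the exact field list as an approximation):

case class ExecutorAdded(
    id: Int,            // the Executor's number within the application
    workerId: String,   // the Worker the Executor was started on
    hostPort: String,   // how that Worker can be reached
    cores: Int,         // cores allocated to the Executor
    memory: Int)        // memory (MB) allocated to the Executor
// e.g. ExecutorAdded(0, "worker-20160429101010-192.168.1.10-38231", "192.168.1.10:38231", 2, 1024)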
The Worker processes the LaunchExecutor message when it arrives. The Worker-side handling of LaunchExecutor is shown below.
Worker.scala source:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  // If masterUrl differs from activeMasterUrl, an invalid Master is trying to
  // launch the executor; log a warning
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
  } else {
    try {
      logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

      // Create the Executor's working directory, named after execId, under workDir/appId/
      val executorDir = new File(workDir, appId + "/" + execId)
      // Call mkdirs to create the directory
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }

      // Create the Executor's local directories, passed to it via the variable
      // SPARK_EXECUTOR_DIRS; the Worker deletes them when the application finishes
      val appLocalDirs = appDirectories.getOrElse(appId,
        Utils.getOrCreateLocalRootDirs(conf).map { dir =>
          val appDir = Utils.createDirectory(dir, namePrefix = "executor")
          Utils.chmod700(appDir)
          appDir.getAbsolutePath()
        }.toSeq)
      // Record the mapping from appId to appLocalDirs in the appDirectories hash map
      appDirectories(appId) = appLocalDirs
      // Create the ExecutorRunner
      val manager = new ExecutorRunner(
        appId,
        execId,
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs, ExecutorState.RUNNING)
      // Record the mapping from appId + "/" + execId to the ExecutorRunner
      // in the executors hash map
      executors(appId + "/" + execId) = manager
      // Start the ExecutorRunner
      manager.start()
      // Increase the Worker's used cores by cores_, the number of cores given to
      // the Executor, and its used memory by memory_
      coresUsed += cores_
      memoryUsed += memory_
      // Send an ExecutorStateChanged message to the Master, carrying the appId,
      // the execId, and the ExecutorRunner's state
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
  }
In the code above, the Worker first checks whether the incoming masterUrl matches activeMasterUrl. If it does not, the request did not come from the ALIVE Master, and the Worker just logs a warning. If it matches, the request came from the ALIVE Master, so the Worker creates a working directory for the Executor and then builds an ExecutorRunner from appId, execId, appDesc, and the other parameters. As its name suggests, the ExecutorRunner is where the Executor runs: it holds a worker thread that downloads the required dependencies and starts the CoarseGrainedExecutorBackend process, which runs in a JVM of its own. The code that starts the ExecutorRunner's thread is shown below.
ExecutorRunner.scala source:
private[worker] def start() {
  // Create the worker thread
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    // The thread's run method calls fetchAndRunExecutor
    override def run() { fetchAndRunExecutor() }
  }
  // Start the thread
  workerThread.start()

  // Shutdown hook for killing the process
  shutdownHook = ShutdownHookManager.addShutdownHook { () =>
    // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
    // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
    if (state == ExecutorState.RUNNING) {
      state = ExecutorState.FAILED
    }
    killProcess(Some("Worker shutting down")) }
}
The code above defines a Thread whose run method calls fetchAndRunExecutor. fetchAndRunExecutor launches, as a separate process, the org.apache.spark.executor.CoarseGrainedExecutorBackend entry class carried in the ApplicationDescription. Its source code is shown below.
ExecutorRunner.scala source:
private def fetchAndRunExecutor() {
  try {
    // Launch the process
    val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    logInfo(s"Launch command: $formattedCommand")

    builder.directory(executorDir)
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

    // Add webUI log urls
    val baseUrl =
      if (conf.getBoolean("spark.ui.reverseProxy", false)) {
        s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
      } else {
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      }
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")

    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)

    // Redirect its stdout and stderr to files
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)

    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException =>
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    case e: Exception =>
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
  }
}
The appDesc.command passed to CommandUtils.buildProcessBuilder in fetchAndRunExecutor() carries "org.apache.spark.executor.CoarseGrainedExecutorBackend" as the entry class. So when the Worker node starts an ExecutorRunner, the ExecutorRunner launches the CoarseGrainedExecutorBackend process, and in CoarseGrainedExecutorBackend's onStart method a RegisterExecutor registration request is sent to the Driver.
CoarseGrainedExecutorBackend onStart method source:
override def onStart() {
  ……
  driver = Some(ref)
  // Send an ask request to the driver and wait for the driver's reply
  ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  ……
When the Driver side receives the registration request, it registers the Executor and replies; see the receiveAndReply method below.
CoarseGrainedSchedulerBackend.scala receiveAndReply method source:
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {

  case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
    ……
    executorRef.send(RegisteredExecutor)
    ……
As the code above shows, the Driver sends a RegisteredExecutor message back to CoarseGrainedExecutorBackend. On receiving it, CoarseGrainedExecutorBackend creates a new Executor and thereafter serves as the Executor's messenger, communicating with the Driver on its behalf. The handling of the RegisteredExecutor message is shown below.
CoarseGrainedExecutorBackend.scala receive method source:
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      // On receiving RegisteredExecutor, create the Executor right away
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }
As the code shows, after receiving the RegisteredExecutor message, CoarseGrainedExecutorBackend creates a new org.apache.spark.executor.Executor object. At this point the Executor has been created.
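What the newly created Executor sets up internally is outside the scope of this section, but as a rough, hypothetical model (not org.apache.spark.executor.Executor itself): it maintains a daemon thread pool of task launch workers and a heartbeat loop that reports back to the Driver.

import java.util.concurrent.{Executors, TimeUnit}

// Toy model for illustration only; the real Executor wires these into Spark's
// RPC and task-scheduling machinery
class ToyExecutor(executorId: String) {
  private val threadPool = Executors.newCachedThreadPool()              // task launch workers
  private val heartbeater = Executors.newSingleThreadScheduledExecutor()
  heartbeater.scheduleAtFixedRate(
    () => println(s"executor $executorId: heartbeat to driver"), 0, 10, TimeUnit.SECONDS)

  def launchTask(taskId: Long): Unit =
    threadPool.execute(() => println(s"executor $executorId: running task $taskId"))
}

new ToyExecutor("0").launchTask(1L)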