When Exactly Is the Executor Started?


6.4.1 When Exactly Is the Executor Started?

After the SparkContext starts, StandaloneSchedulerBackend creates a StandaloneAppClient. StandaloneAppClient contains an inner class named ClientEndpoint; when the ClientEndpoint is created, it is passed a Command that specifies org.apache.spark.executor.CoarseGrainedExecutorBackend as the entry class of the Executors to be launched for the current application. ClientEndpoint extends ThreadSafeRpcEndpoint and communicates with the Master through the RPC mechanism. In ClientEndpoint's onStart method, registerWithMaster sends a RegisterApplication request to the Master. On receiving this message, the Master first records the application via its registerApplication method, and then calls the schedule method to launch Executors on the Workers. The Master's handling of the RegisterApplication request is shown below.
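
For reference, the entry class is wired in on the Driver side when StandaloneSchedulerBackend starts. The snippet below is lightly abridged from StandaloneSchedulerBackend.start in Spark 2.x (argument lists vary slightly across versions):

val args = Seq(
  "--driver-url", driverUrl,
  "--executor-id", "{{EXECUTOR_ID}}",
  "--hostname", "{{HOSTNAME}}",
  "--cores", "{{CORES}}",
  "--app-id", "{{APP_ID}}",
  "--worker-url", "{{WORKER_URL}}")
……
// The entry class every Worker will launch for this application
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
  args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val appDesc = ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
  ……)
client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
client.start()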

Master.scala source code:

case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  // If the Master is in STANDBY state, do nothing
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    // Build an ApplicationInfo from the description
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    // Add the application to the persistence engine
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    // Call schedule() to launch Executors on the Worker nodes
    schedule()
  }

In the code above, the Master matches the RegisterApplication request and first checks whether it is in STANDBY state. If not, the Master is ALIVE; in that state it calls createApplication(description, driver) to create an ApplicationInfo, and then calls persistenceEngine.addApplication(app) to persist the newly created ApplicationInfo for failure recovery. With these two steps done, driver.send(RegisteredApplication(app.id, self)) returns the registration result to StandaloneAppClient, carrying the new application's ID and a reference to the Master itself.
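
On the client side, StandaloneAppClient's ClientEndpoint records the assigned application ID and the Master reference when this reply arrives. The following is abridged from ClientEndpoint's receive method in Spark 2.x:

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredApplication(appId_, masterRef) =>
    // Remember the application id and the Master that acknowledged us, then
    // notify the listener (StandaloneSchedulerBackend) that we are connected
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
  ……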

The ApplicationInfo object describes the application. Let's first look at the source code of the createApplication method, shown below.

Master.scala source code:

private def createApplication(desc: ApplicationDescription, driver: RpcEndpointRef):
    ApplicationInfo = {
  // Creation time of the ApplicationInfo
  val now = System.currentTimeMillis()
  val date = new Date(now)
  // Generate the application id from the date
  val appId = newApplicationId(date)
  // Create the ApplicationInfo
  new ApplicationInfo(now, appId, desc, date, driver, defaultCores)
}

In the code above, createApplication takes two parameters, an ApplicationDescription and an RpcEndpointRef, and calls the newApplicationId method to generate the appId. The key line is shown below.

val appId = "app-%s-%04d".format(createDateFormat.format(submitDate), nextAppNumber)

This format yields appIds of the form app-20160429101010-0001. The desc object carries the application's basic configuration, including settings passed in from the system such as appName, maxCores, and memoryPerExecutorMB. Finally, an ApplicationInfo object is constructed from desc, date, driver, defaultCores, and so on, and returned. After the function returns, registerApplication is called to register the application. How does that method complete the registration? Its code is shown below.
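
For completeness, the surrounding method looks roughly like this in Spark 2.x's Master.scala (the date format is what produces the app-20160429101010-0001 shape):

// Date format used in the application id; yields e.g. "20160429101010"
private val createDateFormat = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)

private def newApplicationId(submitDate: Date): String = {
  // "app-" + timestamp + zero-padded sequence number, e.g. app-20160429101010-0001
  val appId = "app-%s-%04d".format(createDateFormat.format(submitDate), nextAppNumber)
  nextAppNumber += 1
  appId
}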

Master.scala source code:

private def registerApplication(app: ApplicationInfo): Unit = {
  // The Driver's address, used for communication between Master and Driver
  val appAddress = app.driver.address
  // If addressToApp already contains this Driver address, the Driver has
  // already registered, so return immediately
  if (addressToApp.contains(appAddress)) {
    logInfo("Attempted to re-register application at same address: " + appAddress)
    return
  }

  // Register with the metrics system
  applicationMetricsSystem.registerSource(app.appSource)
  // apps is a HashSet, so no duplicates are stored; add the app to it
  apps += app
  // idToApp is a HashMap recording the id -> app mapping
  idToApp(app.id) = app
  // endpointToApp is a HashMap recording the driver -> app mapping
  endpointToApp(app.driver) = app
  // addressToApp is a HashMap recording the app Driver's address -> app mapping
  addressToApp(appAddress) = app
  // waitingApps records the apps waiting to be scheduled
  waitingApps += app
  if (reverseProxy) {
    webUi.addProxyTargets(app.id, app.desc.appUiUrl)
  }
}

 

In the code above, the Driver's address is first obtained from app.driver.address, and the addressToApp map is checked for that address. If it is present, the application has already registered and the method returns immediately; if not, the application is added to the waitingApps buffer, and the corresponding entries are added to the idToApp, endpointToApp, and addressToApp maps. Applications placed in waitingApps wait to be scheduled by the schedule method.
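
The bookkeeping structures referenced here are fields of Master. Their declarations, abridged from Spark 2.x's Master.scala, are:

val apps = new HashSet[ApplicationInfo]
private val idToApp = new HashMap[String, ApplicationInfo]
private val endpointToApp = new HashMap[RpcEndpointRef, ApplicationInfo]
private val addressToApp = new HashMap[RpcAddress, ApplicationInfo]
private val waitingApps = new ArrayBuffer[ApplicationInfo]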

The schedule method does two things. First, it schedules Drivers: Drivers in the waitingDrivers buffer are dispatched to Workers that satisfy their resource requirements. Second, it launches Executors for applications on qualifying Worker nodes. The schedule method's source code is shown below.

Master.scala's schedule method source code:

private def schedule(): Unit = {
  ……
  launchDriver(worker, driver)
  ……
  startExecutorsOnWorkers()
}

schedule is a pivotal method in the Master: it is called every time a new Driver registers, a new application registers, or the available resources change. It allocates available resources to the applications currently waiting to be scheduled, launching Executors on qualifying Worker nodes. Its other role is to place a newly submitted Driver on a Worker whose free resources satisfy the Driver's requirements; launchDriver(worker, driver) performs that task.
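
A slightly fuller (still abridged) view of schedule() as it appears in Spark 2.x shows both roles: the Master bails out unless it is ALIVE, places waiting Drivers on a shuffled list of alive Workers first, and only then launches Executors:

private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Drivers take strict precedence over executors
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) {
    // Walk the alive workers from a random starting position until one with
    // enough memory and cores is found, then launch the driver there
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  startExecutorsOnWorkers()
}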

Once an application has been scheduled successfully, the Master launches Executors for it on the Worker nodes by calling startExecutorsOnWorkers, whose source code is shown below.

Master.scala source code:

private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Filter out workers that don't have enough resources to launch an executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}

scheduleExecutorsOnWorkers supports two strategies for placing Executors. The first is round-robin: cores are handed out across the usable Workers in turn until the resource demand is met. Spreading Executors out this way usually gives better data locality, so it is the default strategy. The second is to fill Workers one by one: all of each Worker's free resources in usableWorkers are taken in sequence until the demand is met.
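
The choice between the two strategies is controlled by the spark.deploy.spreadOut configuration (true by default), which the Master reads into its spreadOutApps flag. The following is a toy sketch, not Spark source, contrasting the two assignments for three Workers with 8 free cores each and an application needing 12 cores:

object SpreadOutDemo {
  // spreadOut = true: hand out one core at a time, round-robin across workers
  def spreadOut(needed: Int, free: Array[Int]): Array[Int] = {
    val assigned = Array.fill(free.length)(0)
    var left = math.min(needed, free.sum)  // never assign more than is available
    var pos = 0
    while (left > 0) {
      if (assigned(pos) < free(pos)) { assigned(pos) += 1; left -= 1 }
      pos = (pos + 1) % free.length
    }
    assigned
  }

  // spreadOut = false: drain each worker completely before moving to the next
  def consolidate(needed: Int, free: Array[Int]): Array[Int] = {
    val assigned = Array.fill(free.length)(0)
    var left = math.min(needed, free.sum)
    for (pos <- free.indices if left > 0) {
      val take = math.min(left, free(pos))
      assigned(pos) = take
      left -= take
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    val free = Array(8, 8, 8)
    println(spreadOut(12, free).mkString(","))    // 4,4,4 -- spread across workers
    println(consolidate(12, free).mkString(","))  // 8,4,0 -- workers filled in order
  }
}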

After scheduleExecutorsOnWorkers has worked out this logical assignment for the application, the resources have not yet actually been allocated on the Worker nodes. That requires calling allocateWorkerResourceToExecutors, which performs the actual allocation on a Worker. Its source code is shown below.

Master.scala source code:

private def allocateWorkerResourceToExecutors(
  ……
  launchExecutor(worker, exec)
  ……
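
Filling in the elided parts, the method looks roughly like this in Spark 2.x: the cores assigned on a Worker are divided into one or more Executors depending on whether coresPerExecutor was specified:

private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, divide the cores assigned
  // to this worker evenly among executors, with no remainder.
  // Otherwise, launch a single executor that grabs all the assignedCores.
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}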

The code above calls launchExecutor(worker, exec), which takes two parameters: the WorkerInfo of a qualifying Worker, and an ExecutorDesc object describing the Executor. The method sends a LaunchExecutor request to the Worker node, which, on receiving it, is responsible for starting the Executor. The launchExecutor method is listed below.

Master.scala source code:

private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  // Add exec, the ExecutorDesc describing the Executor, to the WorkerInfo
  worker.addExecutor(exec)
  // Send a LaunchExecutor message to the worker. The message carries the masterUrl,
  // the application id, the Executor id, the Executor description desc, the number
  // of cores for the Executor, and the amount of memory allocated to it
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  // Send an ExecutorAdded message back to the Driver. The message carries the
  // worker's id, the worker's host and port, and the allocated cores and memory
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}

launchExecutor takes two parameters. The first, worker: WorkerInfo, holds the Worker's basic information; the second, exec: ExecutorDesc, holds the Executor's basic configuration, such as memory and cores. In this method, worker.endpoint.send(LaunchExecutor(...)) sends the LaunchExecutor request to the Worker, which, on receiving it, calls a method to start the Executor.

At the same time as the LaunchExecutor message is sent to the Worker, exec.application.driver.send(ExecutorAdded(...)) sends an ExecutorAdded message to the Driver. This message tells the Driver on which Workers the Master has launched Executors, what each Executor's id is, how many cores and how much memory were allocated to each, and the Worker's contact hostPort.
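
Both messages are simple case classes. Their definitions, abridged from org.apache.spark.deploy.DeployMessages in Spark 2.x, show exactly what travels over the wire:

// Master -> Worker: ask the Worker to start an Executor for the given application
case class LaunchExecutor(
    masterUrl: String,
    appId: String,
    execId: Int,
    appDesc: ApplicationDescription,
    cores: Int,
    memory: Int)
  extends DeployMessage

// Master -> AppClient (Driver side): report the newly added Executor
case class ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: Int, memory: Int)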

The Worker handles the LaunchExecutor message when it arrives. The LaunchExecutor handling logic on the Worker node is shown below.

Worker.scala source code:

case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  // If masterUrl differs from activeMasterUrl, an invalid Master is trying to
  // launch the Executor, so log a warning
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
  } else {
    try {
      logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

      // Create the Executor's working directory, named after execId, under workDir/appId/
      val executorDir = new File(workDir, appId + "/" + execId)
      // Call mkdirs to create the directory
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }

      // Create the Executor's local directories, passed on via the environment
      // variable SPARK_EXECUTOR_DIRS; the Worker deletes them when the
      // application finishes
      val appLocalDirs = appDirectories.getOrElse(appId,
        Utils.getOrCreateLocalRootDirs(conf).map { dir =>
          val appDir = Utils.createDirectory(dir, namePrefix = "executor")
          Utils.chmod700(appDir)
          appDir.getAbsolutePath()
        }.toSeq)
      // Record the appId -> appLocalDirs mapping in the appDirectories hash map
      appDirectories(appId) = appLocalDirs
      // Create the ExecutorRunner
      val manager = new ExecutorRunner(
        appId,
        execId,
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs, ExecutorState.RUNNING)
      // Record the (appId + "/" + execId) -> ExecutorRunner mapping in the executors hash map
      executors(appId + "/" + execId) = manager
      // Start the ExecutorRunner
      manager.start()
      // Increase the Worker's used cores by cores_, the number of cores given to this Executor
      coresUsed += cores_
      memoryUsed += memory_
      // Send an ExecutorStateChanged message to the Master, carrying the appId,
      // the execId, and the ExecutorRunner's state
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
  }

  

In the code above, the Worker first checks whether the incoming masterUrl matches activeMasterUrl. If it does not, the request did not come from the ALIVE Master, and only a warning is printed. If it does, the request came from the ALIVE Master, so the Worker creates a working directory for the Executor and then uses appId, execId, appDesc, and the other parameters to create an ExecutorRunner. As the name suggests, the ExecutorRunner is where the Executor runs. It contains a worker thread that downloads the required dependencies and launches the CoarseGrainedExecutorBackend process, which runs in a JVM of its own. The source code that starts the ExecutorRunner's thread is shown below.

ExecutorRunner.scala source code:

private[worker] def start() {
  // Create the thread
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    // The thread's run method calls fetchAndRunExecutor
    override def run() { fetchAndRunExecutor() }
  }
  // Start the thread
  workerThread.start()

  // Shutdown hook, used to kill the process on termination
  shutdownHook = ShutdownHookManager.addShutdownHook { () =>
    // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
    // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
    if (state == ExecutorState.RUNNING) {
      state = ExecutorState.FAILED
    }
    killProcess(Some("Worker shutting down")) }
}

The code above defines a Thread whose run method calls fetchAndRunExecutor. fetchAndRunExecutor is responsible for launching, as a separate process, the org.apache.spark.executor.CoarseGrainedExecutorBackend class carried in the ApplicationDescription. The fetchAndRunExecutor method's source code is shown below.

ExecutorRunner.scala source code:

private def fetchAndRunExecutor() {
  try {
    // Launch the process
    val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    logInfo(s"Launch command: $formattedCommand")

    builder.directory(executorDir)
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

    // Add webUI log urls
    val baseUrl =
      if (conf.getBoolean("spark.ui.reverseProxy", false)) {
        s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
      } else {
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      }
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")

    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)

    // Redirect its stdout and stderr to files
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)

    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException =>
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    case e: Exception =>
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
  }
}

In fetchAndRunExecutor, the command built by CommandUtils.buildProcessBuilder(appDesc.command, ...) has "org.apache.spark.executor.CoarseGrainedExecutorBackend" as its entry class. So when the Worker node starts an ExecutorRunner, the ExecutorRunner launches a CoarseGrainedExecutorBackend process, and in CoarseGrainedExecutorBackend's onStart method a RegisterExecutor registration request is sent to the Driver.

CoarseGrainedExecutorBackend's onStart method source code:

override def onStart() {
  ……
  driver = Some(ref)
  // Send an ask request to the driver and wait for the driver's reply
  ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  ……
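
Filling in the elisions, onStart in Spark 2.x looks roughly like this (lightly abridged): the backend first resolves an endpoint reference to the Driver from driverUrl, and exits the process if registration fails:

override def onStart() {
  logInfo("Connecting to driver: " + driverUrl)
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    driver = Some(ref)
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    case Success(msg) =>
      // Always receives `true`; the real acknowledgement is the
      // RegisteredExecutor message handled in receive
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
  }(ThreadUtils.sameThread)
}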

When the Driver side receives the registration request, it registers the Executor and replies, as shown below.

CoarseGrainedSchedulerBackend.scala's receiveAndReply method source code:

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {

  case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
    ……
    executorRef.send(RegisteredExecutor)
    ……
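
A fuller (still abridged) view of this case in Spark 2.x shows the bookkeeping around the reply: duplicate registrations are rejected, the new Executor's resources are recorded in executorDataMap, and makeOffers() is triggered so tasks can be offered to it. Details vary across versions:

case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
  if (executorDataMap.contains(executorId)) {
    executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
    context.reply(true)
  } else {
    ……
    val data = new ExecutorData(executorRef, executorRef.address, hostname,
      cores, cores, logUrls)
    CoarseGrainedSchedulerBackend.this.synchronized {
      executorDataMap.put(executorId, data)
      ……
    }
    executorRef.send(RegisteredExecutor)
    context.reply(true)
    listenerBus.post(
      SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
    makeOffers()
  }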

As the code above shows, the Driver sends a RegisteredExecutor message back to CoarseGrainedExecutorBackend. On receiving it, CoarseGrainedExecutorBackend creates a new Executor and from then on acts as that Executor's messenger, communicating with the Driver on its behalf. The code that handles the RegisteredExecutor message is shown below.

CoarseGrainedExecutorBackend.scala's receive method source code:

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      // On receiving the RegisteredExecutor message, create the Executor immediately
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }

As the code above shows, on receiving the RegisteredExecutor message, CoarseGrainedExecutorBackend creates a new org.apache.spark.executor.Executor object. At this point, the Executor has been created.
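
For a sense of what that construction entails, the Executor's constructor (heavily abridged from org.apache.spark.executor.Executor in Spark 2.x) immediately starts the thread pool that will run tasks and the heartbeater that reports to the Driver:

private[spark] class Executor(
    executorId: String,
    executorHostname: String,
    env: SparkEnv,
    userClassPath: Seq[URL] = Nil,
    isLocal: Boolean = false)
  extends Logging {
  ……
  // Start worker thread pool
  private val threadPool = ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker")
  ……
  // Periodically report heartbeats and task metrics to the driver
  startDriverHeartbeater()
}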

 
