Lesson 35: Walking Through Spark's Internal Runtime Scheduling Loop

Spark's DAGScheduler divides an entire Job into Stages. Stages are carved out from back to front, but at execution time they run from front to back. Each Stage contains a set of tasks that are computed in parallel: the parallel tasks run exactly the same logic, but each processes different data. DAGScheduler packages all the tasks of a Stage built from the DAG into a TaskSet and submits it to the underlying scheduler, TaskScheduler. TaskScheduler is an interface decoupled from the concrete tasks, so it can run under different scheduling modes, for example Standalone or Yarn.
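To make the Stage picture concrete, here is a minimal sketch (not from the book; the object name, master setting, and data are made up for illustration). The single action produces two Stages because reduceByKey introduces a shuffle, and within each Stage every task runs the same logic on a different partition:

import org.apache.spark.{SparkConf, SparkContext}

object TwoStageJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TwoStageJob").setMaster("local[4]")
    val sc = new SparkContext(conf)

    val words  = sc.parallelize(Seq("spark", "scheduler", "spark", "stage"), numSlices = 4)
    val counts = words.map(w => (w, 1))      // Stage 0: map side, one task per partition
                      .reduceByKey(_ + _)    // shuffle boundary => a new Stage
    counts.collect().foreach(println)        // Stage 1: reduce side; collect() triggers the Job

    sc.stop()
  }
}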

Spark's basic scheduling picture, shown in Figure 8-3, involves RDD, DAGScheduler, TaskScheduler, Worker, and so on. This lesson explains how TaskScheduler works.


Figure 8-3 Spark runtime diagram

When DAGScheduler submits a TaskSet to the underlying scheduler, it programs against the TaskScheduler interface. This follows the object-oriented principle of depending on abstractions rather than concrete implementations, which makes the underlying resource scheduler pluggable and lets Spark run on many resource scheduling modes, such as Standalone, Yarn, Mesos, Local, EC2, or other custom resource schedulers. In Standalone mode we focus on TaskSchedulerImpl.

TaskScheduler is a trait, the low-level task scheduling interface, implemented by [[org.apache.spark.scheduler.TaskSchedulerImpl]]. The interface allows different task schedulers to be plugged in. Each TaskScheduler schedules tasks for a single SparkContext. It receives the sets of tasks submitted by the DAGScheduler for each Stage, is responsible for sending the tasks to the cluster to run, retries them on failure, and returns events to the DAGScheduler.

The TaskScheduler source code is as follows:

private[spark] trait TaskScheduler {

  private val appId = "spark-application-" + System.currentTimeMillis

  def rootPool: Pool

  def schedulingMode: SchedulingMode

  def start(): Unit

  // Invoked after system has successfully initialized (typically in spark context).
  // Yarn uses this to bootstrap allocation of resources based on preferred locations,
  // wait for slave registrations, etc.
  def postStartHook() { }

  // Disconnect from the cluster.
  def stop(): Unit

  // Submit a sequence of tasks to run.
  def submitTasks(taskSet: TaskSet): Unit

  // Cancel a stage.
  def cancelTasks(stageId: Int, interruptThread: Boolean): Unit

  // Set the DAG scheduler for upcalls. This is guaranteed to be set before submitTasks is called.
  def setDAGScheduler(dagScheduler: DAGScheduler): Unit

  // Get the default level of parallelism to use in the cluster, as a hint for sizing jobs.
  def defaultParallelism(): Int

  /**
   * Update metrics for in-progress tasks and let the master know that the BlockManager is still
   * alive. Return true if the driver knows about the given block manager. Otherwise, return false,
   * indicating that the block manager should re-register.
   */
  def executorHeartbeatReceived(
      execId: String,
      accumUpdates: Array[(Long, Seq[AccumulatorV2[_, _]])],
      blockManagerId: BlockManagerId): Boolean

  /**
   * Get an application ID associated with the job.
   *
   * @return An application ID
   */
  def applicationId(): String = appId

  /**
   * Process a lost executor
   */
  def executorLost(executorId: String, reason: ExecutorLossReason): Unit

  /**
   * Get an application's attempt ID associated with the job.
   *
   * @return An application's Attempt ID
   */
  def applicationAttemptId(): Option[String]

}

 

DAGScheduler hands the TaskSet to the underlying TaskScheduler interface, which has different concrete implementations; TaskScheduler is mainly implemented by TaskSchedulerImpl:

private[spark] class TaskSchedulerImpl(
    val sc: SparkContext,
    val maxTaskFailures: Int,
    isLocal: Boolean = false)
  extends TaskScheduler with Logging
{

TaskSchedulerImpl in turn has its own subclass, YarnScheduler.

private[spark] class YarnScheduler(sc: SparkContext) extends TaskSchedulerImpl(sc) {

  // RackResolver logs an INFO message whenever it resolves a rack, which is way too often.
  if (Logger.getLogger(classOf[RackResolver]).getLevel == null) {
    Logger.getLogger(classOf[RackResolver]).setLevel(Level.WARN)
  }

  // By default, rack is unknown
  override def getRackForHost(hostPort: String): Option[String] = {
    val host = Utils.parseHostPort(hostPort)._1
    Option(RackResolver.resolve(sc.hadoopConfiguration, host).getNetworkLocation)
  }
}

YarnScheduler's subclass YarnClusterScheduler is implemented as follows:

private[spark] class YarnClusterScheduler(sc: SparkContext) extends YarnScheduler(sc) {
  logInfo("Created YarnClusterScheduler")

  override def postStartHook() {
    ApplicationMaster.sparkContextInitialized(sc)
    super.postStartHook()
    logInfo("YarnClusterScheduler.postStartHook done")
  }

}

 

 

By default we study Standalone mode, so the focus is TaskSchedulerImpl. DAGScheduler hands the TaskSet to TaskScheduler, which manages the individual tasks through a TaskSetManager. The core job of TaskScheduler is to submit the TaskSet to the cluster for execution and report the results:

- It creates and maintains a TaskSetManager for each TaskSet and tracks task locality and failure information.
- Straggler tasks that fall behind are retried on other nodes through speculative execution (see the configuration sketch after this list).
- It reports execution status back to DAGScheduler, including fetch failed errors when Shuffle output is lost.
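The straggler handling in the second point is Spark's speculative execution. A minimal configuration sketch follows; the property keys are standard Spark settings, while the object name and the values are only illustrative:

import org.apache.spark.SparkConf

object SpeculationConf {
  // Enable re-launching of straggler tasks on other nodes; values are illustrative.
  def build(): SparkConf = new SparkConf()
    .set("spark.speculation", "true")            // turn speculation on (off by default)
    .set("spark.speculation.interval", "100ms")  // how often TaskSchedulerImpl checks for stragglers
    .set("spark.speculation.multiplier", "1.5")  // how much slower than the median marks a straggler
    .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
}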

TaskSet is a plain class; its first member, tasks, is an array. The TaskSet source code is as follows:

private[spark] class TaskSet(
    val tasks: Array[Task[_]],
    val stageId: Int,
    val stageAttemptId: Int,
    val priority: Int,
    val properties: Properties) {
  val id: String = stageId + "." + stageAttemptId

  override def toString: String = "TaskSet " + id
}

 

TaskScheduler holds a SchedulerBackend, and SchedulerBackend manages Executor resources. In Standalone mode the concrete implementation is StandaloneSchedulerBackend (Spark 2.0 renamed the earlier SparkDeploySchedulerBackend to StandaloneSchedulerBackend).

SchedulerBackend itself is an interface, a trait. Its source code is as follows:

private[spark] trait SchedulerBackend {
  private val appId = "spark-application-" + System.currentTimeMillis

  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit
  def defaultParallelism(): Int

  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    throw new UnsupportedOperationException
  def isReady(): Boolean = true

  /**
   * Get an application ID associated with the job.
   *
   * @return An application ID
   */
  def applicationId(): String = appId

  /**
   * Get the attempt ID for this run, if the cluster manager supports multiple
   * attempts. Applications run in client mode will not have attempt IDs.
   *
   * @return The application attempt id, if available.
   */
  def applicationAttemptId(): Option[String] = None

  /**
   * Get the URLs for the driver logs. These URLs are used to display the links in the UI
   * Executors tab for the driver.
   * @return Map containing the log names and their respective URLs
   */
  def getDriverLogUrls: Option[Map[String, String]] = None

}

 

 

StandaloneSchedulerBackend is responsible for collecting resource information from the Workers. It receives the registration information sent to the Driver when an ExecutorBackend starts, and it prepares compute resources for the current application at process granularity.

The StandaloneSchedulerBackend source code:

private[spark] class StandaloneSchedulerBackend(
    scheduler: TaskSchedulerImpl,
    sc: SparkContext,
    masters: Array[String])
  extends CoarseGrainedSchedulerBackend(scheduler, sc.env.rpcEnv)
  with StandaloneAppClientListener
  with Logging {
  private var client: StandaloneAppClient = null
......

 

StandaloneSchedulerBackend contains a client field of type StandaloneAppClient:

private[spark] class StandaloneAppClient(
    rpcEnv: RpcEnv,
    masterUrls: Array[String],
    appDescription: ApplicationDescription,
    listener: StandaloneAppClientListener,
    conf: SparkConf)
  extends Logging {

 

StandaloneAppClient allows an application to communicate with the Spark standalone cluster manager. It takes the Master URLs, an application description, and a listener for cluster events, and calls the listener back when various events occur. masterUrls are in the format spark://host:port, and StandaloneAppClient registers with the Master.
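As a hedged illustration (the host names, ports, and application name below are placeholders), an application points at a standalone Master, or at a comma-separated list of Masters in HA mode, through the master URL; creating the SparkContext then wires up the StandaloneSchedulerBackend and StandaloneAppClient described here:

import org.apache.spark.{SparkConf, SparkContext}

object StandaloneMasterUrl {
  def main(args: Array[String]): Unit = {
    // A single Master, or a comma-separated list when Masters run in HA mode;
    // the hosts and ports are placeholders only.
    val conf = new SparkConf()
      .setAppName("standalone-demo")
      .setMaster("spark://master1:7077,master2:7077")
    val sc = new SparkContext(conf)   // StandaloneAppClient registers the application with the Master(s)
    sc.stop()
  }
}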

StandaloneAppClient is assigned when the start method in StandaloneSchedulerBackend.scala runs: a new StandaloneAppClient is created there.

private[spark] class StandaloneSchedulerBackend(
......

  override def start() {
......
    val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
      appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
    client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
    client.start()
    launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
    waitForRegistration()
    launcherBackend.setState(SparkAppHandle.State.RUNNING)
  }

 

StandaloneAppClient.scala contains a class called ClientEndpoint whose core job is to register with the Master at startup. When StandaloneAppClient's start method runs, it creates a ClientEndpoint.

The StandaloneAppClient source code is as follows:

private[spark] class StandaloneAppClient(
......
  private class ClientEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint
    with Logging {
......

  def start() {
    // Just launch an rpcEndpoint; it will call back into the listener.
    endpoint.set(rpcEnv.setupEndpoint("AppClient", new ClientEndpoint(rpcEnv)))
  }

 

So StandaloneSchedulerBackend builds a StandaloneAppClient instance when it starts, the StandaloneAppClient starts the ClientEndpoint message loop when its own start method is called, and ClientEndpoint registers the current application with the Master as it starts.

The onStart() method of the ClientEndpoint class in StandaloneAppClient:

override def onStart(): Unit = {
  try {
    registerWithMaster(1)
  } catch {
    case e: Exception =>
      logWarning("Failed to connect to master", e)
      markDisconnected()
      stop()
  }
}

 

This registration is the first core function of StandaloneSchedulerBackend. StandaloneSchedulerBackend inherits from CoarseGrainedSchedulerBackend, and CoarseGrainedSchedulerBackend creates a DriverEndpoint when it starts; from the instance's point of view the DriverEndpoint also belongs to the StandaloneSchedulerBackend instance:

private[spark]
class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: RpcEnv)
  extends ExecutorAllocationClient with SchedulerBackend with Logging
{
......
  class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
    extends ThreadSafeRpcEndpoint with Logging {
......

When StandaloneSchedulerBackend's parent class CoarseGrainedSchedulerBackend starts, it instantiates a message loop of type DriverEndpoint; this is the classic "Driver" object we talk about when a program runs.

At runtime StandaloneSchedulerBackend registers with the Master to request resources. When a Worker's ExecutorBackend starts, it sends a RegisterExecutor message to register with DriverEndpoint; at that point StandaloneSchedulerBackend knows the compute resources owned by the current application, and TaskScheduler runs Tasks on the compute resources held by StandaloneSchedulerBackend. StandaloneSchedulerBackend is not the overall manager of the application; the overall managers are DAGScheduler and TaskScheduler. StandaloneSchedulerBackend obtains the concrete compute resources for the application's Tasks and sends the Tasks out to the cluster.

SparkContext, DAGScheduler, TaskSchedulerImpl, and StandaloneSchedulerBackend are each instantiated only once, when the application starts, and these objects exist for the lifetime of the application.

The discussion here is based on Spark 2.1:

The three core objects of the Spark scheduler are SparkContext, DAGScheduler, and TaskSchedulerImpl. TaskSchedulerImpl, as the concrete low-level scheduler, needs compute resources at runtime, and therefore needs StandaloneSchedulerBackend. The clever part of StandaloneSchedulerBackend's design is that when it starts, it starts a StandaloneAppClient, and when the StandaloneAppClient starts, it launches a ClientEndpoint message loop, which registers the application with the Master as soon as it starts.

When StandaloneSchedulerBackend's parent class CoarseGrainedSchedulerBackend starts, it instantiates the DriverEndpoint. Every ExecutorBackend registers with the DriverEndpoint when it starts, and the registration ultimately lands in the in-memory data structures of StandaloneSchedulerBackend. On the surface this looks like CoarseGrainedSchedulerBackend, but the object that was instantiated is StandaloneSchedulerBackend, so a member registered through the parent class is in fact a member of the subclass instance.
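That last point, that a member registered through the parent class is really state of the single subclass instance, can be seen in a small plain-Scala sketch (these classes are made up and are not the actual Spark classes):

import scala.collection.mutable.ArrayBuffer

class ParentBackend {
  // Field declared in the parent class, but stored in whatever instance is created.
  protected val registeredExecutors = ArrayBuffer[String]()
  def register(id: String): Unit = registeredExecutors += id
}

class StandaloneLikeBackend extends ParentBackend {
  def show(): Unit = println(registeredExecutors.mkString(", "))
}

object InheritanceDemo {
  def main(args: Array[String]): Unit = {
    val backend = new StandaloneLikeBackend   // one object; the parent's field is part of it
    backend.register("executor-1")            // "registering with the parent" updates the same instance
    backend.show()                            // prints: executor-1
  }
}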

 

A prerequisite question: how are TaskScheduler and StandaloneSchedulerBackend started, and when is TaskSchedulerImpl instantiated?

TaskSchedulerImpl is instantiated inside SparkContext. When the SparkContext class is instantiated, every statement in the class body that is not inside a method body is executed (a tiny plain-Scala illustration of this behavior follows the listing below). (sched, ts) are members of SparkContext, produced by calling createTaskScheduler, which returns a tuple of two elements: sched is our schedulerBackend and ts is the taskScheduler.

class SparkContext(config: SparkConf) extends Logging {
  ......
    // Create and start the scheduler
    val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
    _dagScheduler = new DAGScheduler(this)
    _heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)
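As promised above, here is a small plain-Scala sketch (not Spark code) of the point that statements written directly in a class body execute at instantiation:

class BodyDemo {
  // Statements in the class body are part of the primary constructor and run at construction time.
  println("class body statement runs at construction")
  val pair: (Int, String) = (1, "one")   // member initialization also happens here
}

object BodyDemoApp {
  def main(args: Array[String]): Unit = {
    val d = new BodyDemo()               // prints the message above
    println(d.pair)                      // (1,one)
  }
}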

 

createTaskScheduler handles many run modes; here we focus on Standalone mode. First a TaskSchedulerImpl is created. TaskSchedulerImpl and SparkContext are one-to-one, so for the lifetime of the program there is exactly one TaskSchedulerImpl and exactly one SparkContext. Then a StandaloneSchedulerBackend is instantiated, and again there is exactly one StandaloneSchedulerBackend while the program runs. The createTaskScheduler method is as follows:

private def createTaskScheduler(
    sc: SparkContext,
    master: String,
    deployMode: String): (SchedulerBackend, TaskScheduler) = {
  import SparkMasterRegex._
......
  master match {
......
    case SPARK_REGEX(sparkUrl) =>
      val scheduler = new TaskSchedulerImpl(sc)
      val masterUrls = sparkUrl.split(",").map("spark://" + _)
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      (backend, scheduler)
......

 

So when SparkContext is instantiated, createTaskScheduler creates the TaskSchedulerImpl and the StandaloneSchedulerBackend, and then createTaskScheduler calls scheduler.initialize(backend).

The initialize method takes the StandaloneSchedulerBackend as its parameter, and schedulingMode is pattern-matched against two modes: FIFO and FAIR.

The initialize method of TaskSchedulerImpl is as follows:

def initialize(backend: SchedulerBackend) {
  this.backend = backend
  // temporarily set rootPool name to empty
  rootPool = new Pool("", schedulingMode, 0, 0)
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>
        new FairSchedulableBuilder(rootPool, conf)
      case _ =>
        throw new IllegalArgumentException(s"Unsupported spark.scheduler.mode: $schedulingMode")
    }
  }
  schedulableBuilder.buildPools()
}

 

The initialize method calls schedulableBuilder.buildPools(); buildPools is implemented differently depending on the mode, by FIFOSchedulableBuilder or FairSchedulableBuilder:

private[spark] trait SchedulableBuilder {
  def rootPool: Pool

  def buildPools(): Unit

  def addTaskSetManager(manager: Schedulable, properties: Properties): Unit
}
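As a hedged sketch of how the two modes surface to an application (the allocation-file path, pool name, and app name are placeholders; spark.scheduler.mode, spark.scheduler.allocation.file, and spark.scheduler.pool are standard Spark settings): FIFO is the default, and FAIR can be selected and combined with named pools.

import org.apache.spark.{SparkConf, SparkContext}

object FairSchedulingDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fair-scheduling-demo")
      .setMaster("local[4]")
      .set("spark.scheduler.mode", "FAIR")                               // default is FIFO
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml") // placeholder path to pool definitions
    val sc = new SparkContext(conf)

    // Jobs submitted from this thread go into the named pool (pool name is illustrative).
    sc.setLocalProperty("spark.scheduler.pool", "reporting")
    sc.parallelize(1 to 100).sum()

    sc.stop()
  }
}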

 

The initialize method receives the StandaloneSchedulerBackend but does not start it yet; it only assigns it to TaskSchedulerImpl's backend field. When TaskSchedulerImpl's start method is called, it calls backend.start, and it is inside that start method that the application is finally registered.

Now look at how the taskScheduler is started in SparkContext.scala:

val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
    _dagScheduler = new DAGScheduler(this)
......
    _taskScheduler.start()
    _applicationId = _taskScheduler.applicationId()
    _applicationAttemptId = taskScheduler.applicationAttemptId()
    _conf.set("spark.app.id", _applicationId)
......

 

This calls the start method of _taskScheduler:

private[spark] trait TaskScheduler {
......

  def start(): Unit
......

TaskScheduler's start() method has no concrete implementation; the start() method of its subclass TaskSchedulerImpl is as follows:

override def start() {
  backend.start()

  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    speculationScheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
        checkSpeculatableTasks()
      }
    }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
  }
}

 

TaskSchedulerImpl's start() calls backend.start(), which runs the start method of StandaloneSchedulerBackend:

override def start() {
  super.start()
  launcherBackend.connect()
......
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
......
  val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
    appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
  client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start()
......
}

 

In StandaloneSchedulerBackend's start method, a command is packaged into the application description and registered with the Master, and the Master in turn asks a Worker to launch the concrete Executor. The command already encapsulates the instruction: the entry class of the Executor process to be started is CoarseGrainedExecutorBackend. A StandaloneAppClient is then created and started via client.start().

In StandaloneAppClient's start method a ClientEndpoint is created:

def start() {
  // Just launch an rpcEndpoint; it will call back into the listener.
  endpoint.set(rpcEnv.setupEndpoint("AppClient", new ClientEndpoint(rpcEnv)))
}

The ClientEndpoint source code is as follows:

private class ClientEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint
  with Logging {
......
  override def onStart(): Unit = {
    try {
      registerWithMaster(1)
    } catch {
      case e: Exception =>
        logWarning("Failed to connect to master", e)
        markDisconnected()
        stop()
    }
  }

 

ClientEndpoint is a ThreadSafeRpcEndpoint. Its onStart() method calls registerWithMaster(1) to register the application with the Master. The registerWithMaster method is as follows:

private def registerWithMaster(nthRetry: Int) {
  registerMasterFutures.set(tryRegisterAllMasters())
  registrationRetryTimer.set(registrationRetryThread.schedule(new Runnable {
    override def run(): Unit = {
      if (registered.get) {
        registerMasterFutures.get.foreach(_.cancel(true))
        registerMasterThreadPool.shutdownNow()
      } else if (nthRetry >= REGISTRATION_RETRIES) {
        markDead("All masters are unresponsive! Giving up.")
      } else {
        registerMasterFutures.get.foreach(_.cancel(true))
        registerWithMaster(nthRetry + 1)
      }
    }
  }, REGISTRATION_TIMEOUT_SECONDS, TimeUnit.SECONDS))
}

After the application registers, the Master allocates resources for it through schedule() and tells the Workers to launch Executors. The process an Executor starts is CoarseGrainedExecutorBackend, and once started, the Executor registers back with the Driver. The Driver here is really DriverEndpoint, a message loop of CoarseGrainedSchedulerBackend, the parent class of StandaloneSchedulerBackend.

The big summary:

When SparkContext is instantiated, the chain of events is:

1. SparkContext calls createTaskScheduler to create TaskSchedulerImpl and StandaloneSchedulerBackend, and during the same instantiation it calls TaskSchedulerImpl's start, which in turn calls StandaloneSchedulerBackend's start.
2. StandaloneSchedulerBackend's start creates a StandaloneAppClient object and calls its start method, which creates a ClientEndpoint; the Command passed in specifies that the entry class of the Executor process to be launched for the current application is CoarseGrainedExecutorBackend.
3. The ClientEndpoint starts and registers the current application with the Master via tryRegisterAllMasters.
4. When the Master receives the registration and can run the program, it generates an ID for the application and allocates compute resources through schedule(); the concrete allocation is determined by the application's run mode, memory, cores, and other configuration. The Master then sends instructions to the Workers.
5. When a Worker allocates compute resources for the application, it first creates an ExecutorRunner, which internally uses a thread to build a ProcessBuilder that launches another JVM process; the main class loaded by that JVM is exactly the class named in the Command passed when the ClientEndpoint was created, namely CoarseGrainedExecutorBackend.
6. The new JVM loads CoarseGrainedExecutorBackend and calls its main method, which instantiates the CoarseGrainedExecutorBackend message loop itself; during that instantiation the onStart callback sends RegisterExecutor to DriverEndpoint to register the current CoarseGrainedExecutorBackend.
7. DriverEndpoint receives the registration and stores it in the in-memory data structures of the StandaloneSchedulerBackend instance. At this point the Driver has obtained its compute resources!

The main method of CoarseGrainedExecutorBackend.scala:

def main(args: Array[String]) {
  var driverUrl: String = null
  var executorId: String = null
  var hostname: String = null
  var cores: Int = 0
  var appId: String = null
  var workerUrl: Option[String] = None
  val userClassPath = new mutable.ListBuffer[URL]()

  var argv = args.toList
......
  run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
  System.exit(0)
}

CoarseGrainedExecutorBackend's main method then calls the run method:

private def run(
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    appId: String,
    workerUrl: Option[String],
    userClassPath: Seq[URL]) {
......
      env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
        env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
......

 

In the run method invoked from CoarseGrainedExecutorBackend's main method, env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(...)) constructs the CoarseGrainedExecutorBackend instance itself.

 

