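Lesson 33: Spark Executor Internals Fully Decoded: the Executor Working Diagram, ExecutorBackend Registration Source Code, Executor Instantiation Internals, and How the Executor Actually Works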
Source: Internet  Editor: 程序博客网  Time: 2024/05/21 07:11
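This lesson covers the Executor working diagram, the source code behind ExecutorBackend registration, how the Executor is instantiated, and how the Executor actually does its work.
The Master instructs a Worker to start a process that will host an Executor. In Standalone mode, that process is CoarseGrainedExecutorBackend.
- Master side: the Master sends the Worker an instruction to launch an Executor.
- Worker side: on receiving the Master's instruction, the Worker uses ExecutorRunner to start a separate process that runs the Executor. Note that the Worker starts another process which in turn starts the Executor; it does not start the Executor directly. Why does the Worker start a separate process, have that process register with the Driver, and only then start the Executor? First, the Worker's job is to manage the machine's resources and to report resource changes to the Master; it is not meant for computation, so no computation should run inside the Worker. Second, a Spark cluster runs many applications that need many Executors. If all Executors lived inside the Worker instead of each getting its own process, a crash in one application would bring down the others.
- CoarseGrainedExecutorBackend is launched; it is the process that hosts the Executor. On startup it must register with the Driver, which it does by sending a RegisterExecutor message:
The onStart method in CoarseGrainedExecutorBackend.scala: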
    override def onStart() {
      logInfo("Connecting to driver: " + driverUrl)
      rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
        // This is a very fast action so we can use "ThreadUtils.sameThread"
        driver = Some(ref)
        ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
      }(ThreadUtils.sameThread).onComplete {
        // This is a very fast action so we can use "ThreadUtils.sameThread"
        case Success(msg) =>
          // Always receive `true`. Just ignore it
        case Failure(e) =>
          exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
      }(ThreadUtils.sameThread)
    }
RegisterExecutor is a case class, defined as follows:
    case class RegisterExecutor(
        executorId: String,
        executorRef: RpcEndpointRef,
        hostname: String,
        cores: Int,
        logUrls: Map[String, String])
      extends CoarseGrainedClusterMessage
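When CoarseGrainedExecutorBackend starts, it sends a RegisterExecutor message to the Driver. The Driver receives it and, once registration succeeds, replies to CoarseGrainedExecutorBackend with a RegisteredExecutor message. The "executor" being registered here is not the Executor that does the real work; what is actually registered is the ExecutorBackend, so the message names RegisterExecutor and RegisteredExecutor are best read as "RegisterExecutorBackend" and "RegisteredExecutorBackend".
Pay special attention: registering an "Executor" with the Driver at CoarseGrainedExecutorBackend startup really registers the ExecutorBackend instance, which has no direct relationship to the Executor instance!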
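The request/acknowledge handshake described above can be sketched with plain Scala Futures. This is only an illustration under stated assumptions: DriverStub, BackendStub and RegistrationSketch are hypothetical names, the message type is a cut-down copy with fewer fields, and nothing here uses Spark's RPC layer.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical, reduced message type; Spark's RegisterExecutor has more fields.
final case class RegisterExecutor(executorId: String, hostname: String, cores: Int)

// Hypothetical stand-in for the driver endpoint: it records the ID and answers true.
class DriverStub {
  @volatile var registered: Vector[String] = Vector.empty
  def ask(msg: RegisterExecutor): Future[Boolean] = Future {
    registered = registered :+ msg.executorId
    true
  }
}

// Hypothetical backend: on start it asks the driver to register it.
// Note that no Executor object exists yet at this point.
class BackendStub(driver: DriverStub, executorId: String) {
  def onStart(): Future[String] =
    driver.ask(RegisterExecutor(executorId, "localhost", 2))
      .map(_ => s"executor $executorId registered")
      .recover { case e: Exception => s"cannot register: ${e.getMessage}" }
}

object RegistrationSketch {
  // Runs one handshake and returns the backend's status plus the driver's view.
  def run(): (String, Vector[String]) = {
    val driver = new DriverStub
    val status = Await.result(new BackendStub(driver, "0").onStart(), 5.seconds)
    (status, driver.registered)
  }
}
```

Calling RegistrationSketch.run() performs a single registration. The point of the sketch is that the thing which registers is the backend, not an Executor.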
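- CoarseGrainedExecutorBackend is the name of the process in which the Executor runs; CoarseGrainedExecutorBackend itself does not compute any tasks.
- The Executor is the object that actually processes Tasks. Internally it uses a thread pool to run them, and the Executor object lives inside the CoarseGrainedExecutorBackend process.
- CoarseGrainedExecutorBackend and Executor correspond one to one.
- CoarseGrainedExecutorBackend is a message endpoint (it implements ThreadSafeRpcEndpoint): it can send messages to the Driver and receive instructions from the Driver, such as launching a Task.
CoarseGrainedExecutorBackend extends ThreadSafeRpcEndpoint, so it can both receive and send messages. Its declaration: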
    private[spark] class CoarseGrainedExecutorBackend(
        override val rpcEnv: RpcEnv,
        driverUrl: String,
        executorId: String,
        hostname: String,
        cores: Int,
        userClassPath: Seq[URL],
        env: SparkEnv)
      extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {
CoarseGrainedExecutorBackend sends its messages to the Driver. On the Driver side the relevant class is StandaloneSchedulerBackend (Spark 2.0 renamed SparkDeploySchedulerBackend to StandaloneSchedulerBackend), which extends CoarseGrainedSchedulerBackend. Its start method launches a StandaloneAppClient (Spark 2.0 renamed AppClient to StandaloneAppClient), which represents the application itself.
The start method in StandaloneSchedulerBackend.scala:
    override def start() {
      super.start()
      launcherBackend.connect()

      // The endpoint for executors to talk to us
      val driverUrl = RpcEndpointAddress(
        sc.conf.get("spark.driver.host"),
        sc.conf.get("spark.driver.port").toInt,
        CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
      val args = Seq(
        "--driver-url", driverUrl,
        "--executor-id", "{{EXECUTOR_ID}}",
        "--hostname", "{{HOSTNAME}}",
        "--cores", "{{CORES}}",
        "--app-id", "{{APP_ID}}",
        "--worker-url", "{{WORKER_URL}}")
      val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
        .map(Utils.splitCommandString).getOrElse(Seq.empty)
      val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
        .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
      val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
        .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)

      // When testing, expose the parent class path to the child. This is processed by
      // compute-classpath.{cmd,sh} and makes all needed jars available to child processes
      // when the assembly is built with the "*-provided" profiles enabled.
      val testingClassPath =
        if (sys.props.contains("spark.testing")) {
          sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
        } else {
          Nil
        }

      // Start executors with a few necessary configs for registering with the scheduler
      val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
      val javaOpts = sparkJavaOpts ++ extraJavaOpts
      val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
        args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
      val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
      val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
      // If we're using dynamic allocation, set our initial executor limit to 0 for now.
      // ExecutorAllocationManager will send the real initial limit to the Master later.
      val initialExecutorLimit =
        if (Utils.isDynamicAllocationEnabled(conf)) {
          Some(0)
        } else {
          None
        }
      val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
        appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
      client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
      client.start()
      launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
      waitForRegistration()
      launcherBackend.setState(SparkAppHandle.State.RUNNING)
    }
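In the args sequence above, values such as {{EXECUTOR_ID}} and {{CORES}} are placeholders rather than final values; they can only be resolved once a concrete executor is being launched on a worker. The following minimal sketch shows how such a template might be filled in. The helper substitute and the sample values are assumptions made for illustration and are not taken from Spark's source.

```scala
object PlaceholderSketch {
  // A shortened copy of the argument template from the start() listing.
  val template: Seq[String] = Seq(
    "--executor-id", "{{EXECUTOR_ID}}",
    "--hostname", "{{HOSTNAME}}",
    "--cores", "{{CORES}}")

  // Hypothetical helper: replaces each {{KEY}} token whose key appears in `values`.
  def substitute(args: Seq[String], values: Map[String, String]): Seq[String] =
    args.map { arg =>
      values.foldLeft(arg) { case (acc, (key, value)) =>
        acc.replace("{{" + key + "}}", value)
      }
    }
}
```

For example, PlaceholderSketch.substitute(PlaceholderSketch.template, Map("EXECUTOR_ID" -> "3", "HOSTNAME" -> "worker-1", "CORES" -> "4")) yields the concrete command-line arguments for one executor process.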
A look at the StandaloneAppClient source:
    private[spark] class StandaloneAppClient(
        rpcEnv: RpcEnv,
        masterUrls: Array[String],
        appDescription: ApplicationDescription,
        listener: StandaloneAppClientListener,
        conf: SparkConf)
      extends Logging {
      ……
      private class ClientEndpoint(override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint
        with Logging {
        ……
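The Driver process has two crucial endpoints:
- ClientEndpoint: mainly responsible for registering the current application with the Master; it is an inner member of StandaloneAppClient (formerly AppClient).
- DriverEndpoint: the driver of the whole application at run time; it is an inner member of CoarseGrainedSchedulerBackend.
The DriverEndpoint in CoarseGrainedSchedulerBackend: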
    class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
      extends ThreadSafeRpcEndpoint with Logging {
DriverEndpoint receives the RegisterExecutor message and completes the registration on the Driver.
The RegisterExecutor handler in CoarseGrainedSchedulerBackend:
    case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
      if (executorDataMap.contains(executorId)) {
        executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
        context.reply(true)
      } else {
        // If the executor's rpc env is not listening for incoming connections, `hostPort`
        // will be null, and the client connection should be used to contact the executor.
        val executorAddress = if (executorRef.address != null) {
            executorRef.address
          } else {
            context.senderAddress
          }
        logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
        addressToExecutorId(executorAddress) = executorId
        totalCoreCount.addAndGet(cores)
        totalRegisteredExecutors.addAndGet(1)
        val data = new ExecutorData(executorRef, executorRef.address, hostname,
          cores, cores, logUrls)
        // This must be synchronized because variables mutated
        // in this block are read when requesting executors
        CoarseGrainedSchedulerBackend.this.synchronized {
          executorDataMap.put(executorId, data)
          if (currentExecutorIdCounter < executorId.toInt) {
            currentExecutorIdCounter = executorId.toInt
          }
          if (numPendingExecutors > 0) {
            numPendingExecutors -= 1
            logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
          }
        }
        executorRef.send(RegisteredExecutor)
        // Note: some tests expect the reply to come after we put the executor in the map
        context.reply(true)
        listenerBus.post(
          SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
        makeOffers()
      }
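The RegisterExecutor handler relies on a data structure named executorDataMap, which stores key-value pairs (executor ID to ExecutorData):
    private val executorDataMap = new HashMap[String, ExecutorData]
The executorEndpoint field of ExecutorData is an RpcEndpointRef. The ExecutorData source: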
    private[cluster] class ExecutorData(
        val executorEndpoint: RpcEndpointRef,
        val executorAddress: RpcAddress,
        override val executorHost: String,
        var freeCores: Int,
        override val totalCores: Int,
        override val logUrlMap: Map[String, String]
    ) extends ExecutorInfo(executorHost, totalCores, logUrlMap)
The RegisteredExecutor case in the receive method of CoarseGrainedExecutorBackend.scala:
    override def receive: PartialFunction[Any, Unit] = {
      case RegisteredExecutor =>
        logInfo("Successfully registered with driver")
        try {
          executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
        } catch {
          case NonFatal(e) =>
            exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
        }
After receiving the RegisteredExecutor message, CoarseGrainedExecutorBackend creates a new Executor.
Executor itself is just an ordinary class:
    private[spark] class Executor(
        executorId: String,
        executorHostname: String,
        env: SparkEnv,
        userClassPath: Seq[URL] = Nil,
        isLocal: Boolean = false)
      extends Logging {
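Back in ExecutorData.scala: the RpcEndpointRef is a proxy handle that stands for CoarseGrainedExecutorBackend. The Driver wraps the ExecutorBackend's information in an ExecutorData and registers it in the Driver's in-memory data structure executorDataMap: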
    private[cluster] class ExecutorData(
        val executorEndpoint: RpcEndpointRef,
        val executorAddress: RpcAddress,
        override val executorHost: String,
        var freeCores: Int,
        override val totalCores: Int,
        override val logUrlMap: Map[String, String]
    ) extends ExecutorInfo(executorHost, totalCores, logUrlMap)
The Executor's registration message goes to DriverEndpoint, which writes the data into executorDataMap inside CoarseGrainedSchedulerBackend. Because executorDataMap is a member of CoarseGrainedSchedulerBackend, the registration ultimately lands in CoarseGrainedSchedulerBackend, which thereby obtains the registration information of the Executor (in fact, of the ExecutorBackend).
In other words, CoarseGrainedSchedulerBackend knows every ExecutorBackend process allocated to the current application, and inside each ExecutorBackend process an Executor object is responsible for running the actual Tasks. The write to executorDataMap is wrapped in a synchronized block so that concurrent updates are safe.
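The bookkeeping described above (reject a duplicate ID, otherwise record the executor under a lock) can be sketched as a small registry. ExecutorRecord and ExecutorRegistry are hypothetical names used only for this illustration; the real handler also replies to the sender, posts a listener event and makes resource offers, which the sketch leaves out.

```scala
import scala.collection.mutable

// Hypothetical, simplified record; the real ExecutorData carries more fields.
final case class ExecutorRecord(host: String, totalCores: Int, var freeCores: Int)

// Hypothetical registry that mimics the map update guarded by `synchronized`.
class ExecutorRegistry {
  private val executorDataMap = new mutable.HashMap[String, ExecutorRecord]
  private var totalCoreCount = 0

  // Returns false for a duplicate ID, true when the executor was added.
  def register(executorId: String, host: String, cores: Int): Boolean = synchronized {
    if (executorDataMap.contains(executorId)) {
      false
    } else {
      // A new executor starts with all of its cores free.
      executorDataMap.put(executorId, ExecutorRecord(host, cores, cores))
      totalCoreCount += cores
      true
    }
  }

  def totalCores: Int = synchronized { totalCoreCount }
  def size: Int = synchronized { executorDataMap.size }
}
```

Because every method takes the same lock, a reader never observes the map and the core counter in an inconsistent state, which is the property the synchronized block in the listing is there to protect.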
When CoarseGrainedExecutorBackend receives the RegisteredExecutor message from DriverEndpoint, it creates the Executor instance, and that Executor instance is what actually performs Task computation:
    override def receive: PartialFunction[Any, Unit] = {
      case RegisteredExecutor =>
        logInfo("Successfully registered with driver")
        try {
          executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
        } catch {
          case NonFatal(e) =>
            exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
        }
Now look at Executor.scala, where threadPool is a thread pool:
    private[spark] class Executor(
        executorId: String,
        executorHostname: String,
        env: SparkEnv,
        userClassPath: Seq[URL] = Nil,
        isLocal: Boolean = false)
      extends Logging {

      ......
      private val threadPool = ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker")
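The Executor is what really computes Tasks. When it is instantiated it creates a thread pool, threadPool, ready for Task execution. threadPool is built by newDaemonCachedThreadPool, which creates a cached pool whose thread factory produces daemon threads named in the required format. Its implementation: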
    def newDaemonCachedThreadPool(prefix: String): ThreadPoolExecutor = {
      val threadFactory = namedThreadFactory(prefix)
      Executors.newCachedThreadPool(threadFactory).asInstanceOf[ThreadPoolExecutor]
    }
The namedThreadFactory source:
    def namedThreadFactory(prefix: String): ThreadFactory = {
      new ThreadFactoryBuilder().setDaemon(true).setNameFormat(prefix + "-%d").build()
    }
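newCachedThreadPool creates a thread pool that creates new threads as needed, reuses previously created threads when they are available, and uses the supplied ThreadFactory to create new threads. The JDK source of newCachedThreadPool: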
    public static ExecutorService newCachedThreadPool(ThreadFactory threadFactory) {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                                      60L, TimeUnit.SECONDS,
                                      new SynchronousQueue<Runnable>(),
                                      threadFactory);
    }
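The resulting threadPool runs the Tasks that Spark sends over efficiently, using concurrent threads and thread reuse. Once the pool exists, the next step is to wait for the Driver to send tasks to CoarseGrainedExecutorBackend; they are not sent directly to the Executor, because the Executor is not a message endpoint.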
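To make the thread pool construction concrete, here is a minimal sketch that builds the same kind of pool using only the JDK. The listing above relies on Guava's ThreadFactoryBuilder; the hand-written factory below is a substitute written for this example, and DaemonPoolSketch and probe are hypothetical names.

```scala
import java.util.concurrent.{Callable, Executors, ThreadFactory, ThreadPoolExecutor, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object DaemonPoolSketch {
  // Hand-written stand-in for ThreadFactoryBuilder: daemon threads named "<prefix>-<n>".
  def namedDaemonFactory(prefix: String): ThreadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, prefix + "-" + counter.getAndIncrement())
      t.setDaemon(true)
      t
    }
  }

  def newDaemonCachedThreadPool(prefix: String): ThreadPoolExecutor =
    Executors.newCachedThreadPool(namedDaemonFactory(prefix)).asInstanceOf[ThreadPoolExecutor]

  // Runs one job and reports (thread name, isDaemon) of the worker that executed it.
  def probe(prefix: String): (String, Boolean) = {
    val pool = newDaemonCachedThreadPool(prefix)
    try {
      val future = pool.submit(new Callable[(String, Boolean)] {
        override def call(): (String, Boolean) = {
          val t = Thread.currentThread()
          (t.getName, t.isDaemon)
        }
      })
      future.get(5, TimeUnit.SECONDS)
    } finally {
      pool.shutdown()
    }
  }
}
```

DaemonPoolSketch.probe("Executor task launch worker") shows that the job runs on a daemon thread whose name starts with the given prefix. Daemon threads do not keep the JVM alive, so idle workers never block process exit.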
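How exactly does the Executor work?
When the Driver sends a Task, it is actually sent to the CoarseGrainedExecutorBackend RpcEndpoint, not directly to the Executor (since the Executor is not a message endpoint, it can never receive remote messages directly).
The LaunchTask case in CoarseGrainedExecutorBackend: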
    case LaunchTask(data) =>
      if (executor == null) {
        exitExecutor(1, "Received LaunchTask command but executor was null")
      } else {
        val taskDesc = ser.deserialize[TaskDescription](data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
        executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
          taskDesc.name, taskDesc.serializedTask)
      }
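The Driver sends LaunchTask to CoarseGrainedExecutorBackend, which hands the work over to a thread in the thread pool. It first checks whether executor is null; if so it reports an error and the process exits. Otherwise it deserializes the task description and calls the executor's launchTask. Here attemptNumber identifies which attempt of the task this is (how many times it has been attempted so far), not the maximum number of retries.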
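The division of labour between the message endpoint and the plain Executor object can be sketched as follows. All names here (BackendMessage, Registered, Launch, MiniExecutor, MiniBackend) are hypothetical, and where the real backend exits the process on a null executor, the sketch merely records an error so that it stays testable.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical messages; they echo the roles of RegisteredExecutor and LaunchTask.
sealed trait BackendMessage
case object Registered extends BackendMessage
final case class Launch(taskId: Long) extends BackendMessage

// A plain object with no messaging ability of its own.
class MiniExecutor {
  val launched = ArrayBuffer.empty[Long]
  def launchTask(taskId: Long): Unit = { launched += taskId }
}

// The endpoint: the only component that receives messages.
class MiniBackend {
  // Stays null until registration has been confirmed.
  var executor: MiniExecutor = null
  val log = ArrayBuffer.empty[String]

  def receive: PartialFunction[BackendMessage, Unit] = {
    case Registered =>
      executor = new MiniExecutor
      log += "registered"
    case Launch(taskId) =>
      if (executor == null) {
        // The real backend calls exitExecutor here; the sketch only logs.
        log += s"error: task $taskId received but executor was null"
      } else {
        executor.launchTask(taskId)
        log += s"launched $taskId"
      }
  }
}
```

Feeding Launch(1L), Registered and Launch(2L) to receive in that order shows the first task being rejected and the second being forwarded to the executor.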
After receiving the message from the Driver, ExecutorBackend calls launchTask to hand the task to the Executor:
    def launchTask(
        context: ExecutorBackend,
        taskId: Long,
        attemptNumber: Int,
        taskName: String,
        serializedTask: ByteBuffer): Unit = {
      val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
        serializedTask)
      runningTasks.put(taskId, tr)
      threadPool.execute(tr)
    }
When launchTask in Executor.scala receives the command to run a Task, it first wraps the Task in a TaskRunner and then puts it into runningTasks, a simple data structure:
    private val runningTasks = new ConcurrentHashMap[Long, TaskRunner]
launchTask then calls threadPool.execute(tr), handing the task to a thread in the pool. TaskRunner implements Runnable, which is a Java interface:
    class TaskRunner(
        execBackend: ExecutorBackend,
        val taskId: Long,
        val attemptNumber: Int,
        taskName: String,
        serializedTask: ByteBuffer)
      extends Runnable {
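TaskRunner is a concrete implementation of Java's Runnable interface. When it is time to do the work, it is handed to a thread in the pool, which calls its run method to execute the Task.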
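The wrap, record and execute sequence can be sketched in a few lines. TinyExecutor is a hypothetical miniature: it takes a plain function instead of a serialized task, stores an Int result, and uses a CountDownLatch (an addition for this example) so that a caller can wait for completion.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch, Executors, TimeUnit}

// Hypothetical miniature of the flow described above; not Spark's Executor.
class TinyExecutor {
  private val threadPool = Executors.newCachedThreadPool()
  val runningTasks = new ConcurrentHashMap[Long, Runnable]
  val results = new ConcurrentHashMap[Long, Int]

  // Wrap the work in a Runnable, remember it by task ID, then hand it to the pool.
  def launchTask(taskId: Long, work: () => Int, done: CountDownLatch): Unit = {
    val runner = new Runnable {
      override def run(): Unit = {
        try {
          results.put(taskId, work())
        } finally {
          // The task is no longer running, whether it succeeded or failed.
          runningTasks.remove(taskId)
          done.countDown()
        }
      }
    }
    runningTasks.put(taskId, runner)
    threadPool.execute(runner)
  }

  def shutdown(): Unit = threadPool.shutdown()
}
```

A caller would create a latch sized to the number of tasks, call launchTask for each one, wait on the latch, and then call shutdown so the pool's worker threads do not linger.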
The run method of TaskRunner in Executor.scala ultimately calls task.run:
    override def run(): Unit = {
      ......
      var threwException = true
      val value = try {
        val res = task.run(
          taskAttemptId = taskId,
          attemptNumber = attemptNumber,
          metricsSystem = env.metricsSystem)
        threwException = false
        res
      } finally {
        val releasedLocks = env.blockManager.releaseAllLocksForTask(taskId)

        ......
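The listing sets threwException to true up front and flips it to false only after task.run returns, so the finally block can tell a normal return from an exception. The rest of that finally block is elided in the listing, so what the flag is used for there is not shown; the sketch below only demonstrates the pattern itself, with runWithCleanup as a hypothetical helper.

```scala
object CleanupSketch {
  // Runs `body`. The finally block always executes and learns, via the flag,
  // whether it runs after a normal return (false) or after an exception (true).
  def runWithCleanup[T](body: => T)(cleanup: Boolean => Unit): T = {
    var threwException = true
    try {
      val res = body
      threwException = false
      res
    } finally {
      cleanup(threwException)
    }
  }
}
```

The flag avoids catching and rethrowing the exception: the failure still propagates to the caller unchanged, while the cleanup code can adapt its behaviour.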
Following into the run method in Task.scala, we find that it calls runTask:
    final def run(
        taskAttemptId: Long,
        attemptNumber: Int,
        metricsSystem: MetricsSystem): T = {
      ……
      try {
        runTask(context)
      } catch {
        ……
When TaskRunner's run method executes, it calls the Task's run method, which in turn calls runTask. The concrete Task implementations are ShuffleMapTask and ResultTask.
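The split between a final run method and an overridable runTask is the template method pattern. A minimal sketch follows; MiniTask, ShuffleMapLikeTask and ResultLikeTask are hypothetical classes whose bodies are invented for illustration and do not reproduce what Spark's ShuffleMapTask and ResultTask actually compute.

```scala
// Hypothetical miniature of the run/runTask split.
abstract class MiniTask[T] {
  // Fixed entry point: subclasses cannot override it.
  final def run(attemptNumber: Int): T = {
    // Common setup shared by all task types would go here.
    runTask(attemptNumber)
  }

  // The only part each concrete task supplies.
  protected def runTask(attemptNumber: Int): T
}

class ShuffleMapLikeTask(records: Seq[String]) extends MiniTask[Map[Int, Seq[String]]] {
  // Buckets records by length, loosely standing in for partitioning output.
  protected def runTask(attemptNumber: Int): Map[Int, Seq[String]] =
    records.groupBy(_.length)
}

class ResultLikeTask(numbers: Seq[Int]) extends MiniTask[Int] {
  // Computes a final value, loosely standing in for producing a result.
  protected def runTask(attemptNumber: Int): Int = numbers.sum
}
```

Callers only ever invoke run, for example new ResultLikeTask(Seq(1, 2, 3)).run(0), and the concrete subclass decides what the work is.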