Spark on YARN Startup Flow: Source Code Analysis


Part 1: Spark on YARN Startup Flow Analysis

1. YarnSchedulerBackend startup entry point

Spark's YARN support is wired in while SparkContext initializes its scheduler: a ClassLoader-based lookup instantiates the YARN-specific SchedulerBackend and TaskScheduler.
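To make the entry point concrete, here is a minimal driver sketch (not from the article; it assumes the spark-yarn module is on the classpath and HADOOP_CONF_DIR points at a working cluster) that reaches the createTaskScheduler() call shown below with master = "yarn" and deployMode = "client":

    // Minimal sketch (assumptions noted above); normally these values come from spark-submit.
    import org.apache.spark.{SparkConf, SparkContext}

    object YarnClientModeExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("yarn-startup-demo")
          .setMaster("yarn")                        // becomes the `master` argument below
          .set("spark.submit.deployMode", "client") // becomes the `deployMode` argument below
        val sc = new SparkContext(conf)             // scheduler creation happens inside here
        sc.parallelize(1 to 10).count()
        sc.stop()
      }
    }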

    // Scheduler initialization: SparkContext calls createTaskScheduler()
    // Create and start the scheduler
    val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
    _dagScheduler = new DAGScheduler(this)
    _heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

    // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
    // constructor
    _taskScheduler.start()

    /**
     * Create a task scheduler based on a given master URL.
     * Return a 2-tuple of the scheduler backend and the task scheduler.
     */
    // This method pattern-matches on the master string. local/standalone masters get their
    // dedicated SchedulerBackend and TaskScheduler; a "yarn" master falls through to the
    // default case at the bottom.
    private def createTaskScheduler(
        sc: SparkContext,
        master: String,
        deployMode: String): (SchedulerBackend, TaskScheduler) = {
      import SparkMasterRegex._

      // When running locally, don't try to re-execute tasks on failure.
      val MAX_LOCAL_TASK_FAILURES = 1

      master match {
        case "local" =>
          val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
          val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
          scheduler.initialize(backend)
          (backend, scheduler)

        case LOCAL_N_REGEX(threads) =>
          ...
        case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
          ...
        case SPARK_REGEX(sparkUrl) =>
          ...
        case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
          ...

        case masterUrl =>
          // getClusterManager() locates YarnClusterManager through the context ClassLoader;
          // the ServiceLoader mechanism behind this is sketched after this block.
          val cm = getClusterManager(masterUrl) match {
            case Some(clusterMgr) => clusterMgr
            case None => throw new SparkException("Could not parse Master URL: '" + master + "'")
          }
          try {
            val scheduler = cm.createTaskScheduler(sc, masterUrl)
            val backend = cm.createSchedulerBackend(sc, masterUrl, scheduler)
            cm.initialize(scheduler, backend)
            (backend, scheduler)
          } catch {
            case se: SparkException => throw se
            case NonFatal(e) =>
              throw new SparkException("External scheduler cannot be instantiated", e)
          }
      }
    }

    // getClusterManager() loads all registered ExternalClusterManager implementations via the
    // class loader and keeps the one that can handle the given master URL (YARN in our case),
    // i.e. the one that can build the YARN flavours of SchedulerBackend and TaskScheduler.
    private def getClusterManager(url: String): Option[ExternalClusterManager] = {
      val loader = Utils.getContextOrSparkClassLoader
      val serviceLoaders =
        ServiceLoader.load(classOf[ExternalClusterManager], loader).asScala.filter(_.canCreate(url))
      if (serviceLoaders.size > 1) {
        throw new SparkException(
          s"Multiple external cluster managers registered for the url $url: $serviceLoaders")
      }
      serviceLoaders.headOption
    }

    // The SchedulerBackend and TaskScheduler that createTaskScheduler() finally returns are
    // created by this class:
    private[spark] class YarnClusterManager extends ExternalClusterManager {
    }
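How does the class loader end up producing YarnClusterManager? The mechanism is java.util.ServiceLoader: the spark-yarn module ships a provider-configuration file named META-INF/services/org.apache.spark.scheduler.ExternalClusterManager whose content is the fully qualified name of YarnClusterManager, and that implementation's canCreate(masterURL) simply checks masterURL == "yarn". A minimal sketch of the lookup (note ExternalClusterManager is private[spark], so this only compiles inside Spark's own source tree):

    // Sketch of the ServiceLoader lookup that getClusterManager() performs.
    import java.util.ServiceLoader
    import scala.collection.JavaConverters._
    import org.apache.spark.scheduler.ExternalClusterManager   // private[spark]

    def findClusterManager(masterURL: String): Option[ExternalClusterManager] = {
      val loader = Thread.currentThread().getContextClassLoader
      // ServiceLoader instantiates every implementation listed in META-INF/services files
      // on the classpath; we keep the first one that accepts this master URL.
      ServiceLoader.load(classOf[ExternalClusterManager], loader)
        .asScala
        .find(_.canCreate(masterURL))
    }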

2. Creating the ApplicationMaster

During SparkContext initialization, the client submits the application (and thus its ApplicationMaster) to the YARN cluster. The flow is as follows:

    /**
     * Submit an application running our ApplicationMaster to the ResourceManager.
     *
     * The stable Yarn API provides a convenience method (YarnClient#createApplication) for
     * creating applications and setting up the application submission context. This was not
     * available in the alpha API.
     */
    def submitApplication(user: Option[String] = None): ApplicationId = {
      var appId: ApplicationId = null
      try {
        launcherBackend.connect()
        // Setup the credentials before doing anything else,
        // so we don't have issues at any point.
        setupCredentials(user)
        yarnClient.init(yarnConf)
        yarnClient.start()
        sparkUser = user
        logInfo(s"[DEVELOP] [sparkUser:${sparkUser}] Requesting a new application " +
          s"from cluster with %d NodeManagers"
          .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

        // Get a new application from our RM
        val newApp = yarnClient.createApplication()
        val newAppResponse = newApp.getNewApplicationResponse()
        appId = newAppResponse.getApplicationId()

        reportLauncherState(SparkAppHandle.State.SUBMITTED)
        launcherBackend.setAppId(appId.toString)

        new CallerContext("CLIENT", Option(appId.toString)).setCurrentContext()

        // Verify whether the cluster has enough resources for our AM
        verifyClusterResources(newAppResponse)

        // Set up the appropriate contexts to launch our AM
        // The two key calls:
        // 1. createContainerLaunchContext() builds the ApplicationMaster's ContainerLaunchContext,
        //    i.e. the launch command, jars and Java environment for the AM container;
        // 2. createApplicationSubmissionContext() builds the submission context handed to YARN,
        //    mostly by reading configuration and filling in the variables the YARN API expects.
        val containerContext = createContainerLaunchContext(newAppResponse)
        val appContext = createApplicationSubmissionContext(newApp, containerContext)

        // Finally, submit and monitor the application
        logInfo(s"Submitting application $appId to ResourceManager")
        yarnClient.submitApplication(appContext)
        appId
      } catch {
        case e: Throwable =>
          if (appId != null) {
            cleanupStagingDir(appId)
          }
          throw e
      }
    }
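Stripped of the Spark-specific setup, the raw YARN steps that Client.submitApplication() wraps look roughly like the sketch below (assumptions: hadoop-yarn-client on the classpath; the AM command, memory and queue are placeholders, not the values Spark actually uses):

    // Minimal sketch of a bare YARN submission: new application id -> launch context ->
    // submission context -> submitApplication.
    import java.util.Collections
    import org.apache.hadoop.yarn.api.records.{ApplicationId, ContainerLaunchContext, Resource}
    import org.apache.hadoop.yarn.client.api.YarnClient
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.util.Records

    def submitBareApplication(): ApplicationId = {
      val yarnClient = YarnClient.createYarnClient()
      yarnClient.init(new YarnConfiguration())
      yarnClient.start()

      // Ask the RM for a new application id and submission context
      val newApp = yarnClient.createApplication()
      val appContext = newApp.getApplicationSubmissionContext

      // Container launch context: the command that starts the AM (placeholder class name)
      val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
      amContainer.setCommands(Collections.singletonList(
        "{{JAVA_HOME}}/bin/java -server com.example.MyApplicationMaster " +
          "1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"))

      appContext.setApplicationName("bare-yarn-demo")
      appContext.setAMContainerSpec(amContainer)
      appContext.setResource(Resource.newInstance(1024, 1))   // AM container: 1 GB, 1 vcore
      appContext.setQueue("default")

      yarnClient.submitApplication(appContext)
      appContext.getApplicationId
    }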

The ApplicationMaster class that is actually launched is chosen as follows:

    val amClass =
      if (isClusterMode) {
        Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
      } else {
        Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
      }

3. Starting the ApplicationMaster

Since this walkthrough uses YARN client mode, we jump straight to org.apache.spark.deploy.yarn.ExecutorLauncher, which is packaged alongside ApplicationMaster. Following its main() function down, it calls ApplicationMaster.run(), which in turn calls runExecutorLauncher(securityMgr); that method is shown after the short sketch below.
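ExecutorLauncher adds no behaviour of its own; as far as I recall it exists mainly so that a client-mode AM shows up under a different class name in ps/jps, and it simply delegates to ApplicationMaster, roughly like this (paraphrased, not the verbatim source):

    // Paraphrased sketch: the client-mode AM entry point just forwards to ApplicationMaster.
    object ExecutorLauncher {
      def main(args: Array[String]): Unit = {
        ApplicationMaster.main(args)
      }
    }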

    private def runExecutorLauncher(securityMgr: SecurityManager): Unit = {
      val port = sparkConf.getInt("spark.yarn.am.port", 0)
      // Create an RpcEnv so the AM can talk to the driver
      rpcEnv = RpcEnv.create("sparkYarnAM", Utils.localHostName, port, sparkConf, securityMgr,
        clientMode = true)
      val driverRef = waitForSparkDriver()
      // Register the AM IP filter so the driver's web UI is served through the YARN proxy
      addAmIpFilter()
      // Key call: register the AM with the ResourceManager and hand it the driver's address
      registerAM(sparkConf, rpcEnv, driverRef, sparkConf.get("spark.driver.appUIAddress", ""),
        securityMgr)

      // In client mode the actor will stop the reporter thread.
      reporterThread.join()
    }

    private def registerAM(
        _sparkConf: SparkConf,
        _rpcEnv: RpcEnv,
        driverRef: RpcEndpointRef,
        uiAddress: String,
        securityMgr: SecurityManager) = {
      val appId = client.getAttemptId().getApplicationId().toString()
      val attemptId = client.getAttemptId().getAttemptId().toString()
      val historyAddress =
        _sparkConf.get(HISTORY_SERVER_ADDRESS)
          .map { text => SparkHadoopUtil.get.substituteHadoopVariables(text, yarnConf) }
          .map { address => s"${address}${HistoryServer.UI_PATH_PREFIX}/${appId}/${attemptId}" }
          .getOrElse("")

      val driverUrl = RpcEndpointAddress(
        _sparkConf.get("spark.driver.host"),
        _sparkConf.get("spark.driver.port").toInt,
        CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString

      // Before we initialize the allocator, let's log the information about how executors will
      // be run up front, to avoid printing this out for every single executor being launched.
      // Use placeholders for information that changes such as executor IDs.
      logInfo {
        val executorMemory = sparkConf.get(EXECUTOR_MEMORY).toInt
        val executorCores = sparkConf.get(EXECUTOR_CORES)
        // Dump the executor launch context once, for debugging
        val dummyRunner = new ExecutorRunnable(None, yarnConf, sparkConf, driverUrl, "<executorId>",
          "<hostname>", executorMemory, executorCores, appId, securityMgr, localResources)
        dummyRunner.launchContextDebugInfo()
      }

      // Register with the RM, passing along the driver's RPC address
      allocator = client.register(driverUrl,
        driverRef,
        yarnConf,
        _sparkConf,
        uiAddress,
        historyAddress,
        securityMgr,
        localResources)

      // Request executor resources
      allocator.allocateResources()
      reporterThread = launchReporterThread()
    }
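The driverUrl built above is the Spark RPC address of the driver's CoarseGrainedScheduler endpoint. A small sketch of its shape (the host and port are example values; the same string reappears as --driver-url in the launch command and in the logs later on):

    // RpcEndpointAddress(host, port, name).toString yields "spark://<name>@<host>:<port>".
    def driverUrl(host: String, port: Int): String =
      s"spark://CoarseGrainedScheduler@$host:$port"

    // e.g. driverUrl("10.0.0.35", 47065) == "spark://CoarseGrainedScheduler@10.0.0.35:47065"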

allocateResources() calls the YARN ResourceManager interface to request executor containers and to process the containers that have been granted:

    /**
     * Request resources such that, if YARN gives us all we ask for, we'll have a number of containers
     * equal to maxExecutors.
     *
     * Deal with any containers YARN has granted to us by possibly launching executors in them.
     *
     * This must be synchronized because variables read in this method are mutated by other methods.
     */
    def allocateResources(): Unit = synchronized {
      updateResourceRequests()

      val progressIndicator = 0.1f
      // Poll the ResourceManager. This doubles as a heartbeat if there are no pending container
      // requests.
      // This is the actual YARN call that allocates containers.
      val allocateResponse = amClient.allocate(progressIndicator)

      // Containers YARN has granted in this round
      val allocatedContainers = allocateResponse.getAllocatedContainers()

      if (allocatedContainers.size > 0) {
        logInfo("Allocated containers: %d. Current executor count: %d. Cluster resources: %s."
          .format(
            allocatedContainers.size,
            numExecutorsRunning,
            allocateResponse.getAvailableResources))

        // Once containers have been granted, set up the executor environment in each of them
        // and wait for tasks to be assigned.
        handleAllocatedContainers(allocatedContainers.asScala)
      }

      val completedContainers = allocateResponse.getCompletedContainersStatuses()
      if (completedContainers.size > 0) {
        logInfo("Completed %d containers".format(completedContainers.size))
        processCompletedContainers(completedContainers.asScala)
        logInfo("Finished processing %d completed containers. Current running executor count: %d."
          .format(completedContainers.size, numExecutorsRunning))
      }
    }
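The amClient used here is Hadoop's AMRMClient. For reference, a minimal sketch of the register/request/allocate cycle that YarnAllocator drives on top of it (assumptions: hadoop-yarn-client on the classpath; host, sizes and priority are example values):

    // Sketch of the AMRMClient heartbeat loop behind updateResourceRequests()/allocate().
    import org.apache.hadoop.yarn.api.records.{Priority, Resource}
    import org.apache.hadoop.yarn.client.api.AMRMClient
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
    import org.apache.hadoop.yarn.conf.YarnConfiguration

    def requestExecutorContainers(numContainers: Int): Unit = {
      val amClient = AMRMClient.createAMRMClient[ContainerRequest]()
      amClient.init(new YarnConfiguration())
      amClient.start()

      // Register this AM with the RM (host/port/tracking URL are placeholders)
      amClient.registerApplicationMaster("am-host", 0, "")

      // Queue container requests (the rough equivalent of updateResourceRequests())
      val capability = Resource.newInstance(4096, 2)   // 4 GB, 2 vcores per executor
      val priority = Priority.newInstance(1)
      (1 to numContainers).foreach { _ =>
        amClient.addContainerRequest(new ContainerRequest(capability, null, null, priority))
      }

      // allocate() is both the request and the heartbeat, exactly as in the code above
      var granted = 0
      while (granted < numContainers) {
        val response = amClient.allocate(0.1f)
        granted += response.getAllocatedContainers.size()
        Thread.sleep(1000)   // pause between heartbeats
      }
    }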

Continuing on: once containers have been obtained from the RM, ExecutorRunnable is used to set up the executor environment in each of them. In detail:

    /**
     * Handle containers granted by the RM by launching executors on them.
     *
     * Due to the way the YARN allocation protocol works, certain healthy race conditions can result
     * in YARN granting containers that we no longer need. In this case, we release them.
     *
     * Visible for testing.
     */
    def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
      val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)

      // Match incoming requests by host
      val remainingAfterHostMatches = new ArrayBuffer[Container]
      for (allocatedContainer <- allocatedContainers) {
        matchContainerToRequest(allocatedContainer, allocatedContainer.getNodeId.getHost,
          containersToUse, remainingAfterHostMatches)
      }

      // Match remaining by rack
      val remainingAfterRackMatches = new ArrayBuffer[Container]
      for (allocatedContainer <- remainingAfterHostMatches) {
        val rack = RackResolver.resolve(conf, allocatedContainer.getNodeId.getHost).getNetworkLocation
        matchContainerToRequest(allocatedContainer, rack, containersToUse,
          remainingAfterRackMatches)
      }

      // Assign remaining that are neither node-local nor rack-local
      val remainingAfterOffRackMatches = new ArrayBuffer[Container]
      for (allocatedContainer <- remainingAfterRackMatches) {
        matchContainerToRequest(allocatedContainer, ANY_HOST, containersToUse,
          remainingAfterOffRackMatches)
      }

      if (!remainingAfterOffRackMatches.isEmpty) {
        logDebug(s"Releasing ${remainingAfterOffRackMatches.size} unneeded containers that were " +
          s"allocated to us")
        for (container <- remainingAfterOffRackMatches) {
          internalReleaseContainer(container)
        }
      }

      // After releasing the containers we cannot use, launch executors on the usable ones
      runAllocatedContainers(containersToUse)

      logInfo("Received %d containers from YARN, launching executors on %d of them."
        .format(allocatedContainers.size, containersToUse.size))
    }

    /**
     * Launches executors in the allocated containers.
     */
    private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = {
      for (container <- containersToUse) {
        executorIdCounter += 1
        val executorHostname = container.getNodeId.getHost
        val containerId = container.getId
        val executorId = executorIdCounter.toString
        assert(container.getResource.getMemory >= resource.getMemory)
        logInfo(s"Launching container $containerId on host $executorHostname")

        def updateInternalState(): Unit = synchronized {
          numExecutorsRunning += 1
          executorIdToContainer(executorId) = container
          containerIdToExecutorId(container.getId) = executorId
          val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
            new HashSet[ContainerId])
          containerSet += containerId
          allocatedContainerToHostMap.put(containerId, executorHostname)
        }

        if (numExecutorsRunning < targetNumExecutors) {
          if (launchContainers) {
            // Submit the executor-launch work to the thread pool
            launcherPool.execute(new Runnable {
              // ExecutorRunnable is what actually sets up and starts the executor
              override def run(): Unit = {
                try {
                  new ExecutorRunnable(
                    Some(container),
                    conf,
                    sparkConf,
                    driverUrl,
                    executorId,
                    executorHostname,
                    executorMemory,
                    executorCores,
                    appAttemptId.getApplicationId.toString,
                    securityMgr,
                    localResources
                  ).run()
                  updateInternalState()
                } catch {
                  case NonFatal(e) =>
                    logError(s"Failed to launch executor $executorId on container $containerId", e)
                    // Assigned container should be released immediately to avoid unnecessary resource
                    // occupation.
                    amClient.releaseAssignedContainer(containerId)
                }
              }
            })
          } else {
            // For test only
            updateInternalState()
          }
        } else {
          logInfo(("Skip launching executorRunnable as running Executors count: %d " +
            "reached target Executors count: %d.").format(numExecutorsRunning, targetNumExecutors))
        }
      }
    }
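Under the hood, ExecutorRunnable.run() asks the NodeManager that owns the granted container to start it. A sketch of that step with Hadoop's NMClient (the container comes from the allocate response above; the launch command is a placeholder, the real one is built by prepareCommand() shown in the next section):

    // Sketch of the NodeManager call ExecutorRunnable makes once the executor's
    // ContainerLaunchContext has been assembled.
    import java.util.Collections
    import org.apache.hadoop.yarn.api.records.{Container, ContainerLaunchContext}
    import org.apache.hadoop.yarn.client.api.NMClient
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.util.Records

    def startExecutorContainer(container: Container, launchCommand: String): Unit = {
      val nmClient = NMClient.createNMClient()
      nmClient.init(new YarnConfiguration())
      nmClient.start()

      val ctx = Records.newRecord(classOf[ContainerLaunchContext])
      ctx.setCommands(Collections.singletonList(launchCommand))
      // In the real ExecutorRunnable the context also carries localized jars, the environment
      // (CLASSPATH, SPARK_YARN_MODE, ...) and security tokens.

      nmClient.startContainer(container, ctx)
    }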

4. Starting the Executor

ExecutorRunnable.run() builds and issues the command that starts the executor; the command is assembled in prepareCommand():

    private def prepareCommand(): List[String] = {
      // Extra options for the JVM
      val javaOpts = ListBuffer[String]()
      // ... Java/Spark runtime JVM options and environment variables assembled here (elided) ...
      YarnSparkHadoopUtil.addOutOfMemoryErrorArgument(javaOpts)

      // The executor's actual launch command: what really gets started in the container is
      // `org.apache.spark.executor.CoarseGrainedExecutorBackend`.
      val commands = prefixEnv ++ Seq(
          YarnSparkHadoopUtil.expandEnvironment(Environment.JAVA_HOME) + "/bin/java",
          "-server") ++
        javaOpts ++
        Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
          "--driver-url", masterAddress,
          "--executor-id", executorId,
          "--hostname", hostname,
          "--cores", executorCores.toString,
          "--app-id", appId) ++
        userClassPath ++
        Seq(
          s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
          s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")

      // TODO: it would be nicer to just make sure there are no null commands here
      commands.map(s => if (s == null) "null" else s).toList
    }

The implementation of org.apache.spark.executor.CoarseGrainedExecutorBackend is fairly simple: its run() function creates an RpcEndpoint and waits for LaunchTask(data) messages. When one arrives, it calls executor.launchTask() to run the task, which adds the task to runningTasks and submits it to the executor's thread pool.
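A condensed paraphrase of the relevant cases in CoarseGrainedExecutorBackend.receive (Spark 2.x; not the verbatim source, details differ slightly between minor versions):

    // Paraphrased fragment of the CoarseGrainedExecutorBackend class body: the backend is an
    // RpcEndpoint; on LaunchTask it decodes the TaskDescription and hands it to
    // Executor.launchTask(), which records the task in runningTasks and submits a TaskRunner
    // to the executor's thread pool.
    override def receive: PartialFunction[Any, Unit] = {
      case RegisteredExecutor =>
        executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

      case LaunchTask(data) =>
        if (executor == null) {
          exitExecutor(1, "Received LaunchTask command but executor was null")
        } else {
          val taskDesc = TaskDescription.decode(data.value)
          logInfo("Got assigned task " + taskDesc.taskId)
          executor.launchTask(this, taskDesc)   // enqueue into runningTasks + threadPool.execute
        }
    }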

Part 2: Runtime Results

Because a YARN cluster's logs are scattered across many machines, tracing the startup flow purely through logs is awkward; on a small cluster, though, it is still a decent way to verify the whole flow end to end.

1. ApplicationMaster log

From the ApplicationMaster's log you can see that org.apache.spark.executor.CoarseGrainedExecutorBackend is ultimately invoked to start the executors.

    17/05/05 16:54:58 INFO ApplicationMaster: Preparing Local resources
    17/05/05 16:54:59 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
    17/05/05 16:54:59 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
    17/05/05 16:54:59 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1493803865684_0180_000002
    17/05/05 16:54:59 INFO SecurityManager: Changing view acls to: hzlishuming
    17/05/05 16:54:59 INFO SecurityManager: Changing modify acls to: hzlishuming
    17/05/05 16:54:59 INFO SecurityManager: Changing view acls groups to:
    17/05/05 16:54:59 INFO SecurityManager: Changing modify acls groups to:
    17/05/05 16:54:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hzlishuming); groups with view permissions: Set(); users with modify permissions: Set(hzlishuming); groups with modify permissions: Set()
    17/05/05 16:54:59 INFO AMCredentialRenewer: Scheduling login from keytab in 61745357 millis.
    17/05/05 16:54:59 INFO ApplicationMaster: Waiting for Spark driver to be reachable.
    17/05/05 16:54:59 INFO ApplicationMaster: Driver now available: xxxx:47065
    17/05/05 16:54:59 INFO TransportClientFactory: Successfully created connection to /xxxx:47065 after 110 ms (0 ms spent in bootstraps)
    17/05/05 16:54:59 INFO ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> ....)
    17/05/05 16:55:00 INFO ApplicationMaster:
    ===============================================================================
    YARN executor launch context:
      env:
        CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
        SPARK_YARN_STAGING_DIR -> hdfs://hz-test01/user/hzlishuming/.sparkStaging/application_1493803865684_0180
        SPARK_USER -> hzlishuming
        SPARK_YARN_MODE -> true

      command:
        {{JAVA_HOME}}/bin/java \
          -server \
          -Xmx4096m \
          '-XX:PermSize=1024m' \
          '-XX:MaxPermSize=1024m' \
          '-verbose:gc' \
          '-XX:+PrintGCDetails' \
          '-XX:+PrintGCDateStamps' \
          '-XX:+PrintTenuringDistribution' \
          -Djava.io.tmpdir={{PWD}}/tmp \
          '-Dspark.driver.port=47065' \
          -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
          -XX:OnOutOfMemoryError='kill %p' \
          org.apache.spark.executor.CoarseGrainedExecutorBackend \
          --driver-url \
          spark://CoarseGrainedScheduler@....:47065 \
          --executor-id \
          <executorId> \
          --hostname \
          <hostname> \
          --cores

2. Driver log

On the driver side, the following lines are left after the executors register:

    17/05/05 16:04:59 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) () with ID 1
    17/05/05 16:04:59 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) () with ID 2
    17/05/05 16:04:59 INFO BlockManagerMasterEndpoint: Registering block manager xxxx with 2004.6 MB RAM, BlockManagerId(1, h, 54063, None)
    17/05/05 16:04:59 INFO BlockManagerMasterEndpoint: Registering block manager xxxx with 2004.6 MB RAM, BlockManagerId(2, xxx, 42904, None)

3. Executor log

The executor startup logs can be viewed from the Spark UI. The processing flow has been covered above; what runs in each container is the org.apache.spark.executor.CoarseGrainedExecutorBackend logic.

    17/05/05 16:55:15 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
    17/05/05 16:55:16 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@xxx.35:47065
    17/05/05 16:55:16 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
    17/05/05 16:55:16 INFO Executor: Starting executor ID 4 on host hadoop694.lt.163.org
    17/05/05 16:55:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40418.
    17/05/05 16:55:16 INFO NettyBlockTransferService: Server created on xxx:40418