Spark Resource Scheduling and Allocation

Source: Internet | Editor: 程序博客网 | Time: 2024/05/29 12:39

I. The difference between task scheduling and resource scheduling
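1. Task scheduling is job scheduling carried out by DAGScheduler, TaskScheduler, SchedulerBackend and related components.
2. Resource scheduling is about how an application obtains resources.
3. Task scheduling builds on resource scheduling: without resource scheduling, task scheduling would be water without a source.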
II. Inside resource scheduling
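1. Because the Master is responsible for resource management and scheduling, the resource-scheduling method schedule() lives in Master.scala. Registering an application or any change in resources triggers a call to schedule(); for example, when an application registers: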

case RegisterApplication(description, driver) => {
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
}
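A minimal Scala model of the branch above may help. It is not Spark code: `RegistrationSketch`, `ToyMaster`, `register` and the `scheduleCalls` counter are hypothetical, and the model assumes only what the snippet shows, namely that a STANDBY Master ignores the request while any other Master records the application, replies to the driver and calls schedule().

```scala
import scala.collection.mutable

// Hypothetical simplified model of the RegisterApplication branch; not Spark's Master.
object RegistrationSketch {
  sealed trait RecoveryState
  case object Standby extends RecoveryState
  case object Alive extends RecoveryState

  final class ToyMaster(var state: RecoveryState) {
    val apps = mutable.ArrayBuffer.empty[String] // names of registered apps
    var scheduleCalls = 0                         // how many times schedule() ran

    /** Returns the app ID sent back to the driver, or None if the request is ignored. */
    def register(appName: String): Option[String] =
      if (state == Standby) {
        None // a standby Master ignores the request and sends no response
      } else {
        apps += appName
        val appId = s"app-${apps.size}" // hypothetical ID format
        schedule()                      // every registration triggers scheduling
        Some(appId)
      }

    private def schedule(): Unit = scheduleCalls += 1
  }
}
```

Only the state check and the "register, then schedule" order matter here; persistence and the real ID format are left out.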

2. When schedule() is called: every time a new application is submitted or the cluster's resources change (including Executors or Workers being added or removed).

/**
 * Schedule the currently available resources among waiting apps. This method will be called
 * every time a new app joins or resource availability changes.
 */
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) { // Only an ALIVE Master schedules resources; otherwise return immediately
    return
  }
  // Drivers take strict precedence over executors
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  startExecutorsOnWorkers() // Start Executors on the Workers; see point 8
}
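The round-robin driver placement in schedule() can be isolated as below. This is a simplified sketch, not Spark's implementation: `DriverPlacementSketch`, `Worker`, `Driver` and `placeDrivers` are hypothetical, and the "launch" only subtracts the driver's memory and cores from the chosen worker.

```scala
import scala.collection.mutable

// Hypothetical simplified model of the driver loop in schedule(); not Spark code.
object DriverPlacementSketch {
  final class Worker(val id: String, var memoryFree: Int, var coresFree: Int)
  final case class Driver(id: String, mem: Int, cores: Int)

  /** Places waiting drivers round-robin and returns driverId -> workerId.
    * `workers` is assumed to be already filtered to ALIVE and shuffled. */
  def placeDrivers(workers: IndexedSeq[Worker], waiting: Seq[Driver]): Map[String, String] = {
    val placed = mutable.LinkedHashMap.empty[String, String]
    val numWorkersAlive = workers.size
    var curPos = 0 // kept across drivers, so each search resumes where the last one stopped
    for (driver <- waiting) {
      var launched = false
      var numWorkersVisited = 0
      while (numWorkersVisited < numWorkersAlive && !launched) {
        val worker = workers(curPos)
        numWorkersVisited += 1
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          // Stand-in for launchDriver: reserve the driver's resources on this worker
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          placed(driver.id) = worker.id
          launched = true
        }
        curPos = (curPos + 1) % numWorkersAlive
      }
    }
    placed.toMap // drivers missing from the map stay in the waiting queue
  }
}
```

With workers w1 (1024 MB, 2 cores) and w2 (4096 MB, 4 cores), a 2048 MB driver skips w1 and lands on w2; the next driver's search then starts from w1 again, because curPos has wrapped around.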

3. Resource scheduling only happens when the current Master is ALIVE; if it is not, schedule() returns immediately:

if (state != RecoveryState.ALIVE) {
  return
}
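4. Random.shuffle randomly reorders the Worker entries that the Master keeps for the cluster; internally, the algorithm loops over the Workers and randomly swaps their positions in a buffer.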
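Step 4 describes a swap-based shuffle. The textbook version of that idea is the Fisher-Yates shuffle, sketched below for illustration; it is not claimed to be the exact source of `scala.util.Random.shuffle`, and `ShuffleSketch` is a hypothetical name.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Illustrative Fisher-Yates shuffle; not the source of scala.util.Random.shuffle.
object ShuffleSketch {
  /** Returns a randomly permuted copy of `xs`; `xs` itself is not modified. */
  def shuffle[A](xs: Seq[A], rnd: Random): Vector[A] = {
    val buf = ArrayBuffer.empty[A] ++= xs
    // Walk from the last slot down, swapping each slot with a random slot in [0, i]
    for (i <- buf.indices.reverse) {
      val j = rnd.nextInt(i + 1)
      val tmp = buf(i)
      buf(i) = buf(j)
      buf(j) = tmp
    }
    buf.toVector
  }
}
```

Passing the `Random` in explicitly keeps the sketch testable: the same seed always yields the same order.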
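5. Only Workers in the ALIVE state can take part in resource allocation, so the Worker list is filtered by state (the filter is applied before the shuffle):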

// Drivers take strict precedence over executors
val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))

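6. When spark-submit launches the Driver in cluster deploy mode, the Driver is added to the waitingDrivers queue. Each DriverDescription records, among other things, how much memory and how many cores a Worker must provide to launch that Driver: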

private[deploy] case class DriverDescription(
    jarUrl: String,
    mem: Int,
    cores: Int,
    supervise: Boolean,
    command: Command) {

  override def toString: String = s"DriverDescription (${command.mainClass})"
}

The Driver is then launched on the first Worker, in the shuffled order, that meets these resource requirements:

private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver) // the Worker records the Driver
  driver.worker = Some(worker) // the Driver records the Worker, so each knows the other
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc)) // the Master sends the command to the Worker
  driver.state = DriverState.RUNNING // mark the Driver as RUNNING
}

The Master sends a command telling the remote Worker to launch the Driver:

worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
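As far as I can tell from the Spark standalone source, `WorkerInfo.addDriver` also adds the driver's memory and cores to the worker's used totals, which is why later scheduling sees less free capacity on that worker. The sketch below models that two-way bookkeeping under that assumption; `LaunchDriverSketch`, its classes and the `sent` outbox are hypothetical simplifications, not Spark's types.

```scala
import scala.collection.mutable

// Hypothetical simplified model of launchDriver's bookkeeping; not Spark classes.
object LaunchDriverSketch {
  final class WorkerInfo(val id: String, val memory: Int, val cores: Int) {
    val drivers = mutable.Map.empty[String, DriverInfo]
    var memoryUsed = 0
    var coresUsed = 0
    def memoryFree: Int = memory - memoryUsed
    def coresFree: Int = cores - coresUsed

    // Record the driver and charge its resources to this worker (assumed behavior)
    def addDriver(d: DriverInfo): Unit = {
      drivers(d.id) = d
      memoryUsed += d.mem
      coresUsed += d.cores
    }
  }

  final class DriverInfo(val id: String, val mem: Int, val cores: Int) {
    var worker: Option[WorkerInfo] = None
    var state: String = "SUBMITTED"
  }

  // Outbox standing in for worker.endpoint.send(LaunchDriver(...)): (workerId, driverId)
  val sent = mutable.ArrayBuffer.empty[(String, String)]

  def launchDriver(worker: WorkerInfo, driver: DriverInfo): Unit = {
    worker.addDriver(driver)          // the worker remembers the driver
    driver.worker = Some(worker)      // the driver remembers the worker
    sent += (worker.id -> driver.id)  // "send" LaunchDriver to the worker
    driver.state = "RUNNING"
  }
}
```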

7. The Driver is launched first; all subsequent resource scheduling happens only after that.
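8. By default, Spark launches Executors for applications in FIFO order: all submitted applications wait in the scheduling queue, first in first out, and resources go to the next application only after the applications ahead of it have been served: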

/**
 * Schedule and launch executors on workers
 */
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Filter out workers that don't have enough resources to launch an executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
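allocateWorkerResourceToExecutors is called above but not shown in the post. As I understand the Spark 2.x source, it launches assignedCores / coresPerExecutor executors of coresPerExecutor cores each when spark.executor.cores is set, and otherwise a single executor that takes all the assigned cores. The sketch isolates only that arithmetic; `ExecutorSplitSketch.executorsFor` is a hypothetical name.

```scala
// Hypothetical sketch of how one worker's assigned cores become executors.
object ExecutorSplitSketch {
  /** Returns the core count of each executor to launch on one worker. */
  def executorsFor(assignedCores: Int, coresPerExecutor: Option[Int]): Seq[Int] = {
    // With a fixed executor size, split evenly (cores are assigned in multiples of it)
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    // Without one, a single executor takes every assigned core
    val coresEach = coresPerExecutor.getOrElse(assignedCores)
    Seq.fill(numExecutors)(coresEach)
  }
}
```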

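9. Before actually allocating Executors to an application, the Master checks whether the application still needs cores; if it does not, no Executors are allocated to it: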

for (app <- waitingApps if app.coresLeft > 0) {
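The FIFO behavior from steps 8 and 9 can be approximated with one shared pool of free cores. This is a deliberate simplification (the real code works per worker and also checks memory); `FifoSketch`, `App` and `allocate` are hypothetical names.

```scala
// Hypothetical model of the FIFO pass in startExecutorsOnWorkers; not Spark code.
object FifoSketch {
  final case class App(id: String, coresLeft: Int)

  /** Serves apps in queue order from a shared pool of free cores.
    * Apps with coresLeft == 0 are skipped, like `if app.coresLeft > 0`. */
  def allocate(queue: Seq[App], freeCores: Int): List[(String, Int)] = {
    var pool = freeCores
    for (app <- queue.toList if app.coresLeft > 0) yield {
      val granted = math.min(app.coresLeft, pool) // earlier apps drain the pool first
      pool -= granted
      app.id -> granted
    }
  }
}
```

With 8 free cores and a queue a (6 left), b (0 left), c (4 left), d (2 left), a takes 6, b is skipped, c gets the remaining 2 and d gets nothing.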

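10. Before Executors are allocated, each Worker must be ALIVE and must satisfy the application's memory and core requirements for one Executor. The remaining Workers are then sorted by free cores in descending order, producing usableWorkers, ordered from the most to the least compute resources: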

val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
  .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
    worker.coresFree >= coresPerExecutor.getOrElse(1))
  .sortBy(_.coresFree).reverse
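A stand-alone version of the filter-and-sort above, using a hypothetical `Worker` case class whose `alive` flag stands in for `state == WorkerState.ALIVE`:

```scala
// Hypothetical stand-in for the usableWorkers computation; not Spark's WorkerInfo.
object UsableWorkersSketch {
  final case class Worker(id: String, alive: Boolean, memoryFree: Int, coresFree: Int)

  /** ALIVE workers able to host one executor, ordered by free cores, largest first. */
  def usable(workers: Seq[Worker], memoryPerExecutorMB: Int, coresPerExecutor: Option[Int]): Seq[Worker] =
    workers
      .filter(_.alive)
      .filter(w => w.memoryFree >= memoryPerExecutorMB && w.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree)
      .reverse
}
```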

Within this FIFO scheme, spreadOutApps (enabled by default) makes each application run on as many nodes as possible:

// As a temporary workaround before better ways of configuring memory, we allow users to set
// a flag that will perform round-robin scheduling across the nodes (spreading out each app
// among all the nodes) instead of trying to consolidate each app onto a small # of nodes.
private val spreadOutApps = conf.getBoolean("spark.deploy.spreadOut", true)

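11. There are two ways to allocate Executors to an application. The first (the default, spark.deploy.spreadOut = true) spreads Executors across as many Workers in the cluster as possible, which often yields better data locality. The second (spark.deploy.spreadOut = false) consolidates each application onto as few Workers as possible.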
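12. When cores are actually allocated across the cluster, the Master tries to satisfy the request as fully as possible, capped by the total free cores of the usable Workers: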

var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
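A worked example of the cap above: with coresLeft = 10 and usable workers offering 4, 2 and 3 free cores, coresToAssign = min(10, 4 + 2 + 3) = min(10, 9) = 9. The same arithmetic as a tiny hypothetical helper:

```scala
// Hypothetical helper reproducing the coresToAssign cap; not a Spark API.
object CoresToAssignSketch {
  /** Cores the Master will try to hand out: capped by demand and by supply. */
  def coresToAssign(coresLeft: Int, usableCoresFree: Seq[Int]): Int =
    math.min(coresLeft, usableCoresFree.sum)
}
```

If I read the standalone source correctly, coresLeft is the requested cores (spark.cores.max, or spark.deploy.defaultCores, which defaults to unlimited) minus the cores already granted, so when spark.cores.max is unset the supply side usually wins.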

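13. If each Worker may host only one Executor for the current application, each iteration adds one core to that Executor; otherwise, each iteration assigns cores to a new Executor: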

// If we are launching one executor per worker, then every iteration assigns 1 core
// to the executor. Otherwise, every iteration assigns cores to a new executor.
if (oneExecutorPerWorker) {
  assignedExecutors(pos) = 1
} else {
  assignedExecutors(pos) += 1
}
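Steps 11 to 13 show only fragments of scheduleExecutorsOnWorkers. The sketch below reassembles the spread-out versus consolidate loop as I understand it from the Spark 2.x standalone Master, simplified: the executor limit used by dynamic allocation is dropped, and `SpreadOutSketch`, `Worker` and `assignCores` are hypothetical names.

```scala
// Hypothetical, simplified re-creation of the spread-out/consolidate loop;
// executor limits (dynamic allocation) are deliberately left out.
object SpreadOutSketch {
  final case class Worker(coresFree: Int, memoryFree: Int)

  /** Returns the cores assigned to each worker, index-aligned with `workers`. */
  def assignCores(
      coresLeft: Int,
      workers: IndexedSeq[Worker], // assumed already filtered and sorted
      coresPerExecutor: Option[Int],
      memoryPerExecutor: Int,
      spreadOut: Boolean): Array[Int] = {
    val minCores = coresPerExecutor.getOrElse(1)
    val oneExecutorPerWorker = coresPerExecutor.isEmpty
    val assignedCores = new Array[Int](workers.length)
    val assignedExecutors = new Array[Int](workers.length)
    var coresToAssign = math.min(coresLeft, workers.map(_.coresFree).sum)

    def canLaunch(pos: Int): Boolean = {
      val keepScheduling = coresToAssign >= minCores
      val enoughCores = workers(pos).coresFree - assignedCores(pos) >= minCores
      val newExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
      if (newExecutor) {
        // A new executor also needs its own slice of memory on this worker
        val usedMemory = assignedExecutors(pos) * memoryPerExecutor
        keepScheduling && enoughCores && workers(pos).memoryFree - usedMemory >= memoryPerExecutor
      } else keepScheduling && enoughCores
    }

    var freeWorkers = workers.indices.filter(canLaunch)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          coresToAssign -= minCores
          assignedCores(pos) += minCores
          // One executor per worker grows by minCores; otherwise add a new executor
          if (oneExecutorPerWorker) assignedExecutors(pos) = 1
          else assignedExecutors(pos) += 1
          // Spreading out: take one step on this worker, then move to the next
          if (spreadOut) keepScheduling = false
        }
      }
      freeWorkers = freeWorkers.filter(canLaunch)
    }
    assignedCores
  }
}
```

On three idle 4-core workers and an application with 6 cores left and no spark.executor.cores, spreadOut = true yields 2, 2, 2 cores per worker, while spreadOut = false packs them as 4, 2, 0.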

14. Once the Executor information for the current application is ready, the Master sends a remote command to the Worker to start the ExecutorBackend process:

private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}

worker.endpoint.send(LaunchExecutor(masterUrl,
  exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))

15. Immediately afterwards, the Master sends an ExecutorAdded message to the application's Driver:

exec.application.driver.send(
  ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
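Steps 14 and 15 boil down to two messages sent in a fixed order. The sketch below models them with hypothetical, trimmed-down case classes (Spark's real `LaunchExecutor` and `ExecutorAdded` carry more fields, such as the master URL, the application description and the worker's host:port); recipients are plain strings, and `LaunchExecutorSketch` is a hypothetical name.

```scala
// Hypothetical message shapes mirroring the two sends in launchExecutor; not Spark's classes.
object LaunchExecutorSketch {
  sealed trait Msg
  final case class LaunchExecutor(appId: String, execId: Int, cores: Int, memory: Int) extends Msg
  final case class ExecutorAdded(execId: Int, workerId: String, cores: Int, memory: Int) extends Msg

  /** Returns the (recipient, message) pairs in the order they are sent. */
  def launchExecutor(appId: String, execId: Int, workerId: String, cores: Int, memory: Int): List[(String, Msg)] =
    List(
      // 1. Tell the worker to start the executor backend process
      s"worker:$workerId" -> LaunchExecutor(appId, execId, cores, memory),
      // 2. Then tell the application's driver that an executor was added
      s"driver:$appId" -> ExecutorAdded(execId, workerId, cores, memory)
    )
}
```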

That is all for this analysis.
