Lesson 17: Spark Streaming Dynamic Resource Allocation and Dynamic Consumption-Rate Control, Principles Explained
Source: Internet. Editor: 程序博客网. Date: 2024/05/16 05:14
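Advanced features:
1. Dynamic resource allocation in Spark Streaming
2. Dynamic control of the consumption rate in Spark Streaming
On the principles: dynamic control of the consumption rate is backed by a body of theory, and so is dynamic resource allocation.
We cover the theory first and discuss it afterwards. Heron may replace Storm.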
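Why do we need dynamic resource allocation and dynamic rate control?
By default Spark allocates resources first and computes afterwards. This is coarse-grained allocation: resources are reserved up front, before the computation tasks run.
The drawback: a Spark Streaming workload has peaks and troughs, and a fixed allocation sized with both in mind wastes a large amount of resources.
Spark Streaming originally borrowed design ideas from Storm. The Spark Streaming 2.0.x kernel built on that foundation has changed a great deal, and the framework's biggest advantage is that it works together with its sibling frameworks. If Spark Streaming resources are allocated for the peak, the pre-allocated resources are wasted, and the waste is greatest during troughs.
Spark Streaming is built on Spark Core, whose central object is SparkContext. Starting at line 556 of the SparkContext class (the line number depends on the Spark version), the code supports dynamic resource allocation.
The source is as follows: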
SparkContext
  // Optionally scale number of executors dynamically based on workload. Exposed for testing.
  val dynamicAllocationEnabled = Utils.isDynamicAllocationEnabled(_conf)
  if (!dynamicAllocationEnabled && _conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
    logWarning("Dynamic Allocation and num executors both set, thus dynamic allocation disabled.")
  }
  _executorAllocationManager =
    if (dynamicAllocationEnabled) {
      Some(new ExecutorAllocationManager(this, listenerBus, _conf))
    } else {
      None
    }
  _executorAllocationManager.foreach(_.start())
Utils
  def isDynamicAllocationEnabled(conf: SparkConf): Boolean = {
    conf.getBoolean("spark.dynamicAllocation.enabled", false) &&
      conf.getInt("spark.executor.instances", 0) == 0
  }
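The enabling rule above can be tried without a Spark dependency. The following is a minimal sketch, assuming a plain `Map[String, String]` as a hypothetical stand-in for `SparkConf`; `DynamicAllocationCheck` and `disabledByFixedInstances` are illustrative names, not Spark APIs.

```scala
object DynamicAllocationCheck {
  // Hypothetical stand-in for SparkConf: a plain key/value map.
  type Conf = Map[String, String]

  private def flagOn(conf: Conf): Boolean =
    conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean

  // Same rule as the excerpt: the flag must be on AND no fixed executor count is set.
  def isDynamicAllocationEnabled(conf: Conf): Boolean =
    flagOn(conf) && conf.getOrElse("spark.executor.instances", "0").toInt == 0

  // The case in which the excerpt logs its warning: the flag was requested,
  // but a fixed executor count switched dynamic allocation off.
  def disabledByFixedInstances(conf: Conf): Boolean =
    !isDynamicAllocationEnabled(conf) && flagOn(conf)
}
```

With only the flag set, the check passes; adding `spark.executor.instances` with a non-zero value turns it off and triggers the warning case.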
ExecutorAllocationManager
  // Listener for Spark events that impact the allocation policy
  private val listener = new ExecutorAllocationListener
  // Executor that handles the scheduling task.
  private val executor =
    ThreadUtils.newDaemonSingleThreadScheduledExecutor("spark-dynamic-executor-allocation")
  // Metric source for ExecutorAllocationManager to expose internal status to MetricsSystem.
  val executorAllocationManagerSource = new ExecutorAllocationManagerSource
Master
  case RegisterApplication(description, driver) => {
    // If this master's state is STANDBY, i.e. it is the standby Master rather than the active Master,
    // it does nothing when an Application asks to register
    if (state == RecoveryState.STANDBY) {
      // ignore, don't send response
    } else {
      logInfo("Registering app " + description.name)
      // Create an ApplicationInfo from the ApplicationDescription
      val app = createApplication(description, driver)
      // Register the Application: cache the ApplicationInfo and add the Application to the queue awaiting scheduling (waitingApps)
      registerApplication(app)
      logInfo("Registered app " + description.name + " with ID " + app.id)
      // Persist the ApplicationInfo through the persistence engine
      persistenceEngine.addApplication(app)
      // Reply in the other direction: send a RegisteredApplication message to the ClientActor of the AppClient in SparkDeploySchedulerBackend
      driver.send(RegisteredApplication(app.id, self))
      schedule()
    }
  }
  /**
   * Schedule the currently available resources among waiting apps. This method will be called
   * every time a new app joins or resource availability changes.
   */
  private def schedule(): Unit = {
    // If the master is not ALIVE, return immediately; a standby master does not schedule applications or other resources
    if (state != RecoveryState.ALIVE) { return }
    // Drivers take strict precedence over executors
    // Random.shuffle randomly reorders the elements of the collection passed to it.
    // Take all previously registered workers, shuffle them, and use only those whose state is ALIVE
    val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
    for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
      for (driver <- waitingDrivers) {
        // If the worker's free memory is at least what the driver needs, and its free CPU cores are at least what the driver needs
        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
          launchDriver(worker, driver)
          // Remove the driver from the waitingDrivers queue
          waitingDrivers -= driver
        }
      }
    }
    startExecutorsOnWorkers()
  }
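The driver-placement loop in `schedule()` can be illustrated in isolation. This is a simplified sketch, not the Master's implementation: `Worker`, `Driver` and `placeDrivers` are hypothetical models introduced here, and the sketch assumes that launching a driver consumes the worker's free memory and cores, which it tracks in local variables.

```scala
import scala.util.Random

object DriverPlacementSketch {
  // Hypothetical, simplified models; the real Master tracks far more state.
  final case class Worker(id: String, alive: Boolean, memoryFree: Int, coresFree: Int)
  final case class Driver(id: String, mem: Int, cores: Int)

  // Returns driverId -> workerId for every waiting driver that could be placed.
  def placeDrivers(workers: Seq[Worker], waiting: Seq[Driver], rng: Random): Map[String, String] = {
    var remaining = waiting.toList
    var placed = Map.empty[String, String]
    // Shuffle first so drivers spread across workers, then skip dead workers.
    for (w <- rng.shuffle(workers) if w.alive) {
      var mem = w.memoryFree
      var cores = w.coresFree
      // `remaining` is read once here, so reassigning it in the body is safe.
      for (d <- remaining if mem >= d.mem && cores >= d.cores) {
        placed += (d.id -> w.id)
        mem -= d.mem
        cores -= d.cores
        remaining = remaining.filterNot(_.id == d.id)
      }
    }
    placed
  }
}
```

Iterating over a snapshot of the waiting list avoids removing elements from a collection while it is being traversed.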
ExecutorAllocationManager
  // Lower and upper bounds on the number of executors.
  private val minNumExecutors = conf.getInt("spark.dynamicAllocation.minExecutors", 0)
  private val maxNumExecutors = conf.getInt("spark.dynamicAllocation.maxExecutors",
    Integer.MAX_VALUE)
  // How long there must be backlogged tasks for before an addition is triggered (seconds)
  private val schedulerBacklogTimeoutS = conf.getTimeAsSeconds(
    "spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  // Same as above, but used only after `schedulerBacklogTimeoutS` is exceeded
  private val sustainedSchedulerBacklogTimeoutS = conf.getTimeAsSeconds(
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", s"${schedulerBacklogTimeoutS}s")
  // How long an executor must be idle for before it is removed (seconds)
  private val executorIdleTimeoutS = conf.getTimeAsSeconds(
    "spark.dynamicAllocation.executorIdleTimeout", "60s")
  private val cachedExecutorIdleTimeoutS = conf.getTimeAsSeconds(
    "spark.dynamicAllocation.cachedExecutorIdleTimeout", s"${Integer.MAX_VALUE}s")
  // During testing, the methods to actually kill and add executors are mocked out
  private val testing = conf.getBoolean("spark.dynamicAllocation.testing", false)
  // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
  // allocation is only supported for YARN and the default number of cores per executor in YARN is
  // 1, but it might need to be attained differently for different cluster managers
  private val tasksPerExecutor =
    conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
  validateSettings()
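To see how these settings interact, here is a small arithmetic sketch. `tasksPerExecutor` follows the excerpt directly; `targetExecutors` is a hypothetical helper that assumes the target is the ceiling of tasks divided by slots per executor, clamped to the configured bounds. The excerpt does not show how the manager computes its target, so treat that part as an assumption.

```scala
object ExecutorTargetSketch {
  // As in the excerpt: spark.executor.cores / spark.task.cpus (integer division).
  def tasksPerExecutor(executorCores: Int, taskCpus: Int): Int = executorCores / taskCpus

  // Assumed policy: enough executors to run every task at once (ceiling division),
  // clamped to [minExecs, maxExecs].
  def targetExecutors(tasks: Int, tasksPerExec: Int, minExecs: Int, maxExecs: Int): Int = {
    require(tasksPerExec > 0, "tasksPerExec must be positive")
    val needed = (tasks + tasksPerExec - 1) / tasksPerExec
    math.min(math.max(needed, minExecs), maxExecs)
  }
}
```

For example, 10 tasks on executors with 4 cores and 1 CPU per task need 3 executors; with no tasks the result falls back to the minimum, and a large backlog is capped at the maximum.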
start
  /**
   * Register for scheduler callbacks to decide when to add and remove executors, and start
   * the scheduling task.
   */
  def start(): Unit = {
    listenerBus.addListener(listener)
    val scheduleTask = new Runnable() {
      override def run(): Unit = {
        try {
          schedule()
        } catch {
          case ct: ControlThrowable =>
            throw ct
          case t: Throwable =>
            logWarning(s"Uncaught exception in thread ${Thread.currentThread().getName}", t)
        }
      }
    }
    executor.scheduleAtFixedRate(scheduleTask, 0, intervalMillis, TimeUnit.MILLISECONDS)
  }
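The pattern in `start()` is a periodic task on a single-threaded scheduled executor, with exceptions caught inside `run()`. The catch matters because `scheduleAtFixedRate` suppresses all later executions once a run throws. A standalone sketch using only the JDK follows; `PeriodicScheduleSketch`, `runPeriodically` and the 5-second timeout are illustrative choices, not Spark code.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import scala.util.control.ControlThrowable

object PeriodicScheduleSketch {
  // Runs `tick` every `intervalMillis` and waits until it has been attempted
  // `times` times. Returns false if that did not happen within 5 seconds.
  def runPeriodically(times: Int, intervalMillis: Long)(tick: () => Unit): Boolean = {
    val latch = new CountDownLatch(times)
    val executor = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable {
      override def run(): Unit = {
        try tick()
        catch {
          case ct: ControlThrowable => throw ct
          // Swallow ordinary failures so the next scheduled run still happens.
          case t: Throwable => println(s"Uncaught exception: ${t.getMessage}")
        }
        finally latch.countDown()
      }
    }
    executor.scheduleAtFixedRate(task, 0, intervalMillis, TimeUnit.MILLISECONDS)
    try latch.await(5, TimeUnit.SECONDS)
    finally executor.shutdownNow()
  }
}
```

A tick that fails on its first call is still invoked again on the following intervals, which is the behavior the excerpt's try/catch protects.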
schedule
  private def schedule(): Unit = synchronized {
    val now = clock.getTimeMillis
    updateAndSyncNumExecutorsTarget(now)
    removeTimes.retain { case (executorId, expireTime) =>
      val expired = now >= expireTime
      if (expired) {
        initializing = false
        removeExecutor(executorId)
      }
      !expired
    }
  }
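The expiry step in `schedule()` splits the tracked executors into those whose removal time has passed and those that stay tracked. A minimal immutable sketch of that split follows; `IdleExpirySketch` and `expire` are hypothetical names, and an immutable `Map` stands in for the manager's mutable `removeTimes`.

```scala
object IdleExpirySketch {
  // removeTimes maps executorId -> timestamp (ms) at which the executor may be removed.
  // Returns (ids to remove now, entries that remain tracked).
  def expire(removeTimes: Map[String, Long], now: Long): (Set[String], Map[String, Long]) = {
    // Same test as the excerpt: an entry is expired once now >= expireTime.
    val (expired, kept) = removeTimes.partition { case (_, expireTime) => now >= expireTime }
    (expired.keySet, kept)
  }
}
```

An executor whose expiry time equals the current time counts as expired, matching the `>=` comparison in the excerpt.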
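Configuration
spark.streaming.backpressure.enabled controls the flow rate by balancing the rate at which data arrives against the time needed to process it. Setting it to true is recommended.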
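The notes name the setting but do not describe how the rate is derived. The toy sketch below only illustrates the idea of matching ingestion to processing time: it admits no more records per batch than the previous batch showed could be processed. It is not Spark's rate estimator, and `BackpressureSketch`, `sustainableRate` and `nextBatchLimit` are hypothetical names.

```scala
object BackpressureSketch {
  // Records per second at which the last batch was actually processed.
  def sustainableRate(numRecords: Long, processingMillis: Long): Double = {
    require(processingMillis > 0, "processingMillis must be positive")
    numRecords * 1000.0 / processingMillis
  }

  // Upper bound on records to admit into the next batch of length batchMillis.
  def nextBatchLimit(numRecords: Long, processingMillis: Long, batchMillis: Long): Long =
    math.floor(sustainableRate(numRecords, processingMillis) * batchMillis / 1000.0).toLong
}
```

If 1000 records took 2 seconds to process and the batch interval is 1 second, the sketch admits 500 records into the next batch; if they took only half a second, it admits up to 2000.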