Spark Source Code Analysis: BlockManagerMaster
BlockManagerMaster is mainly responsible for managing and maintaining block metadata for the whole application at runtime, and for sending commands to the slave (executor-side) endpoints for execution.
1 Core Attributes
driverEndpoint: RpcEndpointRef — the communication endpoint for talking to BlockManagerMasterEndpoint on the driver
isDriver: Boolean — whether this instance is running on the driver
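Most of the methods below delegate to a small private helper, tell, which sends a message to BlockManagerMasterEndpoint and fails fast unless the endpoint acknowledges with true. A sketch of that helper, based on the Spark 1.x source (the exact exception message may vary by version):

```scala
// Send a message to BlockManagerMasterEndpoint and verify the reply.
// askWithRetry blocks until the endpoint answers (retrying on failure);
// the endpoint is expected to reply `true` on success.
private def tell(message: Any) {
  if (!driverEndpoint.askWithRetry[Boolean](message)) {
    throw new SparkException("BlockManagerMasterEndpoint returned false, expected true.")
  }
}
```

This is the pattern behind removeExecutor and registerBlockManager below: fire a message, then wait for a Boolean acknowledgement.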
2 Key Methods
2.1 Remove an executor via the driver endpoint

def removeExecutor(execId: String) {
  // Send a RemoveExecutor message to BlockManagerMasterEndpoint
  tell(RemoveExecutor(execId))
  logInfo("Removed " + execId + " successfully in removeExecutor")
}
2.2 Register a BlockManager by sending a RegisterBlockManager message

def registerBlockManager(
    blockManagerId: BlockManagerId,
    maxMemSize: Long,
    slaveEndpoint: RpcEndpointRef): Unit = {
  logInfo("Trying to register BlockManager")
  // Send a RegisterBlockManager message to BlockManagerMasterEndpoint
  tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint))
  logInfo("Registered BlockManager")
}
2.3 Update block information

def updateBlockInfo(
    blockManagerId: BlockManagerId,
    blockId: BlockId,
    storageLevel: StorageLevel,
    memSize: Long,
    diskSize: Long,
    externalBlockStoreSize: Long): Boolean = {
  // Send an UpdateBlockInfo message to BlockManagerMasterEndpoint and return the result
  val res = driverEndpoint.askWithRetry[Boolean](
    UpdateBlockInfo(blockManagerId, blockId, storageLevel,
      memSize, diskSize, externalBlockStoreSize))
  logDebug(s"Updated info of block $blockId")
  res
}
2.4 Get all BlockManagerIds that hold a given block

def getLocations(blockId: BlockId): Seq[BlockManagerId] = {
  // Send a GetLocations message to BlockManagerMasterEndpoint
  driverEndpoint.askWithRetry[Seq[BlockManagerId]](GetLocations(blockId))
}
2.5 Get the locations of multiple blocks

def getLocations(blockIds: Array[BlockId]): IndexedSeq[Seq[BlockManagerId]] = {
  // Send a GetLocationsMultipleBlockIds message to BlockManagerMasterEndpoint
  driverEndpoint.askWithRetry[IndexedSeq[Seq[BlockManagerId]]](
    GetLocationsMultipleBlockIds(blockIds))
}
2.6 Check whether a block exists

def contains(blockId: BlockId): Boolean = {
  !getLocations(blockId).isEmpty
}
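A hedged usage sketch: inside Spark, client code reaches the master through SparkEnv. The snippet below assumes a running SparkEnv and a cached RDD partition (the rddId and splitIndex values are made up for illustration), so it is not standalone:

```scala
// Hypothetical check for a cached RDD partition.
val master = SparkEnv.get.blockManager.master
val blockId = RDDBlockId(rddId = 0, splitIndex = 0)
if (master.contains(blockId)) {
  // At least one BlockManager reports a copy of this block.
  val locations: Seq[BlockManagerId] = master.getLocations(blockId)
  println(s"$blockId is cached on: ${locations.mkString(", ")}")
}
```

Note that contains is just a thin wrapper over getLocations: a block "exists" iff at least one BlockManager reports a location for it.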
2.7 Get the BlockManagerIds of the other executors (peers of the given BlockManager)

def getPeers(blockManagerId: BlockManagerId): Seq[BlockManagerId] = {
  driverEndpoint.askWithRetry[Seq[BlockManagerId]](GetPeers(blockManagerId))
}
2.8 Remove a block by sending a RemoveBlock message to BlockManagerMasterEndpoint

def removeBlock(blockId: BlockId) {
  driverEndpoint.askWithRetry[Boolean](RemoveBlock(blockId))
}
2.9 Remove all blocks belonging to a given RDD

def removeRdd(rddId: Int, blocking: Boolean) {
  val future = driverEndpoint.askWithRetry[Future[Seq[Int]]](RemoveRdd(rddId))
  future.onFailure {
    case e: Exception =>
      logWarning(s"Failed to remove RDD $rddId - ${e.getMessage}", e)
  }(ThreadUtils.sameThread)
  if (blocking) {
    timeout.awaitResult(future)
  }
}
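Note the asymmetry here: the endpoint hands back a Future immediately, and only when blocking = true does the caller wait on it via timeout.awaitResult. This is what makes uncaching cheap or synchronous at the user level, since RDD.unpersist ultimately reaches this method through SparkContext. A hedged sketch of the user-facing side (standard RDD API, assuming an existing SparkContext sc):

```scala
// Caching and uncaching from user code; unpersist(blocking = false)
// returns immediately while the driver removes the blocks asynchronously.
val rdd = sc.parallelize(1 to 1000).cache()
rdd.count()                     // materialize the cache
rdd.unpersist(blocking = false) // eventually calls BlockManagerMaster.removeRdd
```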
2.10 Remove all blocks belonging to a given shuffle

def removeShuffle(shuffleId: Int, blocking: Boolean) {
  val future = driverEndpoint.askWithRetry[Future[Seq[Boolean]]](RemoveShuffle(shuffleId))
  future.onFailure {
    case e: Exception =>
      logWarning(s"Failed to remove shuffle $shuffleId - ${e.getMessage}", e)
  }(ThreadUtils.sameThread)
  if (blocking) {
    timeout.awaitResult(future)
  }
}
2.11 Return the memory status of every block manager

def getMemoryStatus: Map[BlockManagerId, (Long, Long)] = {
  driverEndpoint.askWithRetry[Map[BlockManagerId, (Long, Long)]](GetMemoryStatus)
}
2.12 Return storage status

def getStorageStatus: Array[StorageStatus] = {
  driverEndpoint.askWithRetry[Array[StorageStatus]](GetStorageStatus)
}
2.13 Get block status

def getBlockStatus(
    blockId: BlockId,
    askSlaves: Boolean = true): Map[BlockManagerId, BlockStatus] = {
  val msg = GetBlockStatus(blockId, askSlaves)
  val response = driverEndpoint.
    askWithRetry[Map[BlockManagerId, Future[Option[BlockStatus]]]](msg)
  val (blockManagerIds, futures) = response.unzip
  implicit val sameThread = ThreadUtils.sameThread
  val cbf = implicitly[
    CanBuildFrom[Iterable[Future[Option[BlockStatus]]],
      Option[BlockStatus],
      Iterable[Option[BlockStatus]]]]
  val blockStatus = timeout.awaitResult(
    Future.sequence[Option[BlockStatus], Iterable](futures)(cbf, ThreadUtils.sameThread))
  if (blockStatus == null) {
    throw new SparkException("BlockManager returned null for BlockStatus query: " + blockId)
  }
  blockManagerIds.zip(blockStatus).flatMap { case (blockManagerId, status) =>
    status.map { s => (blockManagerId, s) }
  }.toMap
}
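The endpoint returns one Future[Option[BlockStatus]] per BlockManager; Future.sequence flips Iterable[Future[...]] into a single Future[Iterable[...]] so the method can await all replies at once. A minimal, self-contained illustration of that flip (plain Scala, no Spark types):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Three independent asynchronous answers, like the per-BlockManager replies.
val replies: Seq[Future[Option[Int]]] =
  Seq(Future(Some(1)), Future(None), Future(Some(3)))

// Collapse them into one Future holding all results, then await it,
// mirroring timeout.awaitResult(Future.sequence(futures)) above.
val all: Seq[Option[Int]] = Await.result(Future.sequence(replies), 10.seconds)
// all == Seq(Some(1), None, Some(3)) — Future.sequence preserves order
```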
2.14 Check whether a given executor has any cached blocks

def hasCachedBlocks(executorId: String): Boolean = {
  driverEndpoint.askWithRetry[Boolean](HasCachedBlocks(executorId))
}