Spark Source Code Study (5): Storage
Goal of this article:
Study Spark's storage module and, by reading the source code, build a deeper understanding of distributed storage.
BlockManager
The Storage module is organized in two layers:
1) Communication layer: reports block information to the BlockManagerMaster; messages between the master and the slaves travel in a master-slave pattern.
2) Data layer: stores and reads block data, mainly on disk, in memory, and on Tachyon.
An RDD's data normally lives in partitions, while cached data is kept as blocks, so the BlockManager manages all of the blocks.
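Each block is identified by a BlockId. For RDD data, the id encodes which RDD and which partition the block came from; for reference, the RDD flavor in BlockId.scala looks like this in Spark 2.x:

case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
  // e.g. partition 3 of RDD 7 is stored under the key "rdd_7_3"
  override def name: String = "rdd_" + rddId + "_" + splitIndex
}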
The constructor parameters of BlockManager:
private[spark] class BlockManager(
    executorId: String,
    rpcEnv: RpcEnv,
    val master: BlockManagerMaster,
    serializerManager: SerializerManager,
    val conf: SparkConf,
    memoryManager: MemoryManager,
    mapOutputTracker: MapOutputTracker,
    shuffleManager: ShuffleManager,
    val blockTransferService: BlockTransferService,
    securityManager: SecurityManager,
    numUsableCores: Int)
  extends BlockDataManager with BlockEvictionHandler with Logging
Initialization involves two steps: 1) registering with the BlockManagerMaster, and 2) starting the worker-side services (in the 2.x code below, the block transfer service and the shuffle client):
def initialize(appId: String): Unit = {
  blockTransferService.init(this)
  shuffleClient.init(appId)

  blockManagerId = BlockManagerId(
    executorId, blockTransferService.hostName, blockTransferService.port)

  shuffleServerId = if (externalShuffleServiceEnabled) {
    logInfo(s"external shuffle service port = $externalShuffleServicePort")
    BlockManagerId(executorId, blockTransferService.hostName, externalShuffleServicePort)
  } else {
    blockManagerId
  }

  // Register with the driver-side BlockManagerMaster (the full method continues
  // in the source with external shuffle service registration).
  master.registerBlockManager(blockManagerId, maxMemory, slaveEndpoint)
}
Register BlockManager
Registering the BlockManager amounts to sending a one-way message to the master endpoint. The message is a RegisterBlockManager object, which carries the block manager's id, its maxMemSize, and a reference to the slave endpoint.
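For reference, the message itself is a plain case class in BlockManagerMessages.scala (sketched from the Spark 2.x source; field names may differ slightly between versions):

case class RegisterBlockManager(
    blockManagerId: BlockManagerId,
    maxMemSize: Long,
    sender: RpcEndpointRef)
  extends ToBlockManagerMaster

The registration itself goes through BlockManagerMaster.registerBlockManager: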
/** Register the BlockManager's id with the driver. */
def registerBlockManager(
    blockManagerId: BlockManagerId,
    maxMemSize: Long,
    slaveEndpoint: RpcEndpointRef): Unit = {
  logInfo("Trying to register BlockManager")
  tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint))
  logInfo("Registered BlockManager")
}
The tell method it relies on is worth a look:
/** Send a one-way message to the master endpoint, to which we expect it to reply with true. */
private def tell(message: Any) {
  if (!driverEndpoint.askWithRetry[Boolean](message)) {
    throw new SparkException("BlockManagerMasterEndpoint returned false, expected true.")
  }
}
askWithRetry retries the request a bounded number of times before giving up:
def askWithRetry[T: ClassTag](message: Any, timeout: RpcTimeout): T = {
  // TODO: Consider removing multiple attempts
  var attempts = 0
  var lastException: Exception = null
  while (attempts < maxRetries) {
    attempts += 1
    try {
      val future = ask[T](message, timeout)
      val result = timeout.awaitResult(future)
      if (result == null) {
        throw new SparkException("RpcEndpoint returned null")
      }
      return result
    } catch {
      case ie: InterruptedException => throw ie
      case e: Exception =>
        lastException = e
        logWarning(s"Error sending message [message = $message] in $attempts attempts", e)
    }
    if (attempts < maxRetries) {
      Thread.sleep(retryWaitMs)
    }
  }
  throw new SparkException(s"Error sending message [message = $message]", lastException)
}
Storage
Spark provides two main storage backends: the MemoryStore and the DiskStore.
private[spark] val memoryStore =
  new MemoryStore(conf, blockInfoManager, serializerManager, memoryManager, this)
private[spark] val diskStore = new DiskStore(conf, diskBlockManager)
memoryManager.setMemoryStore(memoryStore)
MemoryStore
In-memory storage takes two forms: deserialized arrays of Java objects, and serialized ByteBuffers. The MemoryStore maintains a LinkedHashMap keyed by BlockId with MemoryEntry values:
private val entries = new LinkedHashMap[BlockId, MemoryEntry](32, 0.75f, true)
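The two in-memory formats correspond to two MemoryEntry variants; a sketch based on the Spark 2.x MemoryStore (member details vary across versions):

private sealed trait MemoryEntry[T] {
  def size: Long
  def classTag: ClassTag[T]
}

// A block kept as an array of deserialized Java objects.
private case class DeserializedMemoryEntry[T](
    value: Array[T],
    size: Long,
    classTag: ClassTag[T]) extends MemoryEntry[T]

// A block kept as serialized bytes, either on-heap or off-heap.
private case class SerializedMemoryEntry[T](
    buffer: ChunkedByteBuffer,
    memoryMode: MemoryMode,
    classTag: ClassTag[T]) extends MemoryEntry[T] {
  def size: Long = buffer.size
}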
The write path has two groups of methods: putBytes and the putIterator* methods (putArray in older releases). How the data is handled depends on the StorageLevel's deserialized flag, and when memory cannot hold a block it is dropped to disk, provided the storage level permits disk use.
def putBytes[T: ClassTag](
    blockId: BlockId,
    size: Long,
    memoryMode: MemoryMode,
    _bytes: () => ChunkedByteBuffer): Boolean = {
  require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")
  if (memoryManager.acquireStorageMemory(blockId, size, memoryMode)) {
    // We acquired enough memory for the block, so go ahead and put it
    val bytes = _bytes()
    assert(bytes.size == size)
    val entry = new SerializedMemoryEntry[T](bytes, memoryMode, implicitly[ClassTag[T]])
    entries.synchronized {
      entries.put(blockId, entry)
    }
    logInfo("Block %s stored as bytes in memory (estimated size %s, free %s)".format(
      blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
    true
  } else {
    false
  }
}
Next, look at the doPutIterator method. It puts the block in memory first and, if memory cannot hold it, spills it to disk:
private def doPutIterator[T](
    blockId: BlockId,
    iterator: () => Iterator[T],
    level: StorageLevel,
    classTag: ClassTag[T],
    tellMaster: Boolean = true,
    keepReadLock: Boolean = false): Option[PartiallyUnrolledIterator[T]] = {
  doPut(blockId, level, classTag, tellMaster = tellMaster, keepReadLock = keepReadLock) { info =>
    val startTimeMs = System.currentTimeMillis
    var iteratorFromFailedMemoryStorePut: Option[PartiallyUnrolledIterator[T]] = None
    // Size of the block in bytes
    var size = 0L
    if (level.useMemory) {
      // Put it in memory first, even if it also has useDisk set to true;
      // We will drop it to disk later if the memory store can't hold it.
      if (level.deserialized) {
        memoryStore.putIteratorAsValues(blockId, iterator(), classTag) match {
          case Right(s) =>
            size = s
          case Left(iter) =>
            // Not enough space to unroll this block; drop to disk if applicable
            if (level.useDisk) {
              logWarning(s"Persisting block $blockId to disk instead.")
              diskStore.put(blockId) { fileOutputStream =>
                serializerManager.dataSerializeStream(blockId, fileOutputStream, iter)(classTag)
              }
              size = diskStore.getSize(blockId)
            } else {
              iteratorFromFailedMemoryStorePut = Some(iter)
            }
        }
      } else { // !level.deserialized
        memoryStore.putIteratorAsBytes(blockId, iterator(), classTag, level.memoryMode) match {
          case Right(s) =>
            size = s
          case Left(partiallySerializedValues) =>
            // Not enough space to unroll this block; drop to disk if applicable
            if (level.useDisk) {
              logWarning(s"Persisting block $blockId to disk instead.")
              diskStore.put(blockId) { fileOutputStream =>
                partiallySerializedValues.finishWritingToStream(fileOutputStream)
              }
              size = diskStore.getSize(blockId)
            } else {
              iteratorFromFailedMemoryStorePut = Some(partiallySerializedValues.valuesIterator)
            }
        }
      }
    } else if (level.useDisk) {
      diskStore.put(blockId) { fileOutputStream =>
        serializerManager.dataSerializeStream(blockId, fileOutputStream, iterator())(classTag)
      }
      size = diskStore.getSize(blockId)
    }

    val putBlockStatus = getCurrentBlockStatus(blockId, info)
    val blockWasSuccessfullyStored = putBlockStatus.storageLevel.isValid
    if (blockWasSuccessfullyStored) {
      // Now that the block is in either the memory, externalBlockStore, or disk store,
      // tell the master about it.
      info.size = size
      if (tellMaster) {
        reportBlockStatus(blockId, info, putBlockStatus)
      }
      Option(TaskContext.get()).foreach { c =>
        c.taskMetrics().incUpdatedBlockStatuses(blockId -> putBlockStatus)
      }
      logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
      if (level.replication > 1) {
        val remoteStartTime = System.currentTimeMillis
        val bytesToReplicate = doGetLocalBytes(blockId, info)
        try {
          replicate(blockId, bytesToReplicate, level, classTag)
        } finally {
          bytesToReplicate.dispose()
        }
        logDebug("Put block %s remotely took %s"
          .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
      }
    }
    assert(blockWasSuccessfullyStored == iteratorFromFailedMemoryStorePut.isEmpty)
    iteratorFromFailedMemoryStorePut
  }
}
The remaining methods are not walked through one by one here.
RDD API
For fault tolerance and for reusing data across computations (for example around a shuffle), RDD provides cache and persist to store data; the source shows that cache is simply a call to persist.
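The relevant definitions in RDD.scala are one-liners (Spark 2.x):

/** Persist this RDD with the default storage level (MEMORY_ONLY). */
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

/** Persist this RDD with the default storage level (MEMORY_ONLY). */
def cache(): this.type = persist()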
StorageLevel
StorageLevel describes the storage level used by Spark Storage. The available media are disk, memory, and off-heap storage; in addition, a deserialized flag records whether the data is kept as deserialized objects, and replication gives the number of replicas, which defaults to 1 in the source. The StorageLevel source follows.
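For reference, the constructor flags and a few of the predefined levels in StorageLevel.scala (abridged from Spark 2.x):

class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)
  extends Externalizable

object StorageLevel {
  val NONE = new StorageLevel(false, false, false, false)
  val DISK_ONLY = new StorageLevel(true, false, false, false)
  val MEMORY_ONLY = new StorageLevel(false, true, false, true)
  val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
  val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
  val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
  val OFF_HEAP = new StorageLevel(true, true, true, false, 1)
}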
CheckPoint
For long-running programs or computation-heavy jobs, checkpointing is advisable: it avoids paying a large recomputation cost when a failure occurs mid-run. A checkpoint writes the RDD's data out to disk (the configured checkpoint directory). An RDD should be cached before it is checkpointed, because checkpointing removes all of the RDD's parent (lineage) information; with the data cached, the checkpoint does not have to recompute the RDD, and later loads read the checkpointed data directly instead of recomputing it.
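A minimal usage sketch (the checkpoint directory path here is made up; on a cluster it would normally point at a reliable filesystem such as HDFS):

import org.apache.spark.{SparkConf, SparkContext}

object CheckpointExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("checkpoint-demo").setMaster("local[2]"))
    // Hypothetical directory; must be set before calling checkpoint().
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    val rdd = sc.parallelize(1 to 1000).map(_ * 2)
    rdd.cache()       // cache first, so the checkpoint job reuses the computed data
    rdd.checkpoint()  // marks the RDD; data is written when the next job runs
    rdd.count()       // triggers both the computation and the checkpoint write
    sc.stop()
  }
}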
There is still much missing on the storage side; further careful study will follow.