Spark Source Code Study (5): Storage


What this article sets out to do:
Study Spark's storage module by reading its source code, in order to gain a deeper understanding of distributed storage.


BlockManager

The Storage module is split into two layers:

1) The communication layer: responsible for reporting block information to the BlockManagerMaster; information is exchanged between the master and the slaves in a master-slave (m-s) pattern.

2) The data layer: responsible for storing and reading block data, mainly on disk, in memory, and on Tachyon.

RDD data normally lives in partitions, while cached data is kept as blocks, so the BlockManager manages all of the blocks.
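
As a small illustration of that partition-to-block mapping, the sketch below (not part of the original article) uses RDDBlockId from org.apache.spark.storage; a cached partition's block id is built from the RDD id and the partition index:

// Sketch: how a cached RDD partition maps to a block id.
import org.apache.spark.storage.RDDBlockId

object BlockIdDemo {
  def main(args: Array[String]): Unit = {
    val id = RDDBlockId(rddId = 5, splitIndex = 3)
    println(id.name)  // prints "rdd_5_3"
  }
}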

The constructor parameters of BlockManager:

private[spark] class BlockManager(
    executorId: String,
    rpcEnv: RpcEnv,
    val master: BlockManagerMaster,
    serializerManager: SerializerManager,
    val conf: SparkConf,
    memoryManager: MemoryManager,
    mapOutputTracker: MapOutputTracker,
    shuffleManager: ShuffleManager,
    val blockTransferService: BlockTransferService,
    securityManager: SecurityManager,
    numUsableCores: Int)
  extends BlockDataManager with BlockEvictionHandler with Logging

The initialize method does two things: 1) it starts the worker-side services (the block transfer service and the shuffle client), and 2) it registers the BlockManager with the BlockManagerMaster:

  def initialize(appId: String): Unit = {
    blockTransferService.init(this)
    shuffleClient.init(appId)

    blockManagerId = BlockManagerId(
      executorId, blockTransferService.hostName, blockTransferService.port)

    shuffleServerId = if (externalShuffleServiceEnabled) {
      logInfo(s"external shuffle service port = $externalShuffleServicePort")
      BlockManagerId(executorId, blockTransferService.hostName, externalShuffleServicePort)
    } else {
      blockManagerId
    }

    // Register this block manager with the driver-side BlockManagerMaster.
    master.registerBlockManager(blockManagerId, maxMemory, slaveEndpoint)
  }


Register BlockManager

Registering a BlockManager means sending a one-way message to the BlockManagerMasterEndpoint on the driver. The message is a RegisterBlockManager object, built from a few fields: the BlockManagerId, the maximum memory size (maxMemSize), and the slave RpcEndpointRef (historically a slave actor).

  /** Register the BlockManager's id with the driver. */
  def registerBlockManager(
      blockManagerId: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef): Unit = {
    logInfo("Trying to register BlockManager")
    tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint))
    logInfo("Registered BlockManager")
  }

This calls the tell method, whose source is as follows:

  /** Send a one-way message to the master endpoint, to which we expect it to reply with true. */
  // Sends a one-way message to the master endpoint.
  private def tell(message: Any) {
    if (!driverEndpoint.askWithRetry[Boolean](message)) {
      throw new SparkException("BlockManagerMasterEndpoint returned false, expected true.")
    }
  }

And the askWithRetry method:

  def askWithRetry[T: ClassTag](message: Any, timeout: RpcTimeout): T = {
    // TODO: Consider removing multiple attempts
    var attempts = 0
    var lastException: Exception = null
    while (attempts < maxRetries) {
      attempts += 1
      try {
        val future = ask[T](message, timeout)
        val result = timeout.awaitResult(future)
        if (result == null) {
          throw new SparkException("RpcEndpoint returned null")
        }
        return result
      } catch {
        case ie: InterruptedException => throw ie
        case e: Exception =>
          lastException = e
          logWarning(s"Error sending message [message = $message] in $attempts attempts", e)
      }

      if (attempts < maxRetries) {
        Thread.sleep(retryWaitMs)
      }
    }

    throw new SparkException(
      s"Error sending message [message = $message]", lastException)
  }
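
askWithRetry simply retries the underlying ask: it makes up to maxRetries attempts and sleeps retryWaitMs between them. Those two values come from the Spark configuration; a minimal sketch, assuming the usual key names read by RpcUtils (the defaults shown are assumptions to verify against your Spark version):

// Sketch of the retry-related settings askWithRetry relies on.
import org.apache.spark.SparkConf

object RpcRetryConfDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.rpc.numRetries", "3")    // maxRetries: attempts before giving up
      .set("spark.rpc.retry.wait", "3s")   // retryWaitMs: sleep between attempts
    println(conf.get("spark.rpc.numRetries"))
    println(conf.get("spark.rpc.retry.wait"))
  }
}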


Storage

For storage, Spark mainly provides two implementations:
MemoryStore and DiskStore.

  private[spark] val memoryStore =
    new MemoryStore(conf, blockInfoManager, serializerManager, memoryManager, this)
  private[spark] val diskStore = new DiskStore(conf, diskBlockManager)
  memoryManager.setMemoryStore(memoryStore)

MemoryStore

In-memory storage comes in two forms: deserialized arrays of Java objects and serialized ByteBuffers. MemoryStore maintains a LinkedHashMap keyed by BlockId with MemoryEntry values:

  private val entries = new LinkedHashMap[BlockId, MemoryEntry](32, 0.75f, true)
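
The third constructor argument (accessOrder = true) is what makes this map useful for eviction: iteration visits entries from least-recently to most-recently used. A minimal standalone sketch of that behavior (not Spark code):

// Demonstrates access-ordered iteration of java.util.LinkedHashMap,
// the property MemoryStore relies on when picking blocks to evict.
import java.util.LinkedHashMap
import scala.collection.JavaConverters._

object AccessOrderDemo {
  def main(args: Array[String]): Unit = {
    val entries = new LinkedHashMap[String, Long](32, 0.75f, true)
    entries.put("rdd_0_0", 100L)
    entries.put("rdd_0_1", 200L)
    entries.put("rdd_0_2", 300L)
    entries.get("rdd_0_0")  // touching an entry moves it to the end of the iteration order
    // Prints: rdd_0_1, rdd_0_2, rdd_0_0  (the head of the map is the LRU entry)
    println(entries.keySet().iterator().asScala.mkString(", "))
  }
}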

The main write methods are putBytes and putArray (plus the putIterator variants). When a block is actually stored, the behavior depends on the deserialized flag of its StorageLevel. When memory is insufficient and the storage level allows disk, the block is written to disk instead.

  def putBytes[T: ClassTag](
      blockId: BlockId,
      size: Long,
      memoryMode: MemoryMode,
      _bytes: () => ChunkedByteBuffer): Boolean = {
    require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")
    if (memoryManager.acquireStorageMemory(blockId, size, memoryMode)) {
      // We acquired enough memory for the block, so go ahead and put it
      val bytes = _bytes()
      assert(bytes.size == size)
      val entry = new SerializedMemoryEntry[T](bytes, memoryMode, implicitly[ClassTag[T]])
      entries.synchronized {
        entries.put(blockId, entry)
      }
      logInfo("Block %s stored as bytes in memory (estimated size %s, free %s)".format(
        blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
      true
    } else {
      false
    }
  }

Next, look at the doPutIterator method:
it stores the block in memory, and if memory is insufficient it falls back to disk.

  private def doPutIterator[T](
      blockId: BlockId,
      iterator: () => Iterator[T],
      level: StorageLevel,
      classTag: ClassTag[T],
      tellMaster: Boolean = true,
      keepReadLock: Boolean = false): Option[PartiallyUnrolledIterator[T]] = {
    doPut(blockId, level, classTag, tellMaster = tellMaster, keepReadLock = keepReadLock) { info =>
      val startTimeMs = System.currentTimeMillis
      var iteratorFromFailedMemoryStorePut: Option[PartiallyUnrolledIterator[T]] = None
      // Size of the block in bytes
      var size = 0L
      if (level.useMemory) {
        // Put it in memory first, even if it also has useDisk set to true;
        // We will drop it to disk later if the memory store can't hold it.
        if (level.deserialized) {
          memoryStore.putIteratorAsValues(blockId, iterator(), classTag) match {
            case Right(s) =>
              size = s
            case Left(iter) =>
              // Not enough space to unroll this block; drop to disk if applicable
              if (level.useDisk) {
                logWarning(s"Persisting block $blockId to disk instead.")
                diskStore.put(blockId) { fileOutputStream =>
                  serializerManager.dataSerializeStream(blockId, fileOutputStream, iter)(classTag)
                }
                size = diskStore.getSize(blockId)
              } else {
                iteratorFromFailedMemoryStorePut = Some(iter)
              }
          }
        } else { // !level.deserialized
          memoryStore.putIteratorAsBytes(blockId, iterator(), classTag, level.memoryMode) match {
            case Right(s) =>
              size = s
            case Left(partiallySerializedValues) =>
              // Not enough space to unroll this block; drop to disk if applicable
              if (level.useDisk) {
                logWarning(s"Persisting block $blockId to disk instead.")
                diskStore.put(blockId) { fileOutputStream =>
                  partiallySerializedValues.finishWritingToStream(fileOutputStream)
                }
                size = diskStore.getSize(blockId)
              } else {
                iteratorFromFailedMemoryStorePut = Some(partiallySerializedValues.valuesIterator)
              }
          }
        }
      } else if (level.useDisk) {
        diskStore.put(blockId) { fileOutputStream =>
          serializerManager.dataSerializeStream(blockId, fileOutputStream, iterator())(classTag)
        }
        size = diskStore.getSize(blockId)
      }

      val putBlockStatus = getCurrentBlockStatus(blockId, info)
      val blockWasSuccessfullyStored = putBlockStatus.storageLevel.isValid
      if (blockWasSuccessfullyStored) {
        // Now that the block is in either the memory, externalBlockStore, or disk store,
        // tell the master about it.
        info.size = size
        if (tellMaster) {
          reportBlockStatus(blockId, info, putBlockStatus)
        }
        Option(TaskContext.get()).foreach { c =>
          c.taskMetrics().incUpdatedBlockStatuses(blockId -> putBlockStatus)
        }
        logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
        if (level.replication > 1) {
          val remoteStartTime = System.currentTimeMillis
          val bytesToReplicate = doGetLocalBytes(blockId, info)
          try {
            replicate(blockId, bytesToReplicate, level, classTag)
          } finally {
            bytesToReplicate.dispose()
          }
          logDebug("Put block %s remotely took %s"
            .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
        }
      }
      assert(blockWasSuccessfullyStored == iteratorFromFailedMemoryStorePut.isEmpty)
      iteratorFromFailedMemoryStorePut
    }
  }

The remaining methods are not covered one by one here.



RDD API

To keep data fault-tolerant around shuffles, or to reuse intermediate results, RDDs provide cache and persist for storing data. From the source you can see that cache simply calls persist.
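
In RDD.scala (paraphrased; wording may vary slightly by version), cache delegates to persist, and persist without arguments uses the default MEMORY_ONLY level. A minimal usage sketch, with an illustrative local master and data set:

// Paraphrased from org.apache.spark.rdd.RDD:
//   def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)
//   def cache(): this.type = persist()
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("persist-demo").setMaster("local[2]"))
    val nums = sc.parallelize(1 to 1000000)
    nums.persist(StorageLevel.MEMORY_AND_DISK)  // spill to disk when memory is insufficient
    // nums.cache() would be equivalent to nums.persist(StorageLevel.MEMORY_ONLY)
    println(nums.count())  // the first action materializes and caches the blocks
    println(nums.count())  // the second action reads from the cached blocks
    sc.stop()
  }
}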



StorageLevel

StorageLevel marks the storage level used by Spark Storage. The storage media include disk, memory, and off-heap storage. In addition, the deserialized flag indicates whether data is kept as deserialized objects, and replication is the number of replicas, which defaults to 1 in the source. A quick look at StorageLevel follows below.
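
The sketch below (assuming spark-core on the classpath) prints the flags behind a few common predefined levels, which is a convenient way to see how useDisk, useMemory, useOffHeap, deserialized, and replication combine:

// Prints the flags that make up a StorageLevel.
import org.apache.spark.storage.StorageLevel

object StorageLevelDemo {
  def main(args: Array[String]): Unit = {
    val levels = Seq(
      "MEMORY_ONLY" -> StorageLevel.MEMORY_ONLY,
      "MEMORY_AND_DISK" -> StorageLevel.MEMORY_AND_DISK,
      "MEMORY_AND_DISK_SER_2" -> StorageLevel.MEMORY_AND_DISK_SER_2,
      "DISK_ONLY" -> StorageLevel.DISK_ONLY,
      "OFF_HEAP" -> StorageLevel.OFF_HEAP)
    for ((name, level) <- levels) {
      println(s"$name: useDisk=${level.useDisk}, useMemory=${level.useMemory}, " +
        s"useOffHeap=${level.useOffHeap}, deserialized=${level.deserialized}, " +
        s"replication=${level.replication}")
    }
  }
}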



CheckPoint

Checkpointing is usually needed when a program runs for a long time or the computation is expensive, so that a failure partway through does not force an overly costly recomputation. Checkpointing writes the RDD data to the checkpoint directory (set via SparkContext.setCheckpointDir, typically on HDFS or another reliable filesystem). The RDD should be cached before it is checkpointed: checkpointing removes all of its parent (lineage) information and is materialized by a separate job after the first action, so caching avoids computing the data a second time.
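
A minimal end-to-end sketch (the directory, data, and local master are illustrative):

// Checkpoint usage sketch: cache, mark for checkpoint, then run an action.
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo").setMaster("local[2]"))
    sc.setCheckpointDir("/tmp/spark-checkpoints")  // on a cluster this would normally be an HDFS path

    val rdd = sc.parallelize(1 to 1000).map(_ * 2)
    rdd.cache()        // cache first so the checkpoint job reuses the computed data
    rdd.checkpoint()   // marks the RDD; data is written when the next action runs
    rdd.count()        // triggers both the computation and the checkpoint write

    sc.stop()
  }
}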

There is still much more to the storage module; it deserves further, more careful study later.
