Spark-storage

@(spark)[storage]

java.nio

Java's "new I/O" library. As background knowledge, it is worth reading about first; a recommended introduction exists, with a Chinese translation.

RDDInfo

A utility class that describes information about an RDD.

StorageLevel

/**
 * :: DeveloperApi ::
 * Flags for controlling the storage of an RDD. Each StorageLevel records whether to use memory,
 * or Tachyon, whether to drop the RDD to disk if it falls out of memory or Tachyon, whether to
 * keep the data in memory in a serialized format, and whether to replicate the RDD partitions on
 * multiple nodes.
 *
 * The [[org.apache.spark.storage.StorageLevel$]] singleton object contains some static constants
 * for commonly useful storage levels. To create your own storage level object, use the
 * factory method of the singleton object (`StorageLevel(...)`).
 */
@DeveloperApi
class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)

The heap is where dynamically allocated objects live in memory. If you `new` an object, it is allocated on the heap. This is in contrast to the stack: a local variable lives in stack memory.

BigMemory is used to avoid the GC overhead on heaps ranging from a few MB up to many GB. BigMemory uses the JVM process's memory address space via direct ByteBuffers which, unlike ordinary Java objects, are not subject to GC.

EHCache's (Terracotta BigMemory) off-heap store takes your objects out of the heap, serializes them, and stores them in a large chunk of memory, much like writing them to disk except that they stay in RAM. Objects in this state cannot be used directly; they must first be deserialized. They are also not subject to garbage collection. Serialization and deserialization cost performance (FST serialization is still fast, though).
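For reference, the same off-heap idea can be reproduced with a plain direct ByteBuffer; a minimal sketch (not Spark code):

import java.nio.ByteBuffer

// A minimal sketch of off-heap storage via a direct ByteBuffer: the backing
// memory lives outside the Java heap, so the GC does not scan or move it.
object OffHeapSketch extends App {
  val buf = ByteBuffer.allocateDirect(1024) // 1 KB of off-heap memory
  buf.putInt(42)                            // data must be serialized into bytes
  buf.flip()                                // switch from writing to reading
  println(buf.getInt())                     // deserialize back into a JVM value
}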

In practice, the following predefined storage levels are used:

val NONE = new StorageLevel(false, false, false, false)
val DISK_ONLY = new StorageLevel(true, false, false, false)
val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)
val MEMORY_ONLY = new StorageLevel(false, true, false, true)
val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)
val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)
val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
val OFF_HEAP = new StorageLevel(false, false, true, false)
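For example, a caller normally picks one of these constants when persisting an RDD; a minimal sketch, assuming an existing SparkContext named sc:

import org.apache.spark.storage.StorageLevel

// Keep the RDD in memory as serialized bytes, spilling to disk when it does not fit.
val rdd = sc.parallelize(1 to 1000)
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)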

BlockManagerId

/**
 * :: DeveloperApi ::
 * This class represent an unique identifier for a BlockManager.
 *
 * The first 2 constructors of this class is made private to ensure that BlockManagerId objects
 * can be created only using the apply method in the companion object. This allows de-duplication
 * of ID objects. Also, constructor parameters are private to ensure that parameters cannot be
 * modified from outside this class.
 */
@DeveloperApi
class BlockManagerId private (

BlockId

/**
 * :: DeveloperApi ::
 * Identifies a particular Block of data, usually associated with a single file.
 * A Block can be uniquely identified by its filename, but each type of Block has a different
 * set of keys which produce its unique name.
 *
 * If your BlockId should be serializable, be sure to add it to the BlockId.apply() method.
 */

The Block types actually in use can be seen in the companion object's apply method:

/** Converts a BlockId "name" String back into a BlockId. */
def apply(id: String) = id match {
  case RDD(rddId, splitIndex) =>
    RDDBlockId(rddId.toInt, splitIndex.toInt)
  case SHUFFLE(shuffleId, mapId, reduceId) =>
    ShuffleBlockId(shuffleId.toInt, mapId.toInt, reduceId.toInt)
  case SHUFFLE_DATA(shuffleId, mapId, reduceId) =>
    ShuffleDataBlockId(shuffleId.toInt, mapId.toInt, reduceId.toInt)
  case SHUFFLE_INDEX(shuffleId, mapId, reduceId) =>
    ShuffleIndexBlockId(shuffleId.toInt, mapId.toInt, reduceId.toInt)
  case BROADCAST(broadcastId, field) =>
    BroadcastBlockId(broadcastId.toLong, field.stripPrefix("_"))
  case TASKRESULT(taskId) =>
    TaskResultBlockId(taskId.toLong)
  case STREAM(streamId, uniqueId) =>
    StreamBlockId(streamId.toInt, uniqueId.toLong)
  case TEST(value) =>
    TestBlockId(value)
  case _ =>
    throw new IllegalStateException("Unrecognized BlockId: " + id)
}
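For instance, following the patterns above, an RDD block name round-trips like this (a sketch; "rdd_3_7" follows the rdd_<rddId>_<splitIndex> naming):

// The companion object's apply parses the name back into a typed BlockId.
val id = BlockId("rdd_3_7")
assert(id == RDDBlockId(3, 7))
assert(id.name == "rdd_3_7") // the name encodes rddId and splitIndex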

PutResult

/**
 * Result of adding a block into a BlockStore. This case class contains a few things:
 *   (1) The estimated size of the put,
 *   (2) The values put if the caller asked for them to be returned (e.g. for chaining
 *       replication), and
 *   (3) A list of blocks dropped as a result of this put. This is always empty for DiskStore.
 */
private[spark] case class PutResult(
    size: Long,
    data: Either[Iterator[_], ByteBuffer],
    droppedBlocks: Seq[(BlockId, BlockStatus)] = Seq.empty)

BlockManagerMessages

Defines the messages exchanged between BlockManagers.
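A few representative messages, sketched here with BlockId simplified to String (see BlockManagerMessages in the Spark source for the full set):

// Commands the master sends to a slave's BlockManagerSlaveActor.
sealed trait ToBlockManagerSlave
case class RemoveBlock(blockId: String) extends ToBlockManagerSlave
case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave

// Queries/updates sent to the master.
sealed trait ToBlockManagerMaster
case class GetLocations(blockId: String) extends ToBlockManagerMaster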

BlockManager

/**
 * Manager running on every node (driver and executors) which provides interfaces for putting and
 * retrieving blocks both locally and remotely into various stores (memory, disk, and off-heap).
 *
 * Note that #initialize() must be called before the BlockManager is usable.
 */
private[spark] class BlockManager(

This is a fairly long file. It defines BlockManager, which in the current implementation is a member of SparkEnv.

BlockManagerMasterActor

/**
 * BlockManagerMasterActor is an actor on the master node to track statuses of
 * all slaves' block managers.
 */
private[spark]
class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus: LiveListenerBus)
  extends Actor with ActorLogReceive with Logging {

Its core is a set of HashMaps:

// Mapping from block manager id to the block manager's information.
private val blockManagerInfo = new mutable.HashMap[BlockManagerId, BlockManagerInfo]

// Mapping from executor ID to block manager ID.
private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId]

// Mapping from block id to the set of block managers that have the block.
private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]]
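For example, a GetLocations query is answered straight out of the last map; a sketch close to the actual handler:

// Look up which block managers hold a given block; empty if unknown.
private def getLocations(blockId: BlockId): Seq[BlockManagerId] =
  if (blockLocations.containsKey(blockId)) blockLocations.get(blockId).toSeq
  else Seq.empty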

BlockManagerSlaveActor

/**
 * An actor to take commands from the master to execute options. For example,
 * this is used to remove blocks from the slave's BlockManager.
 */
private[storage]
class BlockManagerSlaveActor(
    blockManager: BlockManager,
    mapOutputTracker: MapOutputTracker)
  extends Actor with ActorLogReceive with Logging {

The slave side is comparatively simple: essentially it just executes a few operations asynchronously.

ShuffleBlockFetcherIterator

/**
 * An iterator that fetches multiple blocks. For local blocks, it fetches from the local block
 * manager. For remote blocks, it fetches them using the provided BlockTransferService.
 *
 * This creates an iterator of (BlockID, values) tuples so the caller can handle blocks in a
 * pipelined fashion as they are received.
 *
 * The implementation throttles the remote fetches so they don't exceed maxBytesInFlight to avoid
 * using too much memory.
 *
 * @param context [[TaskContext]], used for metrics update
 * @param shuffleClient [[ShuffleClient]] for fetching remote blocks
 * @param blockManager [[BlockManager]] for reading local blocks
 * @param blocksByAddress list of blocks to fetch grouped by the [[BlockManagerId]].
 *                        For each block we also require the size (in bytes as a long field) in
 *                        order to throttle the memory usage.
 * @param serializer serializer used to deserialize the data.
 * @param maxBytesInFlight max size (in bytes) of remote blocks to fetch at any given point.
 */
private[spark]
final class ShuffleBlockFetcherIterator(
    context: TaskContext,
    shuffleClient: ShuffleClient,
    blockManager: BlockManager,
    blocksByAddress: Seq[(BlockManagerId, Seq[(BlockId, Long)])],
    serializer: Serializer,
    maxBytesInFlight: Long)
  extends Iterator[(BlockId, Try[Iterator[Any]])] with Logging {

A note on how Scala handles data exchange in a functional style, covering both parameters and return values:
Option: solves the null (null-pointer) problem
Either: solves the problem of an ambiguous return value (returning one of two alternatives)
Try: solves the problem of functions that may throw exceptions
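A minimal sketch of all three, using only the Scala standard library:

import scala.util.{Failure, Success, Try}

val maybe: Option[Int] = Map("a" -> 1).get("b")   // None instead of a null
val either: Either[String, Int] = Right(42)       // Left = error, Right = result
val attempt: Try[Int] = Try("not a number".toInt) // captures the NumberFormatException

attempt match {
  case Success(n)  => println(s"parsed $n")
  case Failure(ex) => println(s"failed: ${ex.getMessage}")
}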

The basic logic (see the sketch below):
1. Separate the blocks into local and remote ones
2. Send requests to the remote nodes
3. Fetch the local blocks while waiting for the remote results
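A hypothetical, self-contained sketch of the three steps (the names are illustrative; the real logic is in ShuffleBlockFetcherIterator.initialize, which also caps the requests in flight at maxBytesInFlight):

// Split the wanted blocks into local and remote, fire the remote requests
// first, then read local blocks while the remote fetches are in flight.
def fetchBlocks(blockIds: Seq[String], isLocal: String => Boolean): Unit = {
  val (localBlocks, remoteBlocks) = blockIds.partition(isLocal)    // step 1
  remoteBlocks.foreach(id => println(s"requesting $id remotely"))  // step 2 (async in reality)
  localBlocks.foreach(id => println(s"reading $id locally"))       // step 3
}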

BlockManagerMaster

Controls the BlockManagers through the driver actor.

BlockStore

/**
 * Abstract class to store blocks.
 */
private[spark] abstract class BlockStore(val blockManager: BlockManager) extends Logging {

On top of BlockStore, various concrete stores take responsibility for the different storage media.

MemoryStore

/**
 * Stores blocks in memory, either as Arrays of deserialized Java objects or as
 * serialized ByteBuffers.
 */
private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long)
  extends BlockStore(blockManager) {

MemoryStore can cache two kinds of things:
1. A byte stream, which has to be copied
2. An Array[Any], for which only the reference is cached

Internally it is essentially a HashMap that caches the data; note that the current implementation takes locks:
1. A lock on entries.
There are two groups of interfaces: get and put.

get

  1. override def getSize(blockId: BlockId): Long = {
  2. override def getBytes(blockId: BlockId): Option[ByteBuffer] = {
  3. override def getValues(blockId: BlockId): Option[Iterator[Any]] = {

put

  1. override def putBytes(blockId: BlockId, _bytes: ByteBuffer, level: StorageLevel): PutResult = {
  2. override def putArray(
  3. override def putIterator(

The basic put logic (see the sketch below):
1. First check whether there is enough space
- if so, cache the block
- if not, try to free space
2. Evicted blocks, and blocks that cannot be kept in the MemoryStore, are then tried against the DiskStore
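A hypothetical, much-simplified sketch of that flow (the real logic lives in tryToPut/ensureFreeSpace, which also unrolls iterators incrementally):

import scala.collection.mutable

// A toy memory store: evicts oldest entries to make room, and refuses blocks
// that can never fit. In Spark, evicted blocks whose StorageLevel allows disk
// are handed to the DiskStore instead of being thrown away.
class MiniMemoryStore(maxMemory: Long) {
  private val entries = mutable.LinkedHashMap[String, (Any, Long)]() // insertion order ~ LRU
  private var used = 0L

  def tryToPut(blockId: String, value: Any, size: Long): Boolean = entries.synchronized {
    if (size > maxMemory) return false                // can never fit in memory
    while (used + size > maxMemory && entries.nonEmpty) {
      val (oldId, (_, oldSize)) = entries.head        // evict the oldest entry
      entries.remove(oldId)
      used -= oldSize
    }
    entries.put(blockId, (value, size))
    used += size
    true
  }
}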

There are also a few other operations such as clear and remove.

DiskStore

DiskBlockManager

/**
 * Creates and maintains the logical mapping between logical blocks and physical on-disk
 * locations. By default, one block is mapped to one file with a name given by its BlockId.
 * However, it is also possible to have a block map to only a segment of a file, by calling
 * mapBlockToFileSegment().
 *
 * Block files are hashed among the directories listed in spark.local.dir (or in
 * SPARK_LOCAL_DIRS, if it's set).
 */
private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkConf)
  extends Logging {

It is just a mapping. Note that "disk" here means local disk, not HDFS; on stop, the data can simply be rm'ed.
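A hypothetical sketch of the name-to-file mapping (the real getFile also hashes into a second level of subdirectories under each local dir):

import java.io.File

// Hash the block name to pick one of the spark.local.dir directories,
// then name the file after the block id itself.
def getFile(localDirs: Array[File], blockName: String): File = {
  val dir = localDirs(math.abs(blockName.hashCode) % localDirs.length)
  new File(dir, blockName)
}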

DiskStore is actually even simpler than MemoryStore:
1. Everything is a byte stream, so there is no need to distinguish Any from bytes the way MemoryStore does
2. For a byte stream, if it is small it is simply read up front in full; otherwise channel.map is used (see the sketch below)
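A hypothetical sketch of that read path; the 2 MB threshold is illustrative (in Spark it is configurable via spark.storage.memoryMapThreshold):

import java.io.{File, RandomAccessFile}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Small files are read into a heap buffer in one go; large ones are mmap'ed.
def getBytes(file: File, mapThreshold: Long = 2L * 1024 * 1024): ByteBuffer = {
  val channel = new RandomAccessFile(file, "r").getChannel
  try {
    if (channel.size() < mapThreshold) {
      val buf = ByteBuffer.allocate(channel.size().toInt)
      while (buf.hasRemaining) channel.read(buf) // buffer is sized to the file
      buf.flip()
      buf
    } else {
      channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
    }
  } finally {
    channel.close()
  }
}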

TachyonStore

/**
 * Stores BlockManager blocks on Tachyon.
 */
private[spark] class TachyonStore(
    blockManager: BlockManager,
    tachyonManager: TachyonBlockManager)
  extends BlockStore(blockManager: BlockManager) with Logging {

Quite similar to DiskStore; it is a wrapper over the Tachyon API.

BlockManager

/**
 * Manager running on every node (driver and executors) which provides interfaces for putting and
 * retrieving blocks both locally and remotely into various stores (memory, disk, and off-heap).
 *
 * Note that #initialize() must be called before the BlockManager is usable.
 */
private[spark] class BlockManager(
    executorId: String,
    actorSystem: ActorSystem,
    val master: BlockManagerMaster,
    defaultSerializer: Serializer,
    maxMemory: Long,
    val conf: SparkConf,
    mapOutputTracker: MapOutputTracker,
    shuffleManager: ShuffleManager,
    blockTransferService: BlockTransferService,
    securityManager: SecurityManager,
    numUsableCores: Int)
  extends BlockDataManager with Logging {

Essentially, BlockManager is the aggregation of everything mentioned above. If you can make sense of that pile of constructor parameters, you understand this better than I do.

The logic of fetching a block locally

  1. Check whether such a block exists at all
  2. Check whether the MemoryStore has it
  3. Check whether the TachyonStore has it
  4. Check whether the DiskStore has it (a sketch of this chain follows)
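A hypothetical sketch of this fallback chain (the real logic is in doGetLocal, which also re-caches disk hits and handles deserialization):

// Each store is tried in turn; the first hit wins.
trait Store { def get(blockId: String): Option[Iterator[Any]] }

def getLocal(memory: Store, tachyon: Store, disk: Store)(blockId: String): Option[Iterator[Any]] =
  memory.get(blockId)
    .orElse(tachyon.get(blockId))
    .orElse(disk.get(blockId))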

DoPut

Almost the same logic as doGet.

Replicate

/**
 * Replicate block to another node. Note that this is a blocking call that returns after
 * the block has been replicated.
 */
private def replicate(blockId: BlockId, data: ByteBuffer, level: StorageLevel): Unit = {