Spark 0.8.0 Source Code Analysis: storage

Source: Internet | Editor: 程序博客网 | Date: 2024/04/29 11:56

The storage module is managed and coordinated through communication between BlockManagerMasterActor and BlockManagerSlaveActor.



1. DiskStore


This is the mechanism behind the spark.local.dir configuration: each block is stored as a single file, and the block ID is hashed to decide which local directory, and which subdirectory within it, the file lands in.

 private def getFile(blockId: String): File = {
    logDebug("Getting file for block " + blockId)

    // Figure out which local directory it hashes to, and which subdirectory in that
    val hash = Utils.nonNegativeHash(blockId)
    val dirId = hash % localDirs.length
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir

    // Create the subdirectory if it doesn't already exist
    var subDir = subDirs(dirId)(subDirId)
    if (subDir == null) {
      subDir = subDirs(dirId).synchronized {
        val old = subDirs(dirId)(subDirId)
        if (old != null) {
          old
        } else {
          val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
          newDir.mkdir()
          subDirs(dirId)(subDirId) = newDir
          newDir
        }
      }
    }

    new File(subDir, blockId)
  }
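The two-level directory arithmetic above can be mirrored in a short Python sketch. Note this is an illustration only: `non_negative_hash` is a stand-in for `Utils.nonNegativeHash` (here a JVM-style string hash with the sign bit dropped), and the directory names and block ID are made up, not taken from Spark.

```python
def non_negative_hash(s: str) -> int:
    """Stand-in for Utils.nonNegativeHash: JVM-style string hash, clamped >= 0."""
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return h & 0x7FFFFFFF  # drop the sign bit so the result is non-negative

def get_file_path(block_id: str, local_dirs, sub_dirs_per_local_dir=64) -> str:
    """Map a block ID to a path, mirroring DiskStore.getFile's arithmetic."""
    h = non_negative_hash(block_id)
    dir_id = h % len(local_dirs)                                   # which local dir
    sub_dir_id = (h // len(local_dirs)) % sub_dirs_per_local_dir   # which subdir
    return f"{local_dirs[dir_id]}/{sub_dir_id:02x}/{block_id}"

# Hypothetical spark.local.dir entries for illustration
local_dirs = ["/tmp/spark-a", "/tmp/spark-b"]
path = get_file_path("rdd_0_1", local_dirs)
```

Because the hash is deterministic, every lookup of the same block ID resolves to the same file, so no central index of block locations on disk is needed.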






2. StorageLevel: describes the storage type used for data and how the options combine



class StorageLevel private(
    private var useDisk_ : Boolean,
    private var useMemory_ : Boolean,
    private var deserialized_ : Boolean,
    private var replication_ : Int = 1)
  extends Externalizable {


  // TODO: Also add fields for caching priority, dataset ID, and flushing.
  private def this(flags: Int, replication: Int) {
    this((flags & 4) != 0, (flags & 2) != 0, (flags & 1) != 0, replication)
  }


  def this() = this(false, true, false)  // For deserialization


  def useDisk = useDisk_
  def useMemory = useMemory_
  def deserialized = deserialized_  // whether blocks are kept as deserialized objects
  def replication = replication_    // number of replicas
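The private constructor packs the three booleans into a 3-bit `flags` integer: bit 2 (value 4) is `useDisk`, bit 1 (value 2) is `useMemory`, bit 0 (value 1) is `deserialized`; replication travels separately. A small Python sketch of that decoding (the variable names below are illustrative, not from Spark):

```python
def decode_flags(flags: int, replication: int = 1) -> dict:
    """Decode StorageLevel's flags integer, mirroring this(flags, replication)."""
    return {
        "useDisk": (flags & 4) != 0,
        "useMemory": (flags & 2) != 0,
        "deserialized": (flags & 1) != 0,
        "replication": replication,
    }

memory_only = decode_flags(0b011)                 # memory, deserialized objects
disk_only = decode_flags(0b100)                   # disk only, serialized bytes
mem_disk_ser_2 = decode_flags(0b110, replication=2)  # memory + disk, serialized, 2 copies
```

This compact encoding is what lets a StorageLevel be shipped over the wire as just an int plus a replication count when it is externalized.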




