Spark Core 2.0 Compression


When Spark writes out a block, it calls SerializerManager's wrapForCompression(blockId, outputStream). That method first calls shouldCompress(blockId) to decide whether the block's output stream should be compressed.

  private def shouldCompress(blockId: BlockId): Boolean = {
    blockId match {
      case _: ShuffleBlockId => compressShuffle
      case _: BroadcastBlockId => compressBroadcast
      case _: RDDBlockId => compressRdds
      case _: TempLocalBlockId => compressShuffleSpill
      case _: TempShuffleBlockId => compressShuffle
      case _ => false
    }
  }
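
To tie the two steps together, here is a minimal sketch (a simplification, not the exact Spark source) of what the wrapping step can look like, assuming the shouldCompress and compressionCodec members shown in this post and the CompressionCodec.compressedOutputStream method:

  import java.io.OutputStream

  // Sketch: wrap the raw block output stream with the codec's compressed stream
  // only when shouldCompress(blockId) says this block type should be compressed.
  def wrapForCompression(blockId: BlockId, s: OutputStream): OutputStream = {
    if (shouldCompress(blockId)) compressionCodec.compressedOutputStream(s) else s
  }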

The defaults below show that RDD blocks are not compressed by default, while the other block types are.

  // Whether to compress broadcast variables that are stored
  private[this] val compressBroadcast = conf.getBoolean("spark.broadcast.compress", true)
  // Whether to compress shuffle output that are stored
  private[this] val compressShuffle = conf.getBoolean("spark.shuffle.compress", true)
  // Whether to compress RDD partitions that are stored serialized
  private[this] val compressRdds = conf.getBoolean("spark.rdd.compress", false)
  // Whether to compress shuffle output temporarily spilled to disk
  private[this] val compressShuffleSpill = conf.getBoolean("spark.shuffle.spill.compress", true)

  /* The compression codec to use. Note that the "lazy" val is necessary because we want to delay
   * the initialization of the compression codec until it is first used. The reason is that a Spark
   * program could be using a user-defined codec in a third party jar, which is loaded in
   * Executor.updateDependencies. When the BlockManager is initialized, user level jars hasn't been
   * loaded yet. */
  private lazy val compressionCodec: CompressionCodec = CompressionCodec.createCodec(conf)
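
These flags are ordinary SparkConf booleans, so they can be overridden when building the application's configuration. A small example (the application name is made up) that turns on RDD compression and spells out the other defaults explicitly:

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setAppName("compression-demo")              // hypothetical app name
    .set("spark.rdd.compress", "true")           // default is false
    .set("spark.broadcast.compress", "true")     // default is true
    .set("spark.shuffle.compress", "true")       // default is true
    .set("spark.shuffle.spill.compress", "true") // default is true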

The compression algorithm is chosen by CompressionCodec.createCodec(conf); as the code below shows, the default codec is lz4.

  def getCodecName(conf: SparkConf): String = {
    conf.get(configKey, DEFAULT_COMPRESSION_CODEC)
  }

  def createCodec(conf: SparkConf): CompressionCodec = {
    createCodec(conf, getCodecName(conf))
  }

val DEFAULT_COMPRESSION_CODEC = "lz4"
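
The configKey read by getCodecName is spark.io.compression.codec, so switching codecs is just another configuration setting. A sketch, assuming the built-in short names available in Spark 2.0 ("lz4", "lzf", "snappy"):

  import org.apache.spark.SparkConf

  // Override the default lz4 codec with snappy; a fully-qualified class name of a
  // custom CompressionCodec implementation can also be supplied here.
  val conf = new SparkConf().set("spark.io.compression.codec", "snappy")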


