spark core 2.0 Compression
Whenever Spark writes out block data, it calls serializerManager.wrapForCompression(blockId, outputStream). That method first calls shouldCompress to decide whether the block should be compressed at all:
private def shouldCompress(blockId: BlockId): Boolean = {
  blockId match {
    case _: ShuffleBlockId => compressShuffle
    case _: BroadcastBlockId => compressBroadcast
    case _: RDDBlockId => compressRdds
    case _: TempLocalBlockId => compressShuffleSpill
    case _: TempShuffleBlockId => compressShuffle
    case _ => false
  }
}
The defaults below show that RDD blocks are not compressed by default, while all the other block types are:
// Whether to compress broadcast variables that are stored
private[this] val compressBroadcast = conf.getBoolean("spark.broadcast.compress", true)

// Whether to compress shuffle output that are stored
private[this] val compressShuffle = conf.getBoolean("spark.shuffle.compress", true)

// Whether to compress RDD partitions that are stored serialized
private[this] val compressRdds = conf.getBoolean("spark.rdd.compress", false)

// Whether to compress shuffle output temporarily spilled to disk
private[this] val compressShuffleSpill = conf.getBoolean("spark.shuffle.spill.compress", true)

/* The compression codec to use. Note that the "lazy" val is necessary because we want to delay
 * the initialization of the compression codec until it is first used. The reason is that a Spark
 * program could be using a user-defined codec in a third party jar, which is loaded in
 * Executor.updateDependencies. When the BlockManager is initialized, user level jars hasn't been
 * loaded yet. */
private lazy val compressionCodec: CompressionCodec = CompressionCodec.createCodec(conf)
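To make the dispatch above concrete, here is a minimal Python sketch (names and structure are illustrative only, not the actual Spark API) that models how each block-id type maps to its governing config flag and default value:

```python
# Default values, taken from the SerializerManager fields shown above.
DEFAULTS = {
    "spark.broadcast.compress": True,
    "spark.shuffle.compress": True,
    "spark.rdd.compress": False,
    "spark.shuffle.spill.compress": True,
}

# Which config flag governs each block-id type (mirrors the match expression).
FLAG_FOR_BLOCK = {
    "ShuffleBlockId": "spark.shuffle.compress",
    "BroadcastBlockId": "spark.broadcast.compress",
    "RDDBlockId": "spark.rdd.compress",
    "TempLocalBlockId": "spark.shuffle.spill.compress",
    "TempShuffleBlockId": "spark.shuffle.compress",
}

def should_compress(block_type: str, conf: dict = DEFAULTS) -> bool:
    """Model of shouldCompress: unknown block types fall through to False."""
    flag = FLAG_FOR_BLOCK.get(block_type)
    return conf.get(flag, False) if flag else False

print(should_compress("RDDBlockId"))      # False: RDDs are not compressed by default
print(should_compress("ShuffleBlockId"))  # True: shuffle output is compressed by default
```

RDD compression can be turned on explicitly if the workload benefits from it; the flag name comes from the code above:

```
# spark-defaults.conf -- override the default shown above
spark.rdd.compress    true
```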
The compression algorithm is chosen by CompressionCodec.createCodec(conf); as the following code shows, the default codec is lz4:
def getCodecName(conf: SparkConf): String = {
  conf.get(configKey, DEFAULT_COMPRESSION_CODEC)
}

def createCodec(conf: SparkConf): CompressionCodec = {
  createCodec(conf, getCodecName(conf))
}
val DEFAULT_COMPRESSION_CODEC = "lz4"
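The codec can be switched via the spark.io.compression.codec setting that getCodecName reads. In Spark 2.0 the built-in short names are lz4, lzf, and snappy (a fully qualified class name of a CompressionCodec implementation also works, which is why the lazy initialization above waits for user jars to load). For example:

```
# spark-defaults.conf -- switch the block/shuffle codec from the default lz4
spark.io.compression.codec    snappy
```

Note this codec applies to the block and shuffle I/O paths discussed here; it is independent of the codecs used for output file formats.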