Lecture 222: The Spark Shuffle Pluggable Framework: ShuffleWriter Analysis

Source: Internet · Editor: 程序博客网 · Date: 2024/06/04 19:04


ShuffleWriter is the interface through which a ShuffleMapTask writes shuffle data to local storage. Different shuffle implementations provide different concrete writers.

Inside ShuffleMapTask, a ShuffleWriter instance is obtained and the task's records are written into the shuffle system:

private[spark] abstract class ShuffleWriter[K, V] {

  /** Write a sequence of records to this task's output */
  @throws[IOException]
  def write(records: Iterator[Product2[K, V]]): Unit

  /** Close this writer, passing along whether the map completed */
  def stop(success: Boolean): Option[MapStatus]
}


1. The write method writes a sequence of records to this task's output. Here records is an Iterator whose elements are key-value pairs; Product2 is a Scala trait (the common parent of Tuple2).

If map-side aggregation is required, the records must be combined by key before write emits them.
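The combine step can be sketched as follows. This is a minimal stand-in for map-side combining, not Spark's actual Aggregator class; the object name `MapSideCombine` and the `mergeValue` parameter are illustrative assumptions.

```scala
// Simplified sketch of map-side combining (NOT Spark's Aggregator API):
// merge records by key in a hash map before handing them to write().
object MapSideCombine {
  def combine[K, V](records: Iterator[(K, V)],
                    mergeValue: (V, V) => V): Iterator[(K, V)] = {
    val combined = scala.collection.mutable.Map.empty[K, V]
    for ((k, v) <- records) {
      combined(k) = combined.get(k) match {
        case Some(acc) => mergeValue(acc, v) // key seen before: merge values
        case None      => v                  // first occurrence of this key
      }
    }
    combined.iterator
  }
}
```

For example, combining word counts `("a", 1), ("b", 2), ("a", 3)` with `+` yields one record per key, shrinking the data that must be shuffled.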

2. stop: called when writing is finished. On a successful map it returns a MapStatus describing the output:

  def stop(success: Boolean): Option[MapStatus]
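A toy implementation makes the write/stop contract concrete. This is a self-contained sketch: `BlockManagerId` and `MapStatus` below are simplified stand-ins for Spark's internal types, and `InMemoryShuffleWriter` is a hypothetical class, not part of Spark.

```scala
import java.io.IOException

// Simplified stand-ins for Spark's internal types (assumptions, not Spark's API).
case class BlockManagerId(host: String, port: Int)
case class MapStatus(location: BlockManagerId, sizes: Array[Long])

abstract class ShuffleWriter[K, V] {
  @throws[IOException]
  def write(records: Iterator[Product2[K, V]]): Unit
  def stop(success: Boolean): Option[MapStatus]
}

// Toy writer: hash-partitions records into in-memory buckets, one per
// reducer, and reports per-reducer output sizes on a successful stop().
class InMemoryShuffleWriter[K, V](numReducers: Int, location: BlockManagerId)
    extends ShuffleWriter[K, V] {

  private val buckets =
    Array.fill(numReducers)(scala.collection.mutable.ArrayBuffer.empty[(K, V)])

  override def write(records: Iterator[Product2[K, V]]): Unit =
    for (r <- records) {
      // Non-negative partition index, even for negative hash codes.
      val p = ((r._1.hashCode % numReducers) + numReducers) % numReducers
      buckets(p) += ((r._1, r._2))
    }

  override def stop(success: Boolean): Option[MapStatus] =
    if (success) Some(MapStatus(location, buckets.map(_.size.toLong)))
    else None // failed map: no status is reported to the scheduler
}
```

The key point of the contract: write only accumulates output, and only a successful stop hands the scheduler a MapStatus; a failed task yields None so its output is never fetched.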


MapStatus

MapStatus is the result a ShuffleMapTask returns to the scheduler. It carries the address of the block where the task ran, together with the output size for each reducer, which is passed along so reduce tasks know what to fetch.
private[spark] sealed trait MapStatus {

  /** Location where this task was run. */
  def location: BlockManagerId

  /**
   * Estimated size for the reduce block, in bytes.
   *
   * If a block is non-empty, then this method MUST return a non-zero size. This invariant is
   * necessary for correctness, since block fetchers are allowed to skip zero-size blocks.
   */
  def getSizeForBlock(reduceId: Int): Long
}
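Because getSizeForBlock is only an estimate, the sizes can be stored compactly. The sketch below shows a log-scale one-byte encoding similar in spirit to what Spark uses for compressed map statuses, while preserving the invariant above (a non-empty block never decodes to size 0). The object name `SizeCodec` and the constant are illustrative assumptions, not Spark's exact code.

```scala
// Sketch: store each reduce-block size in a single byte on a log scale,
// trading precision for space. The invariant that a non-empty block never
// reports size 0 is preserved, since fetchers may skip zero-size blocks.
object SizeCodec {
  private val LogBase = 1.1

  def compress(size: Long): Byte =
    if (size == 0) 0      // empty block: exactly 0, so fetchers can skip it
    else if (size <= 1) 1 // non-empty: never encode as 0
    else math.min(255, math.ceil(math.log(size.toDouble) / math.log(LogBase)).toInt).toByte

  def decompress(b: Byte): Long =
    if (b == 0) 0L
    else math.pow(LogBase, b & 0xFF).toLong // treat the byte as unsigned
}
```

Rounding up in compress means the decoded estimate is never smaller than the true size (until the one-byte range saturates), which errs on the safe side when reducers budget fetch memory.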
