Lecture 222: The Spark Shuffle Pluggable Framework: ShuffleWriter Analysis

Source: Internet · Editor: 程序博客网 · Date: 2024/06/04 19:04


ShuffleWriter is the interface through which a ShuffleMapTask writes shuffle data to local storage. Different shuffle implementations provide different concrete writers.

Inside ShuffleMapTask, a ShuffleWriter instance is obtained and the task's records are written into the shuffle system:

private[spark] abstract class ShuffleWriter[K, V] {

  /** Write a sequence of records to this task's output */
  @throws[IOException]
  def write(records: Iterator[Product2[K, V]]): Unit

  /** Close this writer, passing along whether the map completed */
  def stop(success: Boolean): Option[MapStatus]
}


1. The write method writes a sequence of records to this task's output. Here records is an Iterator whose elements are key-value pairs; Product2 is a Scala trait (the common parent of Tuple2).

If map-side aggregation is required, the records must be combined by key before write emits them.
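The combine step can be sketched as follows. This is a minimal stand-in for map-side combining, not Spark's actual Aggregator class; the object name `MapSideCombine` and the `mergeValue` parameter are illustrative assumptions.

```scala
// Simplified sketch of map-side combining (NOT Spark's Aggregator API):
// merge records by key in a hash map before handing them to write().
object MapSideCombine {
  def combine[K, V](records: Iterator[(K, V)],
                    mergeValue: (V, V) => V): Iterator[(K, V)] = {
    val combined = scala.collection.mutable.Map.empty[K, V]
    for ((k, v) <- records) {
      combined(k) = combined.get(k) match {
        case Some(acc) => mergeValue(acc, v) // key seen before: merge values
        case None      => v                  // first occurrence of this key
      }
    }
    combined.iterator
  }
}
```

For example, combining word counts `("a", 1), ("b", 2), ("a", 3)` with `+` yields one record per key, shrinking the data that must be shuffled.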

2. stop: called when writing is finished. On a successful map it returns a MapStatus describing the output:

  def stop(success: Boolean): Option[MapStatus]
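A toy implementation makes the write/stop contract concrete. This is a self-contained sketch: `BlockManagerId` and `MapStatus` below are simplified stand-ins for Spark's internal types, and `InMemoryShuffleWriter` is a hypothetical class, not part of Spark.

```scala
import java.io.IOException

// Simplified stand-ins for Spark's internal types (assumptions, not Spark's API).
case class BlockManagerId(host: String, port: Int)
case class MapStatus(location: BlockManagerId, sizes: Array[Long])

abstract class ShuffleWriter[K, V] {
  @throws[IOException]
  def write(records: Iterator[Product2[K, V]]): Unit
  def stop(success: Boolean): Option[MapStatus]
}

// Toy writer: hash-partitions records into in-memory buckets, one per
// reducer, and reports per-reducer output sizes on a successful stop().
class InMemoryShuffleWriter[K, V](numReducers: Int, location: BlockManagerId)
    extends ShuffleWriter[K, V] {

  private val buckets =
    Array.fill(numReducers)(scala.collection.mutable.ArrayBuffer.empty[(K, V)])

  override def write(records: Iterator[Product2[K, V]]): Unit =
    for (r <- records) {
      // Non-negative partition index, even for negative hash codes.
      val p = ((r._1.hashCode % numReducers) + numReducers) % numReducers
      buckets(p) += ((r._1, r._2))
    }

  override def stop(success: Boolean): Option[MapStatus] =
    if (success) Some(MapStatus(location, buckets.map(_.size.toLong)))
    else None // failed map: no status is reported to the scheduler
}
```

The key point of the contract: write only accumulates output, and only a successful stop hands the scheduler a MapStatus; a failed task yields None so its output is never fetched.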


MapStatus

MapStatus is the result a ShuffleMapTask returns to the scheduler. It carries the address of the block where the task ran, together with the output size for each reducer, which is passed along so reduce tasks know what to fetch.
private[spark] sealed trait MapStatus {

  /** Location where this task was run. */
  def location: BlockManagerId

  /**
   * Estimated size for the reduce block, in bytes.
   *
   * If a block is non-empty, then this method MUST return a non-zero size. This invariant is
   * necessary for correctness, since block fetchers are allowed to skip zero-size blocks.
   */
  def getSizeForBlock(reduceId: Int): Long
}
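Because getSizeForBlock is only an estimate, the sizes can be stored compactly. The sketch below shows a log-scale one-byte encoding similar in spirit to what Spark uses for compressed map statuses, while preserving the invariant above (a non-empty block never decodes to size 0). The object name `SizeCodec` and the constant are illustrative assumptions, not Spark's exact code.

```scala
// Sketch: store each reduce-block size in a single byte on a log scale,
// trading precision for space. The invariant that a non-empty block never
// reports size 0 is preserved, since fetchers may skip zero-size blocks.
object SizeCodec {
  private val LogBase = 1.1

  def compress(size: Long): Byte =
    if (size == 0) 0      // empty block: exactly 0, so fetchers can skip it
    else if (size <= 1) 1 // non-empty: never encode as 0
    else math.min(255, math.ceil(math.log(size.toDouble) / math.log(LogBase)).toInt).toByte

  def decompress(b: Byte): Long =
    if (b == 0) 0L
    else math.pow(LogBase, b & 0xFF).toLong // treat the byte as unsigned
}
```

Rounding up in compress means the decoded estimate is never smaller than the true size (until the one-byte range saturates), which errs on the safe side when reducers budget fetch memory.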
