Lecture 223: An Analysis of ShuffleReader in the Spark Shuffle Pluggable Framework




ShuffleReader is the interface through which a Stage reads the output of the previous Stage.

In a reduce task it reads the aggregated data produced by the mappers: it pulls the records written by the preceding stage's ShuffleMapTasks and exposes them as an Iterator. The concrete reading logic lives in its subclass.

private[spark] trait ShuffleReader[K, C] {
  /** Read the combined key-values for this reduce task */
  def read(): Iterator[Product2[K, C]]

  /**
   * Close this reader.
   * TODO: Add this back when we make the ShuffleReader a developer API that others can implement
   * (at which point this will likely be necessary).
   */
  // def stop(): Unit
}
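
For context, this is roughly how the reduce side obtains and drives a ShuffleReader: ShuffledRDD.compute asks the ShuffleManager for a reader covering its own partition and simply iterates over what read() returns. The snippet below is paraphrased from memory of the 1.6 code base, so treat it as a sketch rather than a verbatim quote:

// Paraphrase of ShuffledRDD.compute (Spark 1.6 era); a sketch, not a verbatim quote.
override def compute(split: Partition, context: TaskContext): Iterator[(K, C)] = {
  val dep = dependencies.head.asInstanceOf[ShuffleDependency[K, V, C]]
  SparkEnv.get.shuffleManager
    .getReader(dep.shuffleHandle, split.index, split.index + 1, context)
    .read()
    .asInstanceOf[Iterator[(K, C)]]
}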

In the concrete implementation, the ShuffleReader obtains the location of the data through the MapOutputTracker: each ShuffleWriter reports its MapStatus to the Driver, and the Driver-side MapOutputTracker records where every map output lives and how large it is.
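
To make the shape of that lookup concrete, here is a small, self-contained sketch in plain Scala (no Spark dependency; ExecutorLocation and ShuffleBlock are simplified stand-ins for Spark's BlockManagerId and ShuffleBlockId) of the kind of mapping the tracker hands back to the reader: for each executor, the shuffle blocks it holds and their estimated sizes.

object MapOutputLookupSketch {
  // Simplified stand-ins for Spark's BlockManagerId and ShuffleBlockId.
  case class ExecutorLocation(executorId: String, host: String, port: Int)
  case class ShuffleBlock(shuffleId: Int, mapId: Int, reduceId: Int)

  // One entry per executor: where the blocks live, plus (block, estimated size in bytes).
  type BlocksByAddress = Seq[(ExecutorLocation, Seq[(ShuffleBlock, Long)])]

  def main(args: Array[String]): Unit = {
    // Pretend two map tasks (mapId 0 and 1) of shuffleId 0 ran on two executors,
    // and this reduce task is responsible for reduceId 3. The sizes come from the
    // MapStatus objects the writers registered with the driver.
    val lookup: BlocksByAddress = Seq(
      ExecutorLocation("exec-1", "host-a", 7337) -> Seq(ShuffleBlock(0, 0, 3) -> 2048L),
      ExecutorLocation("exec-2", "host-b", 7337) -> Seq(ShuffleBlock(0, 1, 3) -> 4096L)
    )
    lookup.foreach { case (where, blocks) => println(s"$where holds $blocks") }
  }
}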


The subclass of ShuffleReader used to be HashShuffleReader; in release 1.6.0 it was renamed to BlockStoreShuffleReader:

[SPARK-10704] Rename HashShuffleReader to BlockStoreShuffleReader
Josh Rosen <joshrosen@databricks.com>
2015-09-22 11:50:22 -0700
Commit: 1ca5e2e, github.com/apache/spark/pull/8825

Let's look at BlockStoreShuffleReader, which extends ShuffleReader. Its read() method does two notable things:

1. It obtains the serializer via Serializer.getSerializer(dep.serializer).
2. During the read it checks whether map-side combining (dep.mapSideCombine) was already performed, and chooses the aggregation path accordingly (a simplified illustration of the two paths follows the source below).

/**
 * Fetches and reads the partitions in range [startPartition, endPartition) from a shuffle by
 * requesting them from other nodes' block stores.
 */
private[spark] class BlockStoreShuffleReader[K, C](
    handle: BaseShuffleHandle[K, _, C],
    startPartition: Int,
    endPartition: Int,
    context: TaskContext,
    blockManager: BlockManager = SparkEnv.get.blockManager,
    mapOutputTracker: MapOutputTracker = SparkEnv.get.mapOutputTracker)
  extends ShuffleReader[K, C] with Logging {

  private val dep = handle.dependency

  /** Read the combined key-values for this reduce task */
  override def read(): Iterator[Product2[K, C]] = {
    val blockFetcherItr = new ShuffleBlockFetcherIterator(
      context,
      blockManager.shuffleClient,
      blockManager,
      mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition),
      // Note: we use getSizeAsMb when no suffix is provided for backwards compatibility
      SparkEnv.get.conf.getSizeAsMb("spark.reducer.maxSizeInFlight", "48m") * 1024 * 1024)

    // Wrap the streams for compression based on configuration
    val wrappedStreams = blockFetcherItr.map { case (blockId, inputStream) =>
      blockManager.wrapForCompression(blockId, inputStream)
    }

    val ser = Serializer.getSerializer(dep.serializer)
    val serializerInstance = ser.newInstance()

    // Create a key/value iterator for each stream
    val recordIter = wrappedStreams.flatMap { wrappedStream =>
      // Note: the asKeyValueIterator below wraps a key/value iterator inside of a
      // NextIterator. The NextIterator makes sure that close() is called on the
      // underlying InputStream when all records have been read.
      serializerInstance.deserializeStream(wrappedStream).asKeyValueIterator
    }

    // Update the context task metrics for each record read.
    val readMetrics = context.taskMetrics.createShuffleReadMetricsForDependency()
    val metricIter = CompletionIterator[(Any, Any), Iterator[(Any, Any)]](
      recordIter.map(record => {
        readMetrics.incRecordsRead(1)
        record
      }),
      context.taskMetrics().updateShuffleReadMetrics())

    // An interruptible iterator must be used here in order to support task cancellation
    val interruptibleIter = new InterruptibleIterator[(Any, Any)](context, metricIter)

    val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) {
        // We are reading values that are already combined
        val combinedKeyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, C)]]
        dep.aggregator.get.combineCombinersByKey(combinedKeyValuesIterator, context)
      } else {
        // We don't know the value type, but also don't care -- the dependency *should*
        // have made sure its compatible w/ this aggregator, which will convert the value
        // type to the combined type C
        val keyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, Nothing)]]
        dep.aggregator.get.combineValuesByKey(keyValuesIterator, context)
      }
    } else {
      require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
      interruptibleIter.asInstanceOf[Iterator[Product2[K, C]]]
    }

    // Sort the output if there is a sort ordering defined.
    dep.keyOrdering match {
      case Some(keyOrd: Ordering[K]) =>
        // Create an ExternalSorter to sort the data. Note that if spark.shuffle.spill is disabled,
        // the ExternalSorter won't spill to disk.
        val sorter =
          new ExternalSorter[K, C, C](context, ordering = Some(keyOrd), serializer = Some(ser))
        sorter.insertAll(aggregatedIter)
        context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
        context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
        context.internalMetricsToAccumulators(
          InternalAccumulator.PEAK_EXECUTION_MEMORY).add(sorter.peakMemoryUsedBytes)
        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
      case None =>
        aggregatedIter
    }
  }
}
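
The promised illustration of the two aggregation paths: when dep.mapSideCombine is true the incoming records are already of the combined type C, so only mergeCombiners-style logic is needed; otherwise the reader sees raw values and must build the combiners itself. The following is a minimal, self-contained plain-Scala sketch (not Spark's Aggregator API) assuming a reduceByKey-style sum where both V and C are Int:

object AggregationPathsSketch {
  // mergeValue path: fold a raw value V into an existing combiner C (here V = C = Int, "+").
  def combineValuesByKey(records: Iterator[(String, Int)]): Map[String, Int] =
    records.foldLeft(Map.empty[String, Int]) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0) + v)
    }

  // mergeCombiners path: merge per-map-task partial results that are already of type C.
  def combineCombinersByKey(combined: Iterator[(String, Int)]): Map[String, Int] =
    combined.foldLeft(Map.empty[String, Int]) { case (acc, (k, c)) =>
      acc.updated(k, acc.getOrElse(k, 0) + c)
    }

  def main(args: Array[String]): Unit = {
    // dep.mapSideCombine == false: the reader sees raw (key, value) records.
    println(combineValuesByKey(Iterator("a" -> 1, "b" -> 2, "a" -> 3)))     // Map(a -> 4, b -> 2)

    // dep.mapSideCombine == true: each map task already produced per-key partial sums.
    println(combineCombinersByKey(Iterator("a" -> 4, "b" -> 2, "a" -> 5)))  // Map(a -> 9, b -> 7)
  }
}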


Once the ShuffleReader has obtained the data locations from the MapOutputTracker, there are two cases (a simplified sketch of the split follows the list):

1. If a block is local, it is read through the BlockManager's getBlockData method.
2. If a block is remote, it has to be fetched over the network from the remote executor's block store.
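
A minimal, self-contained sketch of that split (not the real ShuffleBlockFetcherIterator, which compares full BlockManagerIds; ExecutorLocation is a simplified stand-in): blocks owned by the current executor are read straight from the local BlockManager, everything else becomes a network fetch request.

object LocalRemoteSplitSketch {
  // Stand-in for Spark's BlockManagerId; only the fields needed for the comparison.
  case class ExecutorLocation(executorId: String, host: String, port: Int)

  // Blocks owned by this executor are read locally; everything else becomes a remote request.
  def split(
      blocksByAddress: Seq[(ExecutorLocation, Seq[String])],
      localLocation: ExecutorLocation): (Seq[String], Seq[(ExecutorLocation, Seq[String])]) = {
    val (local, remote) = blocksByAddress.partition { case (loc, _) => loc == localLocation }
    (local.flatMap(_._2), remote)
  }

  def main(args: Array[String]): Unit = {
    val here = ExecutorLocation("exec-1", "host-a", 7337)
    val byAddress = Seq(
      here -> Seq("shuffle_0_0_3"),
      ExecutorLocation("exec-2", "host-b", 7337) -> Seq("shuffle_0_1_3")
    )
    val (localBlocks, remoteRequests) = split(byAddress, here)
    println(s"read via BlockManager.getBlockData: $localBlocks")
    println(s"fetch over the network from: ${remoteRequests.map(_._1)}")
  }
}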

BlockManager's getBlockData method, which serves the local case:

/**
 * Interface to get local block data. Throws an exception if the block cannot be found or
 * cannot be read successfully.
 */
override def getBlockData(blockId: BlockId): ManagedBuffer = {
  if (blockId.isShuffle) {
    shuffleManager.shuffleBlockResolver.getBlockData(blockId.asInstanceOf[ShuffleBlockId])
  } else {
    val blockBytesOpt = doGetLocal(blockId, asBlockResult = false)
      .asInstanceOf[Option[ByteBuffer]]
    if (blockBytesOpt.isDefined) {
      val buffer = blockBytesOpt.get
      new NioManagedBuffer(buffer)
    } else {
      throw new BlockNotFoundException(blockId.toString)
    }
  }
}
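
A note on the shuffle branch above: blockId.isShuffle is true for ids of type ShuffleBlockId, whose name encodes (shuffleId, mapId, reduceId); for sort-based shuffle the shuffleBlockResolver then uses the reduceId to look up the offset and length of that partition in the map task's index file. The following self-contained sketch shows only the naming convention (SketchShuffleBlockId is a stand-in, not Spark's ShuffleBlockId):

// Stand-in illustrating the "shuffle_<shuffleId>_<mapId>_<reduceId>" naming scheme.
case class SketchShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) {
  def name: String = s"shuffle_${shuffleId}_${mapId}_$reduceId"
}

object BlockIdSketch {
  def main(args: Array[String]): Unit = {
    // The block written by map task 1 of shuffle 0 for reduce partition 3:
    println(SketchShuffleBlockId(0, 1, 3).name) // shuffle_0_1_3
  }
}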


