Shuffle Read and Write Operations (Part 1)


Below is the runTask method of ShuffleMapTask. Its main job is to call the write method of HashShuffleWriter, which performs the actual writing of the map output.

  override def runTask(context: TaskContext): MapStatus = {
    // Deserialize the RDD using the broadcast variable.
    // Start time of the deserialization
    val deserializeStartTime = System.currentTimeMillis()
    // Obtain a new instance of the closure serializer
    val ser = SparkEnv.get.closureSerializer.newInstance()
    // Use the closure serializer's deserialize() to deserialize the RDD and the
    // ShuffleDependency; the bytes come from taskBinary
    val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
    // Time the executor spent on deserialization
    _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime

    metrics = Some(context.taskMetrics)
    var writer: ShuffleWriter[Any, Any] = null
    try {
      // Obtain the shuffleManager
      val manager = SparkEnv.get.shuffleManager
      // Get the ShuffleWriter for this shuffle (identified by dep.shuffleHandle) and this
      // partition; partitionId is the partition of the RDD this task processes, i.e. the
      // write operates on a single partition
      writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
      // Compute the partition with rdd.iterator() and hand its records to writer.write()
      writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
      // Stop the writer and return its result (a MapStatus)
      writer.stop(success = true).get
    } catch {
      case e: Exception =>
        try {
          if (writer != null) {
            writer.stop(success = false)
          }
        } catch {
          case e: Exception =>
            log.debug("Could not stop writer", e)
        }
        throw e
    }
  }
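Before looking at the writer itself, it helps to see the kind of job that reaches this code path. The sketch below is for illustration only: reduceByKey introduces a ShuffleDependency, so each task of the map stage ends up in runTask above and hands its partition's records to writer.write. The local master, the sample data, and the Spark 1.x setting spark.shuffle.manager=hash (which selects the hash-based shuffle discussed here) are assumptions for the example.

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleWriteDemo {
  def main(args: Array[String]): Unit = {
    // "hash" selects the hash-based shuffle manager in Spark 1.x (removed in Spark 2.0)
    val conf = new SparkConf()
      .setAppName("ShuffleWriteDemo")
      .setMaster("local[2]")
      .set("spark.shuffle.manager", "hash")
    val sc = new SparkContext(conf)

    // reduceByKey creates a ShuffleDependency: the map stage's ShuffleMapTasks
    // partition and write their output through the shuffle writer
    val counts = sc.parallelize(Seq("a", "b", "a", "c"), 2)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}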

The write method of HashShuffleWriter is shown below:

  /**
   * Write a bunch of records to this task's output.
   *
   * This method does two things:
   * 1) Decide whether map-side aggregation is needed: e.g. if <hello,1> and <hello,1>
   *    are both to be written, first combine them into <hello,2> and only then write.
   * 2) Use the partitioner's getPartition function to decide which file each <k, v>
   *    pair is written to.
   */
  override def write(records: Iterator[Product2[K, V]]): Unit = {
    // Check whether an aggregator is defined, i.e. whether map-side aggregation applies
    val iter = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) {
        // Map-side combine: aggregations such as reduceByKey are split in two, one part
        // performed here in the ShuffleMapTask and the other in the ResultTask
        dep.aggregator.get.combineValuesByKey(records, context)
      } else {
        records
      }
    } else {
      require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
      records
    }

    // Use getPartition to decide which file each <k, v> pair is written to
    for (elem <- iter) {
      // elem is a <k, v> pair; the partitioner computes the bucket from the key elem._1
      val bucketId = dep.partitioner.getPartition(elem._1)
      // The writers come from FileShuffleBlockResolver.forMapTask: bucketId selects the
      // target file, and (elem._1, elem._2) is the record written to it
      shuffle.writers(bucketId).write(elem._1, elem._2)
    }
  }
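The two steps described in the comment can be seen in isolation with a plain HashPartitioner. The sketch below is not Spark's internal code: a simple groupBy stands in for Aggregator.combineValuesByKey, and the 4-bucket partitioner and sample records are assumptions chosen for the example.

import org.apache.spark.HashPartitioner

object BucketAssignmentDemo {
  def main(args: Array[String]): Unit = {
    val records = Seq("hello" -> 1, "world" -> 1, "hello" -> 1)

    // Step 1 (simplified map-side combine): <hello,1>,<hello,1> become <hello,2>
    // before anything is written
    val combined = records.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }

    // Step 2: the partitioner decides which bucket (and hence which file) each
    // <k, v> pair is written to
    val partitioner = new HashPartitioner(4) // 4 reduce-side buckets, chosen for the example
    combined.foreach { case (k, v) =>
      val bucketId = partitioner.getPartition(k)
      println(s"($k, $v) -> bucket $bucketId")
    }
  }
}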

The FileShuffleBlockResolver class is documented as follows:

/**
 * Manages assigning disk-based block writers to shuffle tasks. Each shuffle task gets one file
 * per reducer (this set of files is called a ShuffleFileGroup).
 *
 * As an optimization to reduce the number of physical shuffle files produced, multiple shuffle
 * blocks are aggregated into the same file. There is one "combined shuffle file" per reducer
 * per concurrently executing shuffle task. As soon as a task finishes writing to its shuffle
 * files, it releases them for another task.
 *
 * Regarding the implementation of this feature, shuffle files are identified by a 3-tuple:
 *   - shuffleId: The unique id given to the entire shuffle stage.
 *   - bucketId: The id of the output partition (i.e., reducer id)
 *   - fileId: The unique id identifying a group of "combined shuffle files." Only one task at a
 *       time owns a particular fileId, and this id is returned to a pool when the task finishes.
 * Each shuffle file is then mapped to a FileSegment, which is a 3-tuple (file, offset, length)
 * that specifies where in a given file the actual block data is located.
 *
 * Shuffle file metadata is stored in a space-efficient manner. Rather than simply mapping
 * ShuffleBlockIds directly to FileSegments, each ShuffleFileGroup maintains a list of offsets for
 * each block stored in each file. In order to find the location of a shuffle block, we search the
 * files within the ShuffleFileGroup associated with the block's reducer.
 */
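To make the "offset list" idea concrete, the following standalone sketch shows how a combined per-reducer file can record, per map task, the offset and length of that task's block, so that a block is resolved to a (file, offset, length) segment. These are not Spark's actual classes; all names here are illustrative stand-ins.

import java.io.File
import scala.collection.mutable.ArrayBuffer

// Illustrative stand-ins for the FileSegment / ShuffleFileGroup bookkeeping described above.
case class FileSegment(file: File, offset: Long, length: Long)

class CombinedShuffleFile(val file: File) {
  // Parallel arrays: entry i describes the i-th map output appended to this file
  private val mapIds  = ArrayBuffer[Int]()
  private val offsets = ArrayBuffer[Long]()
  private val lengths = ArrayBuffer[Long]()

  def recordMapOutput(mapId: Int, offset: Long, length: Long): Unit = {
    mapIds += mapId
    offsets += offset
    lengths += length
  }

  // Locating a block = searching the offset list for the given map task
  def getFileSegment(mapId: Int): Option[FileSegment] = {
    val i = mapIds.indexOf(mapId)
    if (i >= 0) Some(FileSegment(file, offsets(i), lengths(i))) else None
  }
}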

The forMapTask method of this class is shown below:

  /**
   * Get a ShuffleWriterGroup for the given map task, which will register it as complete
   * when the writers are closed successfully.
   * (mapId corresponds to the id of the RDD partition being written.)
   */
  def forMapTask(shuffleId: Int, mapId: Int, numBuckets: Int, serializer: Serializer,
      writeMetrics: ShuffleWriteMetrics): ShuffleWriterGroup = {
    new ShuffleWriterGroup {
      shuffleStates.putIfAbsent(shuffleId, new ShuffleState(numBuckets))
      private val shuffleState = shuffleStates(shuffleId)
      private var fileGroup: ShuffleFileGroup = null

      val openStartTime = System.nanoTime
      val serializerInstance = serializer.newInstance()
      // consolidateShuffleFiles (spark.shuffle.consolidateFiles, default false): when false,
      // every map task creates one intermediate file per output partition; when true, writers
      // reuse a ShuffleFileGroup so concurrently running tasks share the per-reducer files
      val writers: Array[DiskBlockObjectWriter] = if (consolidateShuffleFiles) {
        // Obtain a file group that is not currently in use
        fileGroup = getUnusedFileGroup()
        Array.tabulate[DiskBlockObjectWriter](numBuckets) { bucketId =>
          // mapId corresponds to the id of the RDD partition being written
          val blockId = ShuffleBlockId(shuffleId, mapId, bucketId)
          blockManager.getDiskWriter(blockId, fileGroup(bucketId), serializerInstance, bufferSize,
            writeMetrics)
        }
      } else {
        Array.tabulate[DiskBlockObjectWriter](numBuckets) { bucketId =>
          // mapId corresponds to the id of the RDD partition being written
          val blockId = ShuffleBlockId(shuffleId, mapId, bucketId)
          val blockFile = blockManager.diskBlockManager.getFile(blockId)
          // Write to a temporary file alongside blockFile so that a failed or duplicate
          // attempt does not leave a corrupt block file behind
          val tmp = Utils.tempFileWith(blockFile)
          blockManager.getDiskWriter(blockId, tmp, serializerInstance, bufferSize, writeMetrics)
        }
      }
      // Creating the file to write to and creating a disk writer both involve interacting with
      // the disk, so should be included in the shuffle write time.
      writeMetrics.incShuffleWriteTime(System.nanoTime - openStartTime)

      override def releaseWriters(success: Boolean) {
        if (consolidateShuffleFiles) {
          if (success) {
            val offsets = writers.map(_.fileSegment().offset)
            val lengths = writers.map(_.fileSegment().length)
            // Record where each bucket's output landed in the shared file group
            fileGroup.recordMapOutput(mapId, offsets, lengths)
          }
          recycleFileGroup(fileGroup)
        } else {
          // Mark this map task (i.e. this RDD partition) as completed
          shuffleState.completedMapTasks.add(mapId)
        }
      }
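In the non-consolidated branch above, each map task opens numBuckets writers, one per reducer, so a stage with M map tasks and R buckets produces M * R shuffle files. The small sketch below only illustrates the resulting block names; the helper is hypothetical, and the shuffle_<shuffleId>_<mapId>_<bucketId> pattern is assumed from ShuffleBlockId's naming in Spark 1.x.

object ShuffleFileNamesDemo {
  // Illustrative helper: the block names a single map task's writers correspond to
  // when shuffle files are not consolidated
  def shuffleBlockNames(shuffleId: Int, mapId: Int, numBuckets: Int): Seq[String] =
    (0 until numBuckets).map(bucketId => s"shuffle_${shuffleId}_${mapId}_${bucketId}")

  def main(args: Array[String]): Unit = {
    // e.g. map task 3 of shuffle 0 with 4 buckets:
    // shuffle_0_3_0, shuffle_0_3_1, shuffle_0_3_2, shuffle_0_3_3
    shuffleBlockNames(shuffleId = 0, mapId = 3, numBuckets = 4).foreach(println)
  }
}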