Hadoop MapReduce: ReduceTask Execution (Part 3)

During the copy phase on the reduce side, fetched map outputs are placed either in memory or directly on disk. Waiting until every file has been copied before merging would hurt job efficiency, so once the copy has progressed far enough, merging starts in parallel with it. Two threads are responsible for this work: InMemFSMergeThread and LocalFSMerger, which merge in-memory segments and on-disk segments respectively.
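Before diving into the source, here is a minimal sketch of that overall structure. It is not Hadoop code; every name in it (ShuffleSketch, inMemoryOutputs, copyMapOutput, and so on) is invented for the example. The point it illustrates is simply that two background merge threads are started next to the copier loop, so copying and merging overlap.

  // Illustrative sketch only (not Hadoop code): the shuffle overlaps copying with merging
  // by running two background threads beside the copiers -- one merging in-memory outputs,
  // one merging on-disk spills. All names here are invented for the example.
  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.LinkedBlockingQueue;

  public class ShuffleSketch {
    // Fetched map outputs waiting to be merged; a stand-in for ShuffleRamManager's bookkeeping.
    private final BlockingQueue<byte[]> inMemoryOutputs = new LinkedBlockingQueue<>();
    // Spill files produced by the in-memory merges, later merged again on disk.
    private final BlockingQueue<String> onDiskSpills = new LinkedBlockingQueue<>();

    public void run() throws InterruptedException {
      Thread inMemMerger = new Thread(this::inMemoryMergeLoop, "InMemFSMergeThread-sketch");
      Thread diskMerger  = new Thread(this::localFSMergeLoop, "LocalFSMerger-sketch");
      inMemMerger.setDaemon(true);
      diskMerger.setDaemon(true);
      inMemMerger.start();          // merging starts while copying is still in progress
      diskMerger.start();

      for (int map = 0; map < 100; map++) {
        inMemoryOutputs.put(copyMapOutput(map));   // copier threads would do this in parallel
      }
    }

    private byte[] copyMapOutput(int map) { return new byte[map]; }

    private void inMemoryMergeLoop() { /* wait for enough data, then merge it to a spill file */ }

    private void localFSMergeLoop()  { /* wait for enough spill files, then merge them */ }

    public static void main(String[] args) throws InterruptedException {
      new ShuffleSketch().run();
    }
  }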
Let's start with the run() method of the in-memory merge thread, InMemFSMergeThread:
  public void run() {
    LOG.info(reduceTask.getTaskID() + " Thread started: " + getName());
    try {
      boolean exit = false;
      do {
        exit = ramManager.waitForDataToMerge(); // check whether a merge is needed
        if (!exit) {
          doInMemMerge(); // perform the merge
        }
      } while (!exit);
    } catch (Exception e) {
      LOG.warn(reduceTask.getTaskID() +
               " Merge of the inmemory files threw an exception: "
               + StringUtils.stringifyException(e));
      ReduceCopier.this.mergeThrowable = e;
    } catch (Throwable t) {
      String msg = getTaskID() + " : Failed to merge in memory"
                   + StringUtils.stringifyException(t);
      reportFatalError(getTaskID(), t, msg);
    }
  }
Below are the conditions that trigger an in-memory merge; the comments in the code already explain them well. The things to watch are the fraction of memory in use, the number of map outputs whose copy has completed, and the number of stalled copier threads. A copier thread stalls when the memory reserved for holding map output exceeds its threshold; see ShuffleRamManager.reserve() for details.
  public boolean waitForDataToMerge() throws InterruptedException {
    boolean done = false;
    synchronized (dataAvailable) {
      // Start in-memory merge if manager has been closed or...
      while (!closed
             &&
             // In-memory threshold exceeded and at least two segments
             // have been fetched
             (getPercentUsed() < maxInMemCopyPer || numClosed < 2)
             &&
             // More than "mapred.inmem.merge.threshold" map outputs
             // have been fetched into memory
             (maxInMemOutputs <= 0 || numClosed < maxInMemOutputs)
             &&
             // More than MAX... threads are blocked on the RamManager
             // or the blocked threads are the last map outputs to be
             // fetched. If numRequiredMapOutputs is zero, either
             // setNumCopiedMapOutputs has not been called (no map ouputs
             // have been fetched, so there is nothing to merge) or the
             // last map outputs being transferred without
             // contention, so a merge would be premature.
             (numPendingRequests <
                  numCopiers * MAX_STALLED_SHUFFLE_THREADS_FRACTION &&
              (0 == numRequiredMapOutputs ||
               numPendingRequests < numRequiredMapOutputs))) {
        dataAvailable.wait();
      }
      done = closed;
    }
    return done;
  }
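For reference, the thresholds checked above are driven by a handful of job configuration parameters (MR1-era names; the defaults shown are the commonly documented ones and may differ between versions). The snippet below is not taken from the Hadoop source; it only illustrates how the trigger size would work out arithmetically.

  // Rough illustration of where the thresholds above come from (MR1 parameter names;
  // defaults shown may differ between Hadoop versions).
  import org.apache.hadoop.mapred.JobConf;

  public class ShuffleMergeThresholds {
    public static void main(String[] args) {
      JobConf conf = new JobConf();

      // Fraction of the reduce task heap that ShuffleRamManager may use to hold map outputs.
      float bufferPercent = conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
      // Fraction of that buffer which, once filled, allows an in-memory merge (maxInMemCopyPer above).
      float mergePercent = conf.getFloat("mapred.job.shuffle.merge.percent", 0.66f);
      // Maximum number of in-memory map outputs before a merge is forced (maxInMemOutputs above).
      int mergeThreshold = conf.getInt("mapred.inmem.merge.threshold", 1000);

      long heap = Runtime.getRuntime().maxMemory();
      long ramManagerSize = (long) (heap * bufferPercent);
      long mergeTriggerBytes = (long) (ramManagerSize * mergePercent);

      System.out.println("Shuffle buffer size   : " + ramManagerSize + " bytes");
      System.out.println("In-memory merge starts: >= " + mergeTriggerBytes
          + " bytes buffered, or >= " + mergeThreshold + " map outputs");
    }
  }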
The merge here is best read alongside the map-side merge; the logic is much the same: pick an output file name, build a writer, put the segments into the merge queue, then run the combiner first if one is configured, otherwise write the merged records straight to the file.
  private void doInMemMerge() throws IOException {
    if (mapOutputsFilesInMemory.size() == 0) {
      return;
    }

    //name this output file same as the name of the first file that is
    //there in the current list of inmem files (this is guaranteed to
    //be absent on the disk currently. So we don't overwrite a prev.
    //created spill). Also we need to create the output file now since
    //it is not guaranteed that this file will be present after merge
    //is called (we delete empty files as soon as we see them
    //in the merge method)

    //figure out the mapId
    TaskID mapId = mapOutputsFilesInMemory.get(0).mapId;

    List<Segment<K, V>> inMemorySegments = new ArrayList<Segment<K,V>>();
    long mergeOutputSize = createInMemorySegments(inMemorySegments, 0);
    int noInMemorySegments = inMemorySegments.size();

    Path outputPath =
        mapOutputFile.getInputFileForWrite(mapId, mergeOutputSize);

    Writer writer =
        new Writer(conf, rfs, outputPath,
                   conf.getMapOutputKeyClass(),
                   conf.getMapOutputValueClass(),
                   codec, null);

    RawKeyValueIterator rIter = null;
    try {
      LOG.info("Initiating in-memory merge with " + noInMemorySegments +
               " segments...");

      rIter = Merger.merge(conf, rfs,
                           (Class<K>)conf.getMapOutputKeyClass(),
                           (Class<V>)conf.getMapOutputValueClass(),
                           inMemorySegments, inMemorySegments.size(),
                           new Path(reduceTask.getTaskID().toString()),
                           conf.getOutputKeyComparator(), reporter,
                           spilledRecordsCounter, null);

      if (combinerRunner == null) {
        Merger.writeFile(rIter, writer, reporter, conf);
      } else {
        combineCollector.setWriter(writer);
        combinerRunner.combine(rIter, combineCollector);
      }
      writer.close();

      LOG.info(reduceTask.getTaskID() +
               " Merge of the " + noInMemorySegments +
               " files in-memory complete." +
               " Local file is " + outputPath + " of size " +
               localFileSys.getFileStatus(outputPath).getLen());
    } catch (Exception e) {
      //make sure that we delete the ondisk file that we created
      //earlier when we invoked cloneFileAttributes
      localFileSys.delete(outputPath, true);
      throw (IOException)new IOException
              ("Intermediate merge failed").initCause(e);
    }

    // Note the output of the merge
    FileStatus status = localFileSys.getFileStatus(outputPath);
    synchronized (mapOutputFilesOnDisk) {
      addToMapOutputFilesOnDisk(status);
    }
  }
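Merger.merge itself performs a streaming k-way merge over the already-sorted segments. The following self-contained sketch is not the Hadoop Merger: it merges plain string lists with natural ordering instead of raw key/value bytes with the job's RawComparator, but it shows the underlying idea of a priority queue keyed by each segment's current head record.

  // Not the Hadoop Merger: a minimal illustration of a k-way merge over sorted segments.
  import java.util.*;

  public class KWayMergeSketch {
    public static List<String> merge(final List<List<String>> segments) {
      // Each heap entry is {segment index, position}; ordered by that segment's head record.
      Comparator<int[]> byHeadRecord =
          Comparator.comparing(e -> segments.get(e[0]).get(e[1]));
      PriorityQueue<int[]> heap = new PriorityQueue<>(byHeadRecord);
      for (int s = 0; s < segments.size(); s++) {
        if (!segments.get(s).isEmpty()) heap.add(new int[] {s, 0});
      }
      List<String> out = new ArrayList<>();
      while (!heap.isEmpty()) {
        int[] head = heap.poll();
        List<String> seg = segments.get(head[0]);
        out.add(seg.get(head[1]));                    // emit the smallest remaining record
        if (head[1] + 1 < seg.size()) {
          heap.add(new int[] {head[0], head[1] + 1}); // advance that segment
        }
      }
      return out;
    }

    public static void main(String[] args) {
      List<List<String>> segments = Arrays.asList(
          Arrays.asList("apple", "melon"),
          Arrays.asList("banana", "pear"),
          Arrays.asList("cherry"));
      System.out.println(merge(segments));  // [apple, banana, cherry, melon, pear]
    }
  }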
The merge of on-disk files works in much the same way; for the details see org.apache.hadoop.mapred.ReduceTask.ReduceCopier.LocalFSMerger. A rough sketch of that thread follows.
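The sketch below is not the LocalFSMerger code; the names (DiskMergeLoop, spillFiles) and the exact trigger are illustrative assumptions. It follows the same wait-then-merge pattern as the in-memory thread: in MR1 the disk merger wakes up roughly when the number of on-disk spill files exceeds 2 * io.sort.factor - 1 and merges io.sort.factor of them into a single larger file, though the precise condition may differ between versions.

  // Illustrative sketch only (not the actual LocalFSMerger): wait until enough spill
  // files have piled up, then merge a batch of them into one larger file.
  import java.util.ArrayList;
  import java.util.List;

  public class DiskMergeLoop {
    private final List<String> spillFiles = new ArrayList<>();
    private final int ioSortFactor = 10;      // stand-in for conf.getInt("io.sort.factor", 10)
    private volatile boolean shuffleDone = false;

    public synchronized void addSpill(String path) {
      spillFiles.add(path);
      notifyAll();                            // wake the merge loop so it can re-check its trigger
    }

    public synchronized void finish() {
      shuffleDone = true;
      notifyAll();
    }

    public synchronized void run() throws InterruptedException {
      while (!shuffleDone) {
        // Wait until a disk-to-disk merge looks worthwhile.
        while (!shuffleDone && spillFiles.size() < 2 * ioSortFactor - 1) {
          wait();
        }
        if (spillFiles.size() >= 2 * ioSortFactor - 1) {
          // Merge a batch of ioSortFactor files into one new, larger file.
          List<String> toMerge = new ArrayList<>(spillFiles.subList(0, ioSortFactor));
          spillFiles.removeAll(toMerge);
          spillFiles.add("merged-" + System.nanoTime());
        }
      }
    }
  }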