Hadoop MapReduce ReduceTask Execution (Part 3)
During the copy phase on the reduce side, fetched map output is placed either in memory or directly on disk. Waiting until every file has been copied before merging would clearly hurt job efficiency, so once copying has progressed far enough, merging begins. Two threads are responsible for this work: InMemFSMergeThread and LocalFSMerger, which merge in-memory and on-disk segments respectively.
First, look at the run() method of the in-memory merge thread, InMemFSMergeThread:
public void run() {
  LOG.info(reduceTask.getTaskID() + " Thread started: " + getName());
  try {
    boolean exit = false;
    do {
      exit = ramManager.waitForDataToMerge(); // check whether a merge is needed
      if (!exit) {
        doInMemMerge(); // perform the in-memory merge
      }
    } while (!exit);
  } catch (Exception e) {
    LOG.warn(reduceTask.getTaskID()
        + " Merge of the inmemory files threw an exception: "
        + StringUtils.stringifyException(e));
    ReduceCopier.this.mergeThrowable = e;
  } catch (Throwable t) {
    String msg = getTaskID() + " : Failed to merge in memory"
        + StringUtils.stringifyException(t);
    reportFatalError(getTaskID(), t, msg);
  }
}

The conditions for starting an in-memory merge are shown below; the inline comments explain them well. Note in particular the memory usage, the number of fully copied map outputs, and the number of stalled threads. A thread is stalled when the memory reserved for holding map output exceeds the threshold; see ShuffleRamManager.reserve() for details.
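The stall behaviour that ShuffleRamManager.reserve() imposes on copier threads is a classic guarded block. The sketch below is a hypothetical simplification, not the real Hadoop API: field and method names are illustrative, and the real manager also tracks pending requests and closed outputs.

```java
// Hypothetical sketch of the back-pressure in ShuffleRamManager.reserve():
// a copier thread blocks while buffering a map output would push memory
// usage past the limit; an in-memory merge frees memory and wakes waiters.
public class RamManagerSketch {
    private final long maxSize;
    private long used = 0;

    RamManagerSketch(long maxSize) { this.maxSize = maxSize; }

    // Called by a copier before buffering a map output in memory.
    synchronized void reserve(long requestedSize) throws InterruptedException {
        while (used + requestedSize > maxSize) {
            wait();          // the copier is "stalled" until a merge frees memory
        }
        used += requestedSize;
    }

    // Called after an in-memory merge spills segments to disk.
    synchronized void unreserve(long releasedSize) {
        used -= releasedSize;
        notifyAll();         // wake stalled copier threads
    }

    synchronized long getUsed() { return used; }

    public static void main(String[] args) throws InterruptedException {
        RamManagerSketch ram = new RamManagerSketch(100);
        ram.reserve(60);
        ram.unreserve(60);
        System.out.println(ram.getUsed());
    }
}
```

Counting stalled threads against numCopiers is what feeds the MAX_STALLED_SHUFFLE_THREADS_FRACTION clause in the merge condition below.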
public boolean waitForDataToMerge() throws InterruptedException {
  boolean done = false;
  synchronized (dataAvailable) {
    // Start in-memory merge if manager has been closed or...
    while (!closed &&
        // In-memory threshold exceeded and at least two segments
        // have been fetched
        (getPercentUsed() < maxInMemCopyPer || numClosed < 2) &&
        // More than "mapred.inmem.merge.threshold" map outputs
        // have been fetched into memory
        (maxInMemOutputs <= 0 || numClosed < maxInMemOutputs) &&
        // More than MAX... threads are blocked on the RamManager
        // or the blocked threads are the last map outputs to be
        // fetched. If numRequiredMapOutputs is zero, either
        // setNumCopiedMapOutputs has not been called (no map outputs
        // have been fetched, so there is nothing to merge) or the
        // last map outputs are being transferred without
        // contention, so a merge would be premature.
        (numPendingRequests < numCopiers * MAX_STALLED_SHUFFLE_THREADS_FRACTION &&
            (0 == numRequiredMapOutputs ||
             numPendingRequests < numRequiredMapOutputs))) {
      dataAvailable.wait();
    }
    done = closed;
  }
  return done;
}
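The first two clauses of that condition can be restated positively: a merge fires once memory pressure crosses the threshold with at least two buffered outputs, or once the buffered-output count reaches mapred.inmem.merge.threshold. The sketch below is a simplified illustration of just those two clauses (it omits the stalled-thread clause and the closed flag); the method name and parameter layout are mine, not Hadoop's.

```java
// Hypothetical restatement of the first two merge-trigger clauses in
// waitForDataToMerge(). percentUsed/maxInMemCopyPer are fractions of
// the shuffle buffer; numClosed is the count of fully copied outputs.
public class MergeTrigger {
    static boolean shouldMerge(float percentUsed, float maxInMemCopyPer,
                               int numClosed, int maxInMemOutputs) {
        // memory threshold exceeded AND at least two segments buffered
        boolean memoryPressure = percentUsed >= maxInMemCopyPer && numClosed >= 2;
        // more than mapred.inmem.merge.threshold outputs buffered
        boolean countPressure = maxInMemOutputs > 0 && numClosed >= maxInMemOutputs;
        return memoryPressure || countPressure;
    }

    public static void main(String[] args) {
        // 70% used, 3 outputs buffered, 66% threshold -> merge starts
        System.out.println(shouldMerge(0.70f, 0.66f, 3, 1000));
        // 30% used, only 5 outputs, count threshold 1000 -> keep waiting
        System.out.println(shouldMerge(0.30f, 0.66f, 5, 1000));
    }
}
```

Note the real loop waits while the *negation* of these conditions holds, which is why the source reads `getPercentUsed() < maxInMemCopyPer || numClosed < 2`.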
This merge can be compared with the merge on the map side; the logic is largely the same: determine the output file name, build a writer, and put the segments into the merge queue. If a combiner is configured it runs first; otherwise the merged records are written straight to the file.

private void doInMemMerge() throws IOException {
  if (mapOutputsFilesInMemory.size() == 0) {
    return;
  }

  //name this output file same as the name of the first file that is
  //there in the current list of inmem files (this is guaranteed to
  //be absent on the disk currently. So we don't overwrite a prev.
  //created spill). Also we need to create the output file now since
  //it is not guaranteed that this file will be present after merge
  //is called (we delete empty files as soon as we see them
  //in the merge method)

  //figure out the mapId
  TaskID mapId = mapOutputsFilesInMemory.get(0).mapId;

  List<Segment<K, V>> inMemorySegments = new ArrayList<Segment<K,V>>();
  long mergeOutputSize = createInMemorySegments(inMemorySegments, 0);
  int noInMemorySegments = inMemorySegments.size();

  Path outputPath = mapOutputFile.getInputFileForWrite(mapId, mergeOutputSize);

  Writer writer = new Writer(conf, rfs, outputPath,
      conf.getMapOutputKeyClass(), conf.getMapOutputValueClass(),
      codec, null);

  RawKeyValueIterator rIter = null;
  try {
    LOG.info("Initiating in-memory merge with " + noInMemorySegments +
        " segments...");

    rIter = Merger.merge(conf, rfs,
        (Class<K>)conf.getMapOutputKeyClass(),
        (Class<V>)conf.getMapOutputValueClass(),
        inMemorySegments, inMemorySegments.size(),
        new Path(reduceTask.getTaskID().toString()),
        conf.getOutputKeyComparator(), reporter,
        spilledRecordsCounter, null);

    if (combinerRunner == null) {
      Merger.writeFile(rIter, writer, reporter, conf);
    } else {
      combineCollector.setWriter(writer);
      combinerRunner.combine(rIter, combineCollector);
    }
    writer.close();

    LOG.info(reduceTask.getTaskID() + " Merge of the " + noInMemorySegments +
        " files in-memory complete." +
        " Local file is " + outputPath + " of size " +
        localFileSys.getFileStatus(outputPath).getLen());
  } catch (Exception e) {
    //make sure that we delete the ondisk file that we created
    //earlier when we invoked cloneFileAttributes
    localFileSys.delete(outputPath, true);
    throw (IOException)new IOException("Intermediate merge failed").initCause(e);
  }

  // Note the output of the merge
  FileStatus status = localFileSys.getFileStatus(outputPath);
  synchronized (mapOutputFilesOnDisk) {
    addToMapOutputFilesOnDisk(status);
  }
}

The merge of on-disk files is largely the same; for the details, see org.apache.hadoop.mapred.ReduceTask.ReduceCopier.LocalFSMerger.
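At the heart of Merger.merge is a k-way merge of sorted segments. The following is a minimal sketch of that idea, simplified to int keys and in-memory arrays; the real Merger operates on raw key/value byte streams with a configurable comparator, so class and method names here are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the k-way merge underlying Merger.merge: each sorted
// "segment" contributes its head record to a priority queue, and
// records are drained in globally sorted order.
public class KWayMerge {
    static List<Integer> merge(List<int[]> segments) {
        // queue entries: {value, segmentIndex, offsetWithinSegment}
        PriorityQueue<int[]> pq =
            new PriorityQueue<int[]>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int s = 0; s < segments.size(); s++) {
            if (segments.get(s).length > 0) {
                pq.add(new int[]{segments.get(s)[0], s, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] head = pq.poll();
            out.add(head[0]);                      // emit smallest head
            int s = head[1], next = head[2] + 1;
            int[] seg = segments.get(s);
            if (next < seg.length) {               // refill from same segment
                pq.add(new int[]{seg[next], s, next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> segs = Arrays.asList(
            new int[]{1, 4, 7}, new int[]{2, 5}, new int[]{3, 6});
        System.out.println(merge(segs)); // sorted union of all segments
    }
}
```

Because every segment is already sorted, the merge is linear in the total number of records (with a log k factor for the queue), which is why merging early, during the copy phase, is cheap relative to re-sorting everything at the end.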