Hadoop Source Code Analysis 29: split and splitmetainfo
Input file: hdfs://server1:9000/user/admin/in/yellow.txt
1. splits
formatMinSplitSize:1;
minSplitSize=conf("mapred.min.split.size"):1;
minSize=Math.max(formatMinSplitSize, minSplitSize)=1;
maxSize=conf("mapred.max.split.size"):Long.MAX_VALUE;
fileLength=201000000;
blkLocations=[{0,67108864,server3,server2},
{67108864,67108864,server2,server3},
{134217728,66782272,server2,server3}];
blockSize=67108864;
splitSize=Math.max(minSize, Math.min(maxSize,blockSize)):67108864;
SPLIT_SLOP=1.1;
Split generation code (loop bodies restored from FileInputFormat.getSplits):
long bytesRemaining = length;
while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
  int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
  splits.add(new FileSplit(path, length - bytesRemaining, splitSize,
                           blkLocations[blkIndex].getHosts()));
  bytesRemaining -= splitSize;
}
if (bytesRemaining != 0) {
  splits.add(new FileSplit(path, length - bytesRemaining, bytesRemaining,
                           blkLocations[blkLocations.length - 1].getHosts()));
}
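The loop above can be reproduced in plain Java, with no Hadoop dependencies, to confirm the three splits listed below. FileSplit construction is reduced to {start, length} pairs and hosts are omitted, since only the arithmetic is being demonstrated; the class and method names are invented for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal re-creation of the split loop with the values from this post
// (fileLength = 201000000, splitSize = 67108864, SPLIT_SLOP = 1.1).
public class SplitSketch {
    static final double SPLIT_SLOP = 1.1;

    // Returns {start, length} pairs for one input file of the given length.
    static List<long[]> computeSplits(long length, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long bytesRemaining = length;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(new long[]{length - bytesRemaining, splitSize});
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) { // tail split, at most 1.1 * splitSize
            splits.add(new long[]{length - bytesRemaining, bytesRemaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long minSize = 1, maxSize = Long.MAX_VALUE, blockSize = 67108864L;
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        for (long[] s : computeSplits(201000000L, splitSize)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Note the last block (66782272 bytes) is only 0.995 of a splitSize, so the SPLIT_SLOP test fails and it becomes the tail split rather than being divided again.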
Resulting splits:
FileSplit={file=hdfs://server1:9000/user/admin/in/yellow.txt, hosts=[server3, server2], length=67108864, start=0}
FileSplit={file=hdfs://server1:9000/user/admin/in/yellow.txt, hosts=[server3, server2], length=67108864, start=67108864}
FileSplit={file=hdfs://server1:9000/user/admin/in/yellow.txt, hosts=[server3, server2], length=66782272, start=134217728}
The splits are written to: hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404200127_0001/job.split
Split file header:
out.write(SPLIT_FILE_HEADER); // "SPL".getBytes("UTF-8") = [83, 80, 76]
out.writeInt(splitVersion);
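This header is where the start-offset of 7 seen in the metadata below comes from: 3 magic bytes plus a 4-byte int. A self-contained sketch (class name invented here) serializes the header into a buffer and shows it occupies exactly 7 bytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Serializes the job.split header: 3 magic bytes "SPL" followed by a
// 4-byte big-endian int version. The first split record therefore
// begins at byte offset 7.
public class SplitHeaderSketch {
    static byte[] writeHeader(int splitVersion) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.write("SPL".getBytes("UTF-8")); // [83, 80, 76]
            out.writeInt(splitVersion);         // 4 bytes, big-endian
            out.close();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeHeader(1).length); // 7
    }
}
```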
2. SplitMetaInfo
SplitMetaInfo generation code:
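The original post leaves the generation code blank. As a stand-in, here is a sketch of the bookkeeping involved: before each FileSplit record is written into job.split, the writer notes the current file offset, which later becomes that split's start-offset in the metadata. The 109-byte record size is inferred from the offsets in this post (116 - 7 = 225 - 116 = 109); it is a property of this particular file, not a Hadoop constant, and the class name is invented:

```java
import java.util.ArrayList;
import java.util.List;

// Mimics the offset bookkeeping done while writing job.split:
// record the offset before each split record, then advance by the
// record's serialized size.
public class MetaInfoSketch {
    static final int HEADER_SIZE = 7;   // "SPL" magic + int version
    static final int RECORD_SIZE = 109; // assumed serialized FileSplit size

    // Returns the start-offsets of n split records inside job.split.
    static List<Long> recordOffsets(int n) {
        List<Long> offsets = new ArrayList<>();
        long offset = HEADER_SIZE;
        for (int i = 0; i < n; i++) {
            offsets.add(offset); // captured before the record is written
            offset += RECORD_SIZE;
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(recordOffsets(3)); // [7, 116, 225]
    }
}
```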
SplitMetaInfo content:
JobSplit$SplitMetaInfo={data-size: 67108864, start-offset: 7, locations: [server3, server2]}
JobSplit$SplitMetaInfo={data-size: 67108864, start-offset: 116, locations: [server3, server2]}
JobSplit$SplitMetaInfo={data-size: 66782272, start-offset: 225, locations: [server3, server2]}
The SplitMetaInfo records are written to: hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404200127_0001/job.splitmetainfo
Comparing the splits with the SplitMetaInfo content:
data-size in SplitMetaInfo is the FileSplit's length;
locations in SplitMetaInfo is the FileSplit's hosts;
start-offset is the byte offset at which the corresponding FileSplit record begins inside the job.split file (the first record starts at 7, immediately after the 3-byte "SPL" magic and the 4-byte version int).
SplitMetaInfo file header:
out.write(JobSplit.META_SPLIT_FILE_HEADER);            // "META-SPL".getBytes("UTF-8")
WritableUtils.writeVInt(out, splitMetaInfoVersion);
WritableUtils.writeVInt(out, allSplitMetaInfo.length); // number of splits
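Unlike the fixed 4-byte writeInt in the split file header, WritableUtils.writeVInt uses Hadoop's variable-length encoding, where any value in [-112, 127] fits in a single byte; so both the version and the split count here take one byte each. The sketch below is modeled on WritableUtils.writeVLong (the class name is invented, and the large-value byte layout should be treated as an approximation of the real implementation):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Variable-length integer encoding in the style of Hadoop's
// WritableUtils.writeVLong: values in [-112, 127] take one byte;
// larger magnitudes get a length-prefix byte followed by the minimal
// number of big-endian payload bytes.
public class VIntSketch {
    static byte[] encode(long i) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            if (i >= -112 && i <= 127) {
                out.writeByte((byte) i);             // single-byte fast path
            } else {
                int len = -112;
                if (i < 0) { i ^= -1L; len = -120; } // one's complement
                long tmp = i;
                while (tmp != 0) { tmp >>= 8; len--; }
                out.writeByte((byte) len);           // length-prefix byte
                len = (len < -120) ? -(len + 120) : -(len + 112);
                for (int idx = len; idx != 0; idx--) {
                    int shift = (idx - 1) * 8;
                    out.writeByte((byte) ((i >> shift) & 0xFF));
                }
            }
            out.close();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Both small header values fit in one byte each.
        System.out.println(encode(1).length);   // 1
        System.out.println(encode(3).length);   // 1
        System.out.println(encode(300).length); // 3: prefix + 2 payload bytes
    }
}
```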
3. Use of the splits
In the Task process; to be completed.
4. Use of SplitMetaInfo
The JobTracker process reads the SplitMetaInfo records and converts them into TaskSplitMetaInfo:
TaskSplitMetaInfo[0]={inputDataLength=67108864, locations=[server3, server2], splitIndex=JobSplit$TaskSplitIndex{splitLocation="hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404200521_0001/job.split", startOffset=7}}
TaskSplitMetaInfo[1]={inputDataLength=67108864, locations=[server3, server2], splitIndex=JobSplit$TaskSplitIndex{splitLocation="hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404200521_0001/job.split", startOffset=116}}
TaskSplitMetaInfo[2]={inputDataLength=66782272, locations=[server3, server2], splitIndex=JobSplit$TaskSplitIndex{splitLocation="hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404200521_0001/job.split", startOffset=225}}
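The conversion itself is mechanical: each metadata entry is paired with the shared job.split path. A plain-Java sketch with stand-in types (these are illustrative classes, not the real org.apache.hadoop JobSplit classes):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the JobTracker-side conversion: one SplitMetaInfo entry
// plus the shared job.split location yields one TaskSplitMetaInfo-like
// record, from which a map task will later be scheduled.
public class TaskSplitSketch {
    static class Meta {
        final long dataSize; final long startOffset; final String[] locations;
        Meta(long d, long s, String[] l) { dataSize = d; startOffset = s; locations = l; }
    }

    static class TaskSplitMetaInfo {
        final long inputDataLength; final String[] locations;
        final String splitLocation; final long startOffset;
        TaskSplitMetaInfo(Meta m, String splitFile) {
            inputDataLength = m.dataSize; locations = m.locations;
            splitLocation = splitFile; startOffset = m.startOffset;
        }
    }

    // One TaskSplitMetaInfo per metadata entry, all pointing at the same file.
    static List<TaskSplitMetaInfo> convert(List<Meta> metas, String splitFile) {
        List<TaskSplitMetaInfo> out = new ArrayList<>();
        for (Meta m : metas) out.add(new TaskSplitMetaInfo(m, splitFile));
        return out;
    }

    public static void main(String[] args) {
        String[] hosts = {"server3", "server2"};
        List<Meta> metas = List.of(
            new Meta(67108864L, 7L, hosts),
            new Meta(67108864L, 116L, hosts),
            new Meta(66782272L, 225L, hosts));
        List<TaskSplitMetaInfo> infos = convert(metas, "job.split");
        System.out.println(infos.size() + " map tasks, one per split");
    }
}
```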
TaskInProgress instances are then created from them:
jobFile: hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201404200521_0001/job.xml
splits[i] is the corresponding TaskSplitMetaInfo.