The Execution Process of Hadoop's MapReduce Functions
Notes
Qimage is my custom data type, used as the value, so it does not implement WritableComparable.
QimageInputFormat<Text,Qimage> is the custom InputFormat.
QimageRecordReader<Text,Qimage> is the custom RecordReader.
QimageOutputFormat<Text,Qimage> is the custom OutputFormat.
QimageRecordWriter<Text,Qimage> is the custom RecordWriter.
public class Qimage implements Writable {
    public static int count = 0;
    private long fileLength; // size of the image file in bytes
    private byte[] data;     // raw image bytes
}
The data field holds the image data.
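As a sketch of how such a type serializes itself, the class below round-trips the two fields through java.io streams. Only the fields come from the original post; the write()/readFields() bodies and the constructors are my assumptions about what a typical Writable implementation would look like:

```java
import java.io.*;

// Minimal, self-contained sketch of the Qimage value type.
// The method bodies are assumed, not taken from the original code.
public class Qimage {
    public static int count = 0;
    private long fileLength; // size of the image in bytes
    private byte[] data;     // raw image bytes

    public Qimage() {}
    public Qimage(byte[] data) { this.data = data; this.fileLength = data.length; }

    // Hadoop calls write() when shipping the value from map to reduce.
    public void write(DataOutput out) throws IOException {
        out.writeLong(fileLength);
        out.write(data, 0, (int) fileLength);
    }

    // Hadoop calls readFields() to deserialize the value on the other side.
    public void readFields(DataInput in) throws IOException {
        fileLength = in.readLong();
        data = new byte[(int) fileLength];
        in.readFully(data); // readFully, not read(): guarantees the array is filled
    }

    public byte[] getData() { return data; }

    public static void main(String[] args) throws IOException {
        Qimage original = new Qimage(new byte[]{1, 2, 3, 4});
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));

        Qimage copy = new Qimage();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        assert copy.getData().length == 4;
        assert copy.getData()[2] == 3;
        System.out.println("round-trip ok, length = " + copy.getData().length);
    }
}
```

Serializing the length before the bytes is what lets readFields() know how large an array to allocate on the receiving side.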
1. Map Phase
QimageInputFormat::createRecordReader() is called.
// The framework calls createRecordReader() on QimageInputFormat.
QimageRecordReader::initialize() is called.
RecordReader ---> initialize is over!!!
// initialize() on QimageRecordReader obtains the file contents to be processed from the split.
QimageRecordReader::nextKeyValue() is called.
start to nextKeyValue function!!count--> 1
start22 to nextKeyValue function!!
// nextKeyValue() on QimageRecordReader computes the key and value.
Qimage::readFields() is called.
readFields's count is 1
start to readFields,size --> 983012
// QimageRecordReader::nextKeyValue() calls Qimage::readFields(), which assigns the file contents read from the split to the value.
RecordReader --> nextKeyValue function executed once!! count-> 1
// QimageRecordReader::nextKeyValue() has finished.
QimageRecordReader::getProgress() is called.
// Reports progress.
QimageRecordReader::getCurrentKey() is called.
// Fetches the key.
QimageRecordReader::getCurrentValue() is called.
// Fetches the value.
Map::map() is called.
// The map() function runs, consuming the key and value obtained above.
Qimage::write() is called && fileLength = 983012
map::key: MSingle_1_119
-------------------------
// map() writes the <key,value> pair out toward the reducer, which invokes Qimage::write() to serialize the value.
QimageRecordReader::nextKeyValue() is called.
// Moves on to the next <key,value> pair.
QimageRecordReader::getProgress() is called.
// Reports progress.
QimageRecordReader::getCurrentKey() is called.
// At this point the map phase is complete, and the reduce phase begins.
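The call sequence above is driven by the map task's record loop: initialize, then nextKeyValue / getCurrentKey / getCurrentValue / map() until the split is exhausted. The following is a simplified, self-contained sketch of that loop over a toy in-memory "split"; the method names mirror the real RecordReader API, but this is not the actual framework source:

```java
import java.util.*;

public class MapDriverSketch {
    // Toy stand-in for RecordReader<Text, Qimage>, iterating an in-memory split.
    static class ToyRecordReader {
        private final Iterator<Map.Entry<String, byte[]>> it;
        private Map.Entry<String, byte[]> current;

        ToyRecordReader(Map<String, byte[]> split) { this.it = split.entrySet().iterator(); }

        void initialize() { }                 // a real reader would open the split here
        boolean nextKeyValue() {              // advance; false once the split is exhausted
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
        String getCurrentKey()   { return current.getKey(); }
        byte[] getCurrentValue() { return current.getValue(); }
    }

    // The framework's loop, matching the trace above: createRecordReader(),
    // initialize(), then repeated nextKeyValue()/getCurrentKey()/getCurrentValue()/map().
    static List<String> run(Map<String, byte[]> split) {
        List<String> emitted = new ArrayList<>();
        ToyRecordReader reader = new ToyRecordReader(split); // createRecordReader()
        reader.initialize();
        while (reader.nextKeyValue()) {
            String key = reader.getCurrentKey();
            byte[] value = reader.getCurrentValue();
            emitted.add("map::key: " + key + " (" + value.length + " bytes)"); // map() body
        }
        return emitted;
    }

    public static void main(String[] args) {
        Map<String, byte[]> split = new LinkedHashMap<>();
        split.put("MSingle_1_119", new byte[983012]);
        List<String> out = run(split);
        assert out.size() == 1;
        System.out.println(out.get(0));
    }
}
```

This also explains why the trace ends with one last nextKeyValue()/getProgress()/getCurrentKey() group producing no map() call: the reader returns false and the loop exits.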
2. Reduce Phase
QimageOutputFormat::getRecordWriter() is called.
QimageRecordWriter::QimageRecordWriter() is called.
Qimage::readFields() is called.
readFields's count is 1
start to readFields,size --> 990976
// When the reducer reads the value passed over from the map side, it calls Qimage::readFields().
Reduce::reduce() is called.
// The reduce() function runs.
QimageRecordWriter::write() is called. && k -> MSingle_0_119
// Writes the result out.
QimageRecordWriter::write()::key is not null.
The Output Path is hdfs://master:9000/Qimage/output/MSingle_0_119.bmp
QimageRecordWriter::write()::key is not null && value is not null.
The Output Path is hdfs://master:9000/Qimage/output/MSingle_0_119.bmp
Qimage::getData() is called.
data.length is 990976
Creating output pic ... hdfs://master:9000/Qimage/output/MSingle_0_119.bmp
-------------------------
reduce::key: MSingle_0_119
-------------------------
What follows is simply reduce() being called repeatedly; in this run it was invoked 3 times in total.
Qimage::readFields() is called.
readFields's count is 2
start to readFields,size --> 959532
Reduce::reduce() is called.
QimageRecordWriter::write() is called. && k -> MSingle_10_119
QimageRecordWriter::write()::key is not null.
The Output Path is hdfs://master:9000/Qimage/output/MSingle_10_119.bmp
QimageRecordWriter::write()::key is not null && value is not null.
The Output Path is hdfs://master:9000/Qimage/output/MSingle_10_119.bmp
Qimage::getData() is called.
data.length is 959532
Creating output pic ... hdfs://master:9000/Qimage/output/MSingle_10_119.bmp
-------------------------
reduce::key: MSingle_10_119
-------------------------
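Per the log, each write() call builds an output path from the key ("<key>.bmp" under the output directory) and dumps the value's bytes into it. The sketch below reproduces that behavior; the HDFS calls are replaced with java.nio.file local-filesystem calls, which is my stand-in, not the original code:

```java
import java.io.IOException;
import java.nio.file.*;

// A sketch of what QimageRecordWriter::write() appears to do, judging by the
// trace: check key and value for null, derive a .bmp path from the key, and
// write the image bytes. Writes locally instead of to hdfs://master:9000/...
public class QimageWriterSketch {
    private final Path outputDir;

    public QimageWriterSketch(Path outputDir) { this.outputDir = outputDir; }

    public Path write(String key, byte[] data) throws IOException {
        if (key == null || data == null)                       // the log's null checks
            throw new NullPointerException("key and value must be non-null");
        Path out = outputDir.resolve(key + ".bmp");            // e.g. .../MSingle_0_119.bmp
        System.out.println("Creating output pic ... " + out);
        Files.write(out, data);                                // dump the image bytes
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("qimage-out");
        QimageWriterSketch writer = new QimageWriterSketch(dir);
        Path written = writer.write("MSingle_0_119", new byte[]{0x42, 0x4D}); // "BM" header bytes
        assert Files.size(written) == 2;
        assert written.getFileName().toString().equals("MSingle_0_119.bmp");
    }
}
```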
Problem Summary
In QimageRecordReader::nextKeyValue() I read the contents to be processed out of the split and store them in the value. Doing so, I ran into a problem with the following code:
filein = fs.open(file);
byte data[] = new byte[filein.available()]; // allocate a byte array to hold the entire image file
filein.read(data); // read the byte stream from HDFS
This code is wrong. filein.available() does return the size of the whole file, but filein.read(data) does not read the whole file into data. While debugging I found that only the first 8192*16 bytes of the array held real data; everything after that was zeros. A single read() call is only guaranteed to return some bytes, not to fill the buffer, which makes this bug very easy to miss.
Solution:
Replace the read statement with:
filein.readFully(0, data, 0, (int) fileLength);
The method signature, from http://hadoop.apache.org/docs/r2.2.0/api/index.html :

readFully

public void readFully(long position,
                      byte[] buffer,
                      int offset,
                      int length)
               throws IOException

Read bytes from the given position in the stream to the given buffer. Continues to read until length bytes have been read.

Specified by:
readFully in interface PositionedReadable
Parameters:
position - position in the input stream to seek
buffer - buffer into which data is read
offset - offset into the buffer in which data is written
length - the number of bytes to read
Throws:
EOFException - If the end of stream is reached while reading. If an exception is thrown an undetermined number of bytes in the buffer may have been written.
IOException
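The underlying issue is general to java.io: a single InputStream.read(buffer) call may return fewer bytes than the buffer can hold. The self-contained demo below simulates a stream that hands out at most a few bytes per call (standing in for the packet-sized reads seen from HDFS) and shows both the buggy single-read pattern and the read-until-full loop that readFully performs internally:

```java
import java.io.*;

public class ShortReadDemo {
    // Simulates a stream (like an HDFS input stream) that returns at most
    // 'chunk' bytes per read() call, even when more bytes are available.
    static class ChunkedStream extends ByteArrayInputStream {
        private final int chunk;
        ChunkedStream(byte[] buf, int chunk) { super(buf); this.chunk = chunk; }
        @Override
        public int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, chunk));
        }
    }

    // What readFully does in spirit: keep reading until the buffer is full.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int pos = 0;
        while (pos < buf.length) {
            int n = in.read(buf, pos, buf.length - pos);
            if (n < 0) throw new EOFException("stream ended at byte " + pos);
            pos += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[100];
        java.util.Arrays.fill(file, (byte) 7);

        // Buggy pattern: one read() call leaves the tail of the buffer as zeros.
        byte[] partial = new byte[file.length];
        int got = new ChunkedStream(file, 4).read(partial);
        System.out.println("single read() returned " + got + " bytes"); // only 4
        assert partial[50] == 0; // the tail was never filled -- the bug from the post

        // Correct pattern: loop until the whole buffer is filled.
        byte[] full = new byte[file.length];
        readFully(new ChunkedStream(file, 4), full);
        assert full[50] == 7;
        System.out.println("readFully filled all " + full.length + " bytes");
    }
}
```

The same contract applies to available(): it is an estimate of bytes readable without blocking, so even the allocation step in the buggy code only happens to work because HDFS reports the full file size there.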