The Execution Flow of Hadoop's MapReduce Functions


Notes

Qimage is a custom data type of mine, used as the value. Since it is never used as a key, it implements Writable rather than WritableComparable.

QimageInputFormat<Text, Qimage> is the custom input format

QimageRecordReader<Text, Qimage> is the custom RecordReader

QimageOutputFormat<Text, Qimage> is the custom output format

QimageRecordWriter<Text, Qimage> is the custom RecordWriter

public class Qimage implements Writable {

   public static int count = 0;   // call counter used by the log output below

   private long fileLength;       // size of the image file in bytes

   private byte[] data;           // raw image bytes

}

The data field holds the image data.
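The class body above omits the Writable methods. Here is a minimal sketch of what write()/readFields() presumably look like for this layout (a length prefix followed by the raw bytes), written against the plain java.io.DataOutput/DataInput interfaces that Hadoop's Writable methods take. The class name QimageSketch and everything not shown in the fields above are assumptions for illustration:

```java
import java.io.*;

// Hypothetical sketch of Qimage's serialization; the real class implements
// Hadoop's Writable, whose two methods take exactly these java.io interfaces.
class QimageSketch {
    private long fileLength;
    private byte[] data;

    QimageSketch() {}                               // framework needs a no-arg ctor
    QimageSketch(byte[] data) { this.data = data; this.fileLength = data.length; }

    // Writable.write: serialize the length first, then the raw image bytes
    public void write(DataOutput out) throws IOException {
        out.writeLong(fileLength);
        out.write(data, 0, (int) fileLength);
    }

    // Writable.readFields: read the length, allocate, then fill completely
    public void readFields(DataInput in) throws IOException {
        fileLength = in.readLong();
        data = new byte[(int) fileLength];
        in.readFully(data);                         // blocks until all bytes arrive
    }

    public byte[] getData() { return data; }
}
```

A round trip through a DataOutputStream/DataInputStream pair reproduces the same bytes, which is the contract the shuffle relies on.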

1. Map Phase

QimageInputFormat::createRecordReader() is called.

// QimageInputFormat's createRecordReader() is invoked

 

 

QimageRecordReader::initialize() is called.

RecordReader ---> initialize is over!!!

// QimageRecordReader's initialize() is invoked; it obtains the file contents to be processed from the split

 

 

QimageRecordReader::nextKeyValue() is called.

start to nextKeyValue function!!count--> 1

start22 to nextKeyValue function!!

// QimageRecordReader's nextKeyValue() is invoked to compute the key and value

 

Qimage::readFields() is called.

readFields's count is 1

start to readFields,size --> 983012

// QimageRecordReader::nextKeyValue() calls Qimage::readFields(), which assigns the file contents read from the split to the value

 

RecordReader --> nextKeyValue function executed once!! count-> 1

// QimageRecordReader::nextKeyValue() has finished executing

 

QimageRecordReader::getProgress() is called.

// Returns the progress

 

QimageRecordReader::getCurrentKey() is called.

// Reads the key

 

QimageRecordReader::getCurrentValue() is called.

// Reads the value

 

Map::map() is called.

// The main map function runs, consuming the key and value obtained above

 

Qimage::write() is called && fileLength = 983012

map::key:     MSingle_1_119

-------------------------

// The map function writes the <key, value> pair out toward reduce, invoking Qimage::write()

 

QimageRecordReader::nextKeyValue() is called.

// The next <key, value> pair

 

QimageRecordReader::getProgress() is called.

// Progress is returned

 

QimageRecordReader::getCurrentKey() is called.

// At this point the map phase is fully complete, and reduce begins
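The call sequence logged above (initialize once per split, then nextKeyValue / getProgress / getCurrentKey / getCurrentValue / map per record) is driven by the framework, not by user code. A plain-Java mock of that loop, where SimpleReader and MapDriver are illustrative stand-ins rather than Hadoop types:

```java
// Minimal stand-in for org.apache.hadoop.mapreduce.RecordReader's surface.
interface SimpleReader<K, V> {
    void initialize();
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
    float getProgress();
}

class MapDriver {
    // Reproduces the per-split loop seen in the log; collecting the keys
    // stands in for calling map(key, value) on each record.
    static <K, V> java.util.List<K> run(SimpleReader<K, V> reader) {
        java.util.List<K> mappedKeys = new java.util.ArrayList<>();
        reader.initialize();                 // once per split
        while (reader.nextKeyValue()) {      // readFields() happens inside here
            reader.getProgress();
            K key = reader.getCurrentKey();
            V value = reader.getCurrentValue();
            mappedKeys.add(key);             // map(key, value) would run here
        }
        return mappedKeys;
    }
}
```

This also explains the final getProgress/getCurrentKey lines in the trace: the framework keeps probing the reader until nextKeyValue() reports that the split is exhausted.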

2. Reduce Phase

QimageOutputFormat::getRecordWriter() is called.

 

 

QimageRecordWriter::QimageRecordWriter() is called.

 

 

Qimage::readFields() is called.

readFields's count is 1

start to readFields,size --> 990976

// When reduce reads the value passed over from map, Qimage::readFields() is invoked

 

Reduce::reduce() is called.

// The main reduce function runs

 

QimageRecordWriter::write() is called. && k -> MSingle_0_119

// The result is written out

 

QimageRecordWriter::write()::key is not null.

The Output Path is hdfs://master:9000/Qimage/output/MSingle_0_119.bmp

 

 

QimageRecordWriter::write()::key is not null && value is not null.

The Output Path is hdfs://master:9000/Qimage/output/MSingle_0_119.bmp

 

 

Qimage::getData() is called.

data.length is 990976

Creating output pic ... hdfs://master:9000/Qimage/output/MSingle_0_119.bmp

-------------------------

reduce::key: MSingle_0_119

-------------------------
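Judging from the log, QimageRecordWriter::write() builds the path <outputDir>/<key>.bmp and dumps value.getData() into it after null-checking both arguments. A stand-in sketch on the local filesystem; the real code goes through Hadoop's FileSystem to HDFS, and BmpWriterSketch with its method names is hypothetical:

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the shape of QimageRecordWriter.write(key, value): resolve
// "<outputDir>/<key>.bmp" and write the image bytes to it.
class BmpWriterSketch {
    private final Path outputDir;

    BmpWriterSketch(Path outputDir) { this.outputDir = outputDir; }

    Path write(String key, byte[] data) throws IOException {
        if (key == null || data == null)     // mirrors the null checks in the log
            throw new IllegalArgumentException("key/value must not be null");
        Path out = outputDir.resolve(key + ".bmp");
        Files.createDirectories(outputDir);
        Files.write(out, data);              // "Creating output pic ..."
        return out;
    }
}
```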

Everything below is just repeated invocation of the reduce function; in this run it is invoked 3 times in total.

 

 

 

Qimage::readFields() is called.

readFields's count is 2

start to readFields,size --> 959532

 

 

Reduce::reduce() is called.

 

 

QimageRecordWriter::write() is called. && k -> MSingle_10_119

 

 

QimageRecordWriter::write()::key is not null.

The Output Path is hdfs://master:9000/Qimage/output/MSingle_10_119.bmp

 

 

QimageRecordWriter::write()::key is not null && value is not null.

The Output Path is hdfs://master:9000/Qimage/output/MSingle_10_119.bmp

 

 

Qimage::getData() is called.

data.length is 959532

Creating output pic ... hdfs://master:9000/Qimage/output/MSingle_10_119.bmp

-------------------------

reduce::key: MSingle_10_119

-------------------------

Problem Summary

In QimageRecordReader::nextKeyValue() I read the content to be processed from the split and store it in the value, and here I ran into a problem. The code was:

 

filein = fs.open(file);

byte[] data = new byte[filein.available()]; // allocate a byte array to hold the entire image file

filein.read(data); // read the byte stream from HDFS

 

This code is broken. filein.available() does return the size of the whole file, but filein.read(data) does not read the whole file into data: a single read() call is only guaranteed to return some bytes, not to fill the buffer. While debugging I found that only the first 8192*16 bytes of data held real content; everything after that was zero. This bug was very well hidden.
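To see why, note that InputStream.read(byte[]) may return after the first chunk the stream can deliver rather than after the buffer is full. A self-contained demonstration, where ChunkedStream artificially caps each read at 8 KB to mimic the packet-sized reads observed above (it is a test harness, not Hadoop code):

```java
import java.io.*;

// ChunkedStream caps each read at 8 KB, mimicking a stream that hands
// back one packet at a time, the way an HDFS input stream can.
class ChunkedStream extends FilterInputStream {
    ChunkedStream(InputStream in) { super(in); }
    @Override public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, 8192)); // never more than 8 KB per call
    }
}

class ReadDemo {
    // Buggy pattern: one read() call stops after the first chunk.
    static int readOnce(byte[] file) throws IOException {
        byte[] data = new byte[file.length];
        return new ChunkedStream(new ByteArrayInputStream(file)).read(data);
    }

    // Correct pattern: readFully loops internally until the buffer is full.
    static int readAll(byte[] file) throws IOException {
        byte[] data = new byte[file.length];
        new DataInputStream(new ChunkedStream(new ByteArrayInputStream(file)))
                .readFully(data);
        return data.length;
    }
}
```

The fix below, FSDataInputStream's positioned readFully(position, buffer, offset, length), has the same fill-completely contract as java.io.DataInput.readFully used here.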

 

Fix:

Change the read statement to:

filein.readFully(0, data, 0, (int) fileLength);

The function prototype, from http://hadoop.apache.org/docs/r2.2.0/api/index.html:

readFully

public void readFully(long position,
                      byte[] buffer,
                      int offset,
                      int length)
               throws IOException

Read bytes from the given position in the stream to the given buffer. Continues to read until length bytes have been read.

Specified by:

readFully in interface PositionedReadable

Parameters:

position - position in the input stream to seek

buffer - buffer into which data is read

offset - offset into the buffer in which data is written

length - the number of bytes to read

Throws:

EOFException - If the end of stream is reached while reading. If an exception is thrown, an undetermined number of bytes in the buffer may have been written.

IOException
