The Execution Flow of Hadoop's MapReduce Functions


Notes

Qimage is a custom data type of mine, used as the value. Since it is never used as a key, it implements Writable rather than WritableComparable.

QimageInputFormat<Text, Qimage> is the custom input format

QimageRecordReader<Text, Qimage> is the custom RecordReader

QimageOutputFormat<Text, Qimage> is the custom output format

QimageRecordWriter<Text, Qimage> is the custom RecordWriter

public class Qimage implements Writable {

   public static int count = 0;   // call counter used by the log output below

   private long fileLength;       // size of the image file in bytes

   private byte[] data;           // raw image bytes

}

The data field holds the image data.
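The class body above omits the Writable methods. Here is a minimal sketch of what write()/readFields() presumably look like for this layout (a length prefix followed by the raw bytes), written against the plain java.io.DataOutput/DataInput interfaces that Hadoop's Writable methods take. The class name QimageSketch and everything not shown in the fields above are assumptions for illustration:

```java
import java.io.*;

// Hypothetical sketch of Qimage's serialization; the real class implements
// Hadoop's Writable, whose two methods take exactly these java.io interfaces.
class QimageSketch {
    private long fileLength;
    private byte[] data;

    QimageSketch() {}                               // framework needs a no-arg ctor
    QimageSketch(byte[] data) { this.data = data; this.fileLength = data.length; }

    // Writable.write: serialize the length first, then the raw image bytes
    public void write(DataOutput out) throws IOException {
        out.writeLong(fileLength);
        out.write(data, 0, (int) fileLength);
    }

    // Writable.readFields: read the length, allocate, then fill completely
    public void readFields(DataInput in) throws IOException {
        fileLength = in.readLong();
        data = new byte[(int) fileLength];
        in.readFully(data);                         // blocks until all bytes arrive
    }

    public byte[] getData() { return data; }
}
```

A round trip through a DataOutputStream/DataInputStream pair reproduces the same bytes, which is the contract the shuffle relies on.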

1. Map Phase

QimageInputFormat::createRecordReader() is called.

// QimageInputFormat's createRecordReader() is invoked

 

 

QimageRecordReader::initialize() is called.

RecordReader ---> initialize is over!!!

// QimageRecordReader's initialize() is invoked; it obtains the file contents to be processed from the split

 

 

QimageRecordReader::nextKeyValue() is called.

start to nextKeyValue function!!count--> 1

start22 to nextKeyValue function!!

// QimageRecordReader's nextKeyValue() is invoked to compute the key and value

 

Qimage::readFields() is called.

readFields's count is 1

start to readFields,size --> 983012

// QimageRecordReader::nextKeyValue() calls Qimage::readFields(), which assigns the file contents read from the split to the value

 

RecordReader --> nextKeyValue function executed once!! count-> 1

// QimageRecordReader::nextKeyValue() has finished executing

 

QimageRecordReader::getProgress() is called.

// Returns the progress

 

QimageRecordReader::getCurrentKey() is called.

// Reads the key

 

QimageRecordReader::getCurrentValue() is called.

// Reads the value

 

Map::map() is called.

// The main map function runs, consuming the key and value obtained above

 

Qimage::write() is called && fileLength = 983012

map::key:     MSingle_1_119

-------------------------

// The map function writes the <key, value> pair out toward reduce, invoking Qimage::write()

 

QimageRecordReader::nextKeyValue() is called.

// The next <key, value> pair

 

QimageRecordReader::getProgress() is called.

// Progress is returned

 

QimageRecordReader::getCurrentKey() is called.

// At this point the map phase is fully complete, and reduce begins
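The call sequence logged above (initialize once per split, then nextKeyValue / getProgress / getCurrentKey / getCurrentValue / map per record) is driven by the framework, not by user code. A plain-Java mock of that loop, where SimpleReader and MapDriver are illustrative stand-ins rather than Hadoop types:

```java
// Minimal stand-in for org.apache.hadoop.mapreduce.RecordReader's surface.
interface SimpleReader<K, V> {
    void initialize();
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
    float getProgress();
}

class MapDriver {
    // Reproduces the per-split loop seen in the log; collecting the keys
    // stands in for calling map(key, value) on each record.
    static <K, V> java.util.List<K> run(SimpleReader<K, V> reader) {
        java.util.List<K> mappedKeys = new java.util.ArrayList<>();
        reader.initialize();                 // once per split
        while (reader.nextKeyValue()) {      // readFields() happens inside here
            reader.getProgress();
            K key = reader.getCurrentKey();
            V value = reader.getCurrentValue();
            mappedKeys.add(key);             // map(key, value) would run here
        }
        return mappedKeys;
    }
}
```

This also explains the final getProgress/getCurrentKey lines in the trace: the framework keeps probing the reader until nextKeyValue() reports that the split is exhausted.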

2. Reduce Phase

QimageOutputFormat::getRecordWriter() is called.

 

 

QimageRecordWriter::QimageRecordWriter() is called.

 

 

Qimage::readFields() is called.

readFields's count is 1

start to readFields,size --> 990976

// When reduce reads the value passed over from map, Qimage::readFields() is invoked

 

Reduce::reduce() is called.

// The main reduce function runs

 

QimageRecordWriter::write() is called. && k -> MSingle_0_119

// The result is written out

 

QimageRecordWriter::write()::key is not null.

The Output Path is hdfs://master:9000/Qimage/output/MSingle_0_119.bmp

 

 

QimageRecordWriter::write()::key is not null && value is not null.

The Output Path is hdfs://master:9000/Qimage/output/MSingle_0_119.bmp

 

 

Qimage::getData() is called.

data.length is 990976

Creating output pic ... hdfs://master:9000/Qimage/output/MSingle_0_119.bmp

-------------------------

reduce::key: MSingle_0_119

-------------------------
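Judging from the log, QimageRecordWriter::write() builds the path <outputDir>/<key>.bmp and dumps value.getData() into it after null-checking both arguments. A stand-in sketch on the local filesystem; the real code goes through Hadoop's FileSystem to HDFS, and BmpWriterSketch with its method names is hypothetical:

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the shape of QimageRecordWriter.write(key, value): resolve
// "<outputDir>/<key>.bmp" and write the image bytes to it.
class BmpWriterSketch {
    private final Path outputDir;

    BmpWriterSketch(Path outputDir) { this.outputDir = outputDir; }

    Path write(String key, byte[] data) throws IOException {
        if (key == null || data == null)     // mirrors the null checks in the log
            throw new IllegalArgumentException("key/value must not be null");
        Path out = outputDir.resolve(key + ".bmp");
        Files.createDirectories(outputDir);
        Files.write(out, data);              // "Creating output pic ..."
        return out;
    }
}
```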

Everything below is just repeated invocation of the reduce function; in this run it is invoked 3 times in total.

 

 

 

Qimage::readFields() is called.

readFields's count is 2

start to readFields,size --> 959532

 

 

Reduce::reduce() is called.

 

 

QimageRecordWriter::write() is called. && k -> MSingle_10_119

 

 

QimageRecordWriter::write()::key is not null.

The Output Path is hdfs://master:9000/Qimage/output/MSingle_10_119.bmp

 

 

QimageRecordWriter::write()::key is not null && value is not null.

The Output Path is hdfs://master:9000/Qimage/output/MSingle_10_119.bmp

 

 

Qimage::getData() is called.

data.length is 959532

Creating output pic ... hdfs://master:9000/Qimage/output/MSingle_10_119.bmp

-------------------------

reduce::key: MSingle_10_119

-------------------------

Problem Summary

In QimageRecordReader::nextKeyValue() I read the content to be processed from the split and store it in the value, and here I ran into a problem. The code was:

 

filein = fs.open(file);

byte[] data = new byte[filein.available()]; // allocate a byte array to hold the entire image file

filein.read(data); // read the byte stream from HDFS

 

This code is broken. filein.available() does return the size of the whole file, but filein.read(data) does not read the whole file into data: a single read() call is only guaranteed to return some bytes, not to fill the buffer. While debugging I found that only the first 8192*16 bytes of data held real content; everything after that was zero. This bug was very well hidden.
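To see why, note that InputStream.read(byte[]) may return after the first chunk the stream can deliver rather than after the buffer is full. A self-contained demonstration, where ChunkedStream artificially caps each read at 8 KB to mimic the packet-sized reads observed above (it is a test harness, not Hadoop code):

```java
import java.io.*;

// ChunkedStream caps each read at 8 KB, mimicking a stream that hands
// back one packet at a time, the way an HDFS input stream can.
class ChunkedStream extends FilterInputStream {
    ChunkedStream(InputStream in) { super(in); }
    @Override public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, 8192)); // never more than 8 KB per call
    }
}

class ReadDemo {
    // Buggy pattern: one read() call stops after the first chunk.
    static int readOnce(byte[] file) throws IOException {
        byte[] data = new byte[file.length];
        return new ChunkedStream(new ByteArrayInputStream(file)).read(data);
    }

    // Correct pattern: readFully loops internally until the buffer is full.
    static int readAll(byte[] file) throws IOException {
        byte[] data = new byte[file.length];
        new DataInputStream(new ChunkedStream(new ByteArrayInputStream(file)))
                .readFully(data);
        return data.length;
    }
}
```

The fix below, FSDataInputStream's positioned readFully(position, buffer, offset, length), has the same fill-completely contract as java.io.DataInput.readFully used here.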

 

Fix:

Change the read statement to:

filein.readFully(0, data, 0, (int) fileLength);

The function prototype, from http://hadoop.apache.org/docs/r2.2.0/api/index.html:

readFully

public void readFully(long position,
                      byte[] buffer,
                      int offset,
                      int length)
               throws IOException

Read bytes from the given position in the stream to the given buffer. Continues to read until length bytes have been read.

Specified by:

readFully in interface PositionedReadable

Parameters:

position - position in the input stream to seek

buffer - buffer into which data is read

offset - offset into the buffer in which data is written

length - the number of bytes to read

Throws:

EOFException - If the end of stream is reached while reading. If an exception is thrown, an undetermined number of bytes in the buffer may have been written.

IOException
