Hadoop: The Definitive Guide, Third Edition: Supplementary Notes on Chapter 5: MRUnit

The third edition of the guide uses MRUnit directly for its unit tests.

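MRUnit is a framework, originally developed by Cloudera, for writing unit tests for Hadoop MapReduce code. It builds on JUnit 4 and mock objects; "MR" stands for Map and Reduce. The framework is very small, and its core relies on JUnit to run the tests. MRUnit supplies a set of mock objects that stand in for the OutputCollector, so it can intercept whatever the code under test emits, compare it with the expected results, and assert the outcome automatically.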
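The interception idea can be sketched in plain Java without any Hadoop or MRUnit dependency. Everything below (Collector, RecordingCollector, emitWords, matchesExpected) is a hypothetical stand-in invented for illustration and is not MRUnit's actual implementation; it only shows how a recording collector lets a test compare emitted pairs with an expected list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: hypothetical names, not MRUnit classes. */
final class RecordingCollectorSketch {

    /** Minimal collector interface, playing the role of OutputCollector. */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /** Records every emitted pair in order instead of writing it anywhere. */
    static final class RecordingCollector<K, V> implements Collector<K, V> {
        private final List<Map.Entry<K, V>> records = new ArrayList<>();

        @Override
        public void collect(K key, V value) {
            records.add(new SimpleEntry<>(key, value));
        }

        List<Map.Entry<K, V>> records() {
            return records;
        }
    }

    /** Code under test: emits (word, 1) for each whitespace-separated word. */
    static void emitWords(String line, Collector<String, Integer> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word, 1);
            }
        }
    }

    /** Runs the code against a recording collector and compares with the expectation. */
    static boolean matchesExpected(String line, List<Map.Entry<String, Integer>> expected) {
        RecordingCollector<String, Integer> collector = new RecordingCollector<>();
        emitWords(line, collector);
        return collector.records().equals(expected);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> expected = new ArrayList<>();
        expected.add(new SimpleEntry<>("a", 1));
        expected.add(new SimpleEntry<>("b", 1));
        System.out.println(matchesExpected("a b", expected));
    }
}
```

The comparison here is order-sensitive because the recorded pairs are kept in a list; a real framework may offer more flexible matching.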

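With MRUnit, once the inputs and outputs of a MapReduce program are pinned down, you can write unit tests for it and try the code out before sending it to the cluster for verification. This saves time and resources and makes problems quicker to locate. When refactoring, if a sufficiently thorough set of unit tests stays green, you can be reasonably confident that the results on the cluster will be correct as well.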

MRUnit provides four drivers: MapDriver, ReduceDriver, MapReduceDriver, and PipelineMapReduceDriver. Pick whichever fits what you need to test.

Example 1: Writing MRUnit test cases

The example below uses MRUnit to unit test a MapReduce program that analyzes SMS CDRs (call detail records).
Sample records look like this:

    CDRID; CDRType; Phone1; Phone2; SMS Status Code
    655209;1;796764372490213;804422938115889;6
    353415;0;356857119806206;287572231184798;4
    835699;1;252280313968413;889717902341635;0

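The MapReduce program scans these records, finds every record whose CDRType is 1, and emits the corresponding SMS Status Code. For the sample records above, the Mapper output is:

    6, 1
    0, 1

The Reducer takes this output as its input and emits the number of times each status code occurs in the CDR records.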
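Before looking at the Hadoop classes, the intended behavior can be checked with a plain-Java sketch. The class and method names here (CdrPipelineSketch, mapRecord, countStatusCodes) are hypothetical and not part of the original program; the sketch assumes well-formed records with five semicolon-separated fields and simply mirrors the map and reduce steps described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the CDR job's logic; hypothetical names, no Hadoop involved. */
final class CdrPipelineSketch {

    /** Map step: returns the status code for an SMS record (CDRType 1), else null. */
    static String mapRecord(String record) {
        String[] fields = record.split(";");
        return Integer.parseInt(fields[1]) == 1 ? fields[4] : null;
    }

    /** Maps each record, groups by status code, and sums the ones (the reduce step). */
    static Map<String, Integer> countStatusCodes(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            String status = mapRecord(record);
            if (status != null) {
                counts.merge(status, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        records.add("655209;1;796764372490213;804422938115889;6");
        records.add("353415;0;356857119806206;287572231184798;4");
        records.add("835699;1;252280313968413;889717902341635;0");
        System.out.println(countStatusCodes(records)); // prints {0=1, 6=1}
    }
}
```

The second sample record has CDRType 0, so its status code 4 never reaches the counts.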

The mapper, the reducer, and the test class are given below in turn:

    package com.tht.MRUnitTest;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SMSCDRMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

      private Text status = new Text();
      private final static IntWritable addOne = new IntWritable(1);

      /**
       * Returns the SMS status code and its count
       */
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws java.io.IOException, InterruptedException {

        // 655209;1;796764372490213;804422938115889;6 is the sample record format
        String[] line = value.toString().split(";");
        // If record is of SMS CDR
        if (Integer.parseInt(line[1]) == 1) {
          status.set(line[4]);
          context.write(status, addOne);
        }
      }
    }

    package com.tht.MRUnitTest;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SMSCDRReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws java.io.IOException, InterruptedException {
        // Sum the ones emitted by the mapper for this status code
        int sum = 0;
        for (IntWritable value : values) {
          sum += value.get();
        }
        context.write(key, new IntWritable(sum));
      }
    }

    package com.tht.MRUnitTest;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class SMSCDRMapperReducerTest {

      MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
      ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
      MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

      @Before
      public void setUp() {
        SMSCDRMapper mapper = new SMSCDRMapper();
        SMSCDRReducer reducer = new SMSCDRReducer();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
      }

      @Test
      public void testMapper() throws IOException {
        // An SMS record (CDRType 1) with status code 6 should yield (6, 1)
        mapDriver.withInput(new LongWritable(), new Text(
            "655209;1;796764372490213;804422938115889;6"));
        mapDriver.withOutput(new Text("6"), new IntWritable(1));
        mapDriver.runTest();
      }

      @Test
      public void testReducer() throws IOException {
        // Two occurrences of status code 6 should be summed to (6, 2)
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        reduceDriver.withInput(new Text("6"), values);
        reduceDriver.withOutput(new Text("6"), new IntWritable(2));
        reduceDriver.runTest();
      }
    }

Example 2: Testing counters

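A common use of custom counters is to keep track of records that do not match the expected input. For example, when an input CDR record is not of SMS type, the Mapper can skip the record and increment a counter.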

The modified mapper:

    package com.tht.MRUnitTest;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SMSCDRMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

      private Text status = new Text();
      private final static IntWritable addOne = new IntWritable(1);

      // Custom counter for records that are not SMS CDRs
      static enum CDRCounter {
        NonSMSCDR;
      };

      /**
       * Returns the SMS status code and its count
       */
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws java.io.IOException, InterruptedException {

        String[] line = value.toString().split(";");
        // If record is of SMS CDR
        if (Integer.parseInt(line[1]) == 1) {
          status.set(line[4]);
          context.write(status, addOne);
        } else {
          // CDR record is not of type SMS so increment the counter
          context.getCounter(CDRCounter.NonSMSCDR).increment(1);
        }
      }
    }

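The modified testMapper() method:

    // Requires: import static org.junit.Assert.assertEquals;
    @Test
    public void testMapper() throws IOException {
      // CDRType is 0 here, so the mapper should emit nothing and bump the counter
      mapDriver.withInput(new LongWritable(), new Text(
          "655209;0;796764372490213;804422938115889;6"));
      // No withOutput(...) call: no output is expected for a non-SMS record
      mapDriver.runTest();
      assertEquals("Expected 1 counter increment", 1, mapDriver.getCounters()
          .findCounter(SMSCDRMapper.CDRCounter.NonSMSCDR).getValue());
    }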
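The counter behavior being asserted can also be sketched without Hadoop. The names below (CounterSketch, CdrCounter, counterValue) are hypothetical, and the EnumMap is only an assumed stand-in for Hadoop's counter storage; the point is that a non-SMS record produces no output and raises the counter by exactly one.

```java
import java.util.EnumMap;
import java.util.Map;

/** Plain-Java sketch of counting skipped records; hypothetical names, no Hadoop involved. */
final class CounterSketch {

    /** Mirrors the role of the mapper's custom counter enum. */
    enum CdrCounter { NON_SMS_CDR }

    private final Map<CdrCounter, Long> counters = new EnumMap<>(CdrCounter.class);

    /** Returns the status code for an SMS record; otherwise increments the counter and returns null. */
    String map(String record) {
        String[] fields = record.split(";");
        if (Integer.parseInt(fields[1]) == 1) {
            return fields[4];
        }
        counters.merge(CdrCounter.NON_SMS_CDR, 1L, Long::sum);
        return null;
    }

    /** Current value of a counter, or 0 if it was never incremented. */
    long counterValue(CdrCounter counter) {
        return counters.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        CounterSketch sketch = new CounterSketch();
        sketch.map("655209;0;796764372490213;804422938115889;6");
        System.out.println(sketch.counterValue(CdrCounter.NON_SMS_CDR)); // prints 1
    }
}
```

Asserting on both the missing output and the counter value, as the MRUnit test does, guards against a mapper that counts the record but still emits it.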



