Learning Hadoop Step by Step (9)

Source: Internet  Editor: 程序博客网  Time: 2024/04/25 21:00

  Implementing a Reducer

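    A map task reads its input, parses it, and groups the records by key. A reduce task collects the output of the map tasks and processes it further in three phases: merge, sort, and reduce. Here we only care about the reduce phase, that is, the implementation of the reduce function.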
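To make the three phases concrete, here is a minimal plain-Java sketch that does not use Hadoop at all. The class ReducePhaseSketch and its methods are hypothetical names invented for illustration; a TreeMap stands in for the merge and sort steps, and the reduce step simply sums the values of each key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names belong to Hadoop.
class ReducePhaseSketch {

    // Merge + sort: collect all values per key; the TreeMap keeps keys sorted.
    static TreeMap<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: called once per key with all of that key's values (here: a sum).
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the grouped data through reduce, one call per distinct key.
    static TreeMap<String, Integer> run(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groupByKey(mapOutput).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("hadoop", 1), Map.entry("reduce", 1), Map.entry("hadoop", 1));
        System.out.println(run(mapOutput)); // {hadoop=2, reduce=1}
    }
}
```

In a real job the framework performs the grouping across machines; only the body of reduce is user code.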

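    In practice there is no need to implement this from scratch: simply extend the Reducer class that Hadoop provides. The main methods of the Reducer class are: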
 
protected void setup(Context context) throws IOException, InterruptedException {
    // Add your own initialization here, e.g. reading the job configuration,
    // custom parameters, or the DistributedCache
}

protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
        throws IOException, InterruptedException {
    // Main reduce logic. The default implementation below matches the old API's
    // IdentityReducer and writes the data through unchanged. Override it to
    // implement your own processing.
    for (VALUEIN value : values) {
        context.write((KEYOUT) key, (VALUEOUT) value);
    }
}

protected void cleanup(Context context) throws IOException, InterruptedException {
    // All cleanup work goes here
}
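The order in which these three hooks are called can be sketched in plain Java. The classes below (LifecycleSketch, MiniReducer, CountingReducer) are hypothetical stand-ins, not Hadoop classes; they only mimic the general shape of the driver loop, which calls setup once, reduce once per key, and cleanup once at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LifecycleSketch {
    // Feeds two keys through a reducer and returns the recorded call trace.
    static List<String> demo() {
        TreeMap<String, List<String>> input = new TreeMap<>();
        input.put("a", List.of("x", "y"));
        input.put("b", List.of("z"));
        CountingReducer reducer = new CountingReducer();
        reducer.run(input);
        return reducer.output;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [setup, count:a=2, count:b=1, cleanup]
    }
}

// Hypothetical stand-in for the framework base class; not Hadoop's Reducer.
abstract class MiniReducer {
    final List<String> output = new ArrayList<>();

    protected void setup() {}
    protected abstract void reduce(String key, Iterable<String> values);
    protected void cleanup() {}

    // Driver loop: setup once, reduce per key, cleanup even if reduce throws.
    final void run(TreeMap<String, List<String>> groupedInput) {
        setup();
        try {
            for (Map.Entry<String, List<String>> e : groupedInput.entrySet()) {
                reduce(e.getKey(), e.getValue());
            }
        } finally {
            cleanup();
        }
    }
}

// Overrides all three hooks and records when each one runs.
class CountingReducer extends MiniReducer {
    private String prefix;

    @Override protected void setup() {
        prefix = "count:"; // state prepared once, reused by every reduce call
        output.add("setup");
    }

    @Override protected void reduce(String key, Iterable<String> values) {
        int n = 0;
        for (String ignored : values) {
            n++;
        }
        output.add(prefix + key + "=" + n);
    }

    @Override protected void cleanup() {
        output.add("cleanup");
    }
}
```

The point of the split is that anything expensive, such as parsing configuration, belongs in setup rather than in reduce, which runs once for every key.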



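As before, we close with an example: a Reducer that implements the functionality of the Linux cut utility. It pairs with the FieldSelectionMapper from the previous section, and all of its settings are analogous.
The output key/value pair is specified by mapreduce.fieldsel.reduce.output.key.value.fields.spec, in the same format that FieldSelectionMapper uses.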
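As a reminder of what such a spec looks like, the sketch below expands one in plain Java. It assumes the format is keyFieldsSpec:valueFieldsSpec, where each side is a comma-separated list of single indices (5), closed ranges (2-5), or an open range (3-). The class FieldSpecSketch and the method expand are hypothetical names, and the code is a simplified illustration rather than the FieldSelectionHelper implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the spec format; not FieldSelectionHelper itself.
class FieldSpecSketch {

    // Expands one side of the spec (e.g. "0,2-4,6-") into explicit indices.
    // Returns the start of an open range such as "6-", or -1 if there is none.
    static int expand(String side, List<Integer> fields) {
        int allFieldsFrom = -1;
        for (String item : side.split(",")) {
            if (item.isEmpty()) {
                continue; // an empty side selects nothing
            }
            int dash = item.indexOf('-');
            if (dash < 0) {
                fields.add(Integer.parseInt(item)); // single index
                continue;
            }
            String start = item.substring(0, dash);
            String end = item.substring(dash + 1);
            int from = start.isEmpty() ? 0 : Integer.parseInt(start);
            if (end.isEmpty()) {
                allFieldsFrom = from; // open range: remembered, not expanded
            } else {
                for (int i = from; i <= Integer.parseInt(end); i++) {
                    fields.add(i); // closed range, inclusive on both ends
                }
            }
        }
        return allFieldsFrom;
    }

    public static void main(String[] args) {
        String spec = "0,2:3-5,7-";
        String[] sides = spec.split(":", -1);
        List<Integer> keyFields = new ArrayList<>();
        List<Integer> valueFields = new ArrayList<>();
        expand(sides[0], keyFields);
        int allValueFieldsFrom = expand(sides[1], valueFields);
        System.out.println(keyFields + " " + valueFields + " " + allValueFieldsFrom);
        // [0, 2] [3, 4, 5] 7
    }
}
```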

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.fieldsel.FieldSelectionHelper;

public class FieldSelectionReducer<K, V>
    extends Reducer<Text, Text, Text, Text> {

  private String fieldSeparator = "\t";
  private String reduceOutputKeyValueSpec;
  private List<Integer> reduceOutputKeyFieldList = new ArrayList<Integer>();
  private List<Integer> reduceOutputValueFieldList = new ArrayList<Integer>();
  // Start index of an open-ended value range, or -1 if the spec has none
  private int allReduceValueFieldsFrom = -1;

  public static final Log LOG = LogFactory.getLog("FieldSelectionMapReduce");

  // Read the separator and the output spec from the job configuration once per task
  public void setup(Context context)
      throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();

    this.fieldSeparator =
      conf.get(FieldSelectionHelper.DATA_FIELD_SEPERATOR, "\t");

    this.reduceOutputKeyValueSpec =
      conf.get(FieldSelectionHelper.REDUCE_OUTPUT_KEY_VALUE_SPEC, "0-:");

    allReduceValueFieldsFrom = FieldSelectionHelper.parseOutputKeyValueSpec(
      reduceOutputKeyValueSpec, reduceOutputKeyFieldList,
      reduceOutputValueFieldList);
  }

  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // The key plus a separator is passed along with each value, so the
    // field indices in the spec cover key and value together
    String keyStr = key.toString() + this.fieldSeparator;

    for (Text val : values) {
      FieldSelectionHelper helper = new FieldSelectionHelper();
      helper.extractOutputKeyValue(keyStr, val.toString(),
        fieldSeparator, reduceOutputKeyFieldList,
        reduceOutputValueFieldList, allReduceValueFieldsFrom, false, false);
      context.write(helper.getKey(), helper.getValue());
    }
  }
}
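The selection step that the reducer delegates to the helper can be approximated in plain Java as follows. This is a rough sketch under stated assumptions, not the FieldSelectionHelper source: FieldSelectSketch, select, and reduceOne are hypothetical names, the separator is treated as a literal string, and an open range is assumed to apply to the value side only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified stand-in for the selection step; not the FieldSelectionHelper source.
class FieldSelectSketch {

    // Joins the wanted fields with the separator; indices past the end are skipped.
    static String select(String[] fields, List<Integer> wanted, int allFieldsFrom, String sep) {
        List<String> picked = new ArrayList<>();
        for (int index : wanted) {
            if (index < fields.length) {
                picked.add(fields[index]);
            }
        }
        if (allFieldsFrom >= 0) {
            for (int i = allFieldsFrom; i < fields.length; i++) {
                picked.add(fields[i]);
            }
        }
        return String.join(sep, picked);
    }

    // Returns {newKey, newValue} for one (key, value) pair seen by reduce().
    static String[] reduceOne(String key, String value, String sep,
                              List<Integer> keyFields, List<Integer> valueFields,
                              int allValueFieldsFrom) {
        // Key and value are joined first, so index 0 is the first field of the key.
        String[] fields = (key + sep + value).split(Pattern.quote(sep));
        String newKey = select(fields, keyFields, -1, sep); // assumption: no open range for keys
        String newValue = select(fields, valueFields, allValueFieldsFrom, sep);
        return new String[] {newKey, newValue};
    }

    public static void main(String[] args) {
        // Key field 0, value field 2 plus everything from field 3 onward.
        String[] kv = reduceOne("k1", "a,b,c", ",", List.of(0), List.of(2), 3);
        System.out.println(kv[0] + " -> " + kv[1]); // k1 -> b,c
    }
}
```

The demo uses a comma as the separator only so that the output is easy to read; the reducer above defaults to a tab.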

