一步一步学习hadoop（九）

来源：互联网发布：python初学者看什么书编辑：程序博客网时间：2024/04/25 21:00

Reducer的实现

map任务读取数据，解析数据，按照键值将数据分成一组一组的，reduce任务收集map任务的输出，通过合并、排序和归约三个过程对map的输出数据进行进一步的处理。现在我们只关心归约过程即reduce函数的实现。

实际上我们不用重新去实现，只需继承Hadoop提供的Mapper类即可，Mapper类的几个主要函数如下：

protected void setup(Context context                       ) throws IOException, InterruptedException {    //添加自己的初始化程序，比如读取作业的配置，自定义参数，读取DistrubteCache等  }  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context                        ) throws IOException, InterruptedException {    //reduce主要业务流，下面的是默认实现，即老版本的IdentityReduce，数据原样输出。通过覆写，实现自己的业务流程    for(VALUEIN value: values) {      context.write((KEYOUT) key, (VALUEOUT) value);    }  }  protected void cleanup(Context context                         ) throws IOException, InterruptedException {    // 所有的清理操作  }

下面还是以一个例子作为结束，该类实现了liunx的cut工具的功能的Reducer，和上一节中的FieldSelectionMapper刚好是一套，所有的设置也是类似的，
输出的key/value对，以mapreduce.fieldsel.reduce.output.key.value.fields.spec来指定，格式和FieldSelectionMapper一样

public class FieldSelectionReducer<K, V>    extends Reducer<Text, Text, Text, Text> {  private String fieldSeparator = "\t";  private String reduceOutputKeyValueSpec;  private List<Integer> reduceOutputKeyFieldList = new ArrayList<Integer>();  private List<Integer> reduceOutputValueFieldList = new ArrayList<Integer>();  private int allReduceValueFieldsFrom = -1;  public static final Log LOG = LogFactory.getLog("FieldSelectionMapReduce");  public void setup(Context context)      throws IOException, InterruptedException {    Configuration conf = context.getConfiguration();    this.fieldSeparator =      conf.get(FieldSelectionHelper.DATA_FIELD_SEPERATOR, "\t");        this.reduceOutputKeyValueSpec =      conf.get(FieldSelectionHelper.REDUCE_OUTPUT_KEY_VALUE_SPEC, "0-:");        allReduceValueFieldsFrom = FieldSelectionHelper.parseOutputKeyValueSpec(      reduceOutputKeyValueSpec, reduceOutputKeyFieldList,      reduceOutputValueFieldList);  }  public void reduce(Text key, Iterable<Text> values, Context context)      throws IOException, InterruptedException {    String keyStr = key.toString() + this.fieldSeparator;        for (Text val : values) {      FieldSelectionHelper helper = new FieldSelectionHelper();      helper.extractOutputKeyValue(keyStr, val.toString(),        fieldSeparator, reduceOutputKeyFieldList,        reduceOutputValueFieldList, allReduceValueFieldsFrom, false, false);      context.write(helper.getKey(), helper.getValue());    }  }}