Big Data Study Notes (Day 5): Reading the Code of Hadoop's Mapper and Reducer Classes


Sources: http://www.aboutyun.com/thread-5597-1-1.html
http://www.aboutyun.com/thread-5598-1-1.html
Note: most of the reference material dates from before 2013, so some statements may no longer hold. Please read selectively.

       Today I continue reading code, this time Hadoop's Mapper and Reducer classes.
       I. The Mapper class
      Hadoop's Mapper class has 4 main methods: setup, cleanup, map, and run. The code is as follows:

protected void setup(Context context) throws IOException, InterruptedException {
  // NOTHING
}

protected void map(KEYIN key, VALUEIN value, Context context)
    throws IOException, InterruptedException {
  context.write((KEYOUT) key, (VALUEOUT) value);
}

protected void cleanup(Context context) throws IOException, InterruptedException {
  // NOTHING
}

public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}

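As the code above shows, run calls setup once before the first call to map, then calls map once for every key/value pair, and finally calls cleanup once. By default, setup and cleanup do nothing. So when the map method alone cannot meet an application's needs, you can override setup and cleanup, for example to initialize state before the first record or to emit results after the last one.
II. The Reducer class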
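To make the call order concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: `MiniMapper`, `WordCountMapper`, and `MapperLifecycleDemo` are hypothetical names invented for this illustration, records are plain strings, and output goes to an in-memory list instead of a `Context`. Only the control flow of run is modeled on the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for Hadoop's Mapper (NOT the real class):
// same setup/map/cleanup/run shape, with an in-memory output list.
abstract class MiniMapper {
    protected final List<String> output = new ArrayList<>();

    protected void setup() { /* nothing by default */ }

    protected abstract void map(String record);

    protected void cleanup() { /* nothing by default */ }

    // Same control flow as the run method quoted above:
    // setup once, map once per record, cleanup once.
    public final List<String> run(Iterator<String> records) {
        setup();
        while (records.hasNext()) {
            map(records.next());
        }
        cleanup();
        return output;
    }
}

// Hypothetical subclass: accumulates word counts in memory during map
// and emits the totals only once, in cleanup.
class WordCountMapper extends MiniMapper {
    private Map<String, Integer> counts;

    @Override
    protected void setup() {
        counts = new LinkedHashMap<>(); // state prepared before the first record
    }

    @Override
    protected void map(String record) {
        for (String word : record.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    @Override
    protected void cleanup() {
        // Runs after the last record: flush the accumulated totals.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            output.add(e.getKey() + "\t" + e.getValue());
        }
    }
}

class MapperLifecycleDemo {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "b c");
        for (String pair : new WordCountMapper().run(lines.iterator())) {
            System.out.println(pair);
        }
    }
}
```

The point of the design is that map never writes output itself here. Because run guarantees that cleanup comes after the last record, the subclass can safely defer all output until then.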

Hadoop's Reducer class has 3 main methods: setup, cleanup, and reduce. The code is as follows:

/**
 * Called once at the start of the task.
 */
protected void setup(Context context
                     ) throws IOException, InterruptedException {
  // NOTHING
}

/**
 * This method is called once for each key. Most applications will define
 * their reduce class by overriding this method. The default implementation
 * is an identity function.
 */
@SuppressWarnings("unchecked")
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                      ) throws IOException, InterruptedException {
  for(VALUEIN value: values) {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}

/**
 * Called once at the end of the task.
 */
protected void cleanup(Context context
                       ) throws IOException, InterruptedException {
  // NOTHING
}

When a reducer runs in a user's application, the reducer's run method is called directly. Its code is as follows:

/*
 * control how the reduce task works.
 */
@SuppressWarnings("unchecked")
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKey()) {
    reduce(context.getCurrentKey(), context.getValues(), context);
    // If a back up store is used, reset it
    ((ReduceContext.ValueIterator)
        (context.getValues().iterator())).resetBackupStore();
  }
  cleanup(context);
}

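As the code above shows, run calls setup once before the first call to reduce, then calls reduce once for each key, and finally calls cleanup once. By default, setup and cleanup do nothing. So when the reduce method alone cannot meet an application's needs, you can override setup and cleanup to add the missing logic.
Summary:
Today I studied the main methods of Hadoop's Mapper and Reducer classes by reading their code.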
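The same idea can be sketched for the reduce side, again in plain Java rather than the Hadoop API. `MiniReducer`, `MaxTotalReducer`, and `ReducerLifecycleDemo` are hypothetical names, and the grouped input is assumed to be an in-memory map from each key to its list of values. The example sums the values per key in reduce and uses cleanup to emit one extra line that depends on all keys, something reduce cannot do by itself because it sees only one key at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for Hadoop's Reducer (NOT the real class).
abstract class MiniReducer {
    protected final List<String> output = new ArrayList<>();

    protected void setup() { /* nothing by default */ }

    protected abstract void reduce(String key, Iterable<Integer> values);

    protected void cleanup() { /* nothing by default */ }

    // Mirrors the run method quoted above: setup once,
    // reduce once per key, cleanup once.
    public final List<String> run(Map<String, List<Integer>> grouped) {
        setup();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            reduce(entry.getKey(), entry.getValue());
        }
        cleanup();
        return output;
    }
}

// Hypothetical subclass: emits a per-key sum in reduce and,
// in cleanup, the key with the largest sum across all keys.
class MaxTotalReducer extends MiniReducer {
    private String bestKey;
    private int bestTotal;

    @Override
    protected void setup() {
        bestKey = null;
        bestTotal = Integer.MIN_VALUE;
    }

    @Override
    protected void reduce(String key, Iterable<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        output.add(key + "=" + total);
        if (total > bestTotal) {
            bestTotal = total;
            bestKey = key;
        }
    }

    @Override
    protected void cleanup() {
        // Only here have all keys been seen.
        if (bestKey != null) {
            output.add("max:" + bestKey + "=" + bestTotal);
        }
    }
}

class ReducerLifecycleDemo {
    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("a", Arrays.asList(1, 2));
        grouped.put("b", Arrays.asList(5));
        grouped.put("c", Arrays.asList(3, 1));
        // prints [a=3, b=5, c=4, max:b=5]
        System.out.println(new MaxTotalReducer().run(grouped));
    }
}
```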
