Hadoop basics: reading the Hadoop Reducer class

Source: Internet  Editor: 程序博客网  Time: 2024/06/07 13:29

Hadoop's Reducer class has three main methods: setup, reduce, and cleanup. The code is as follows:

  /**
   * Called once at the start of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for(VALUEIN value: values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

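When a user's application invokes a reducer, what actually gets called is the reducer's run method: the framework's reduce task calls it directly. Its code is as follows: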

  /*
   * control how the reduce task works.
   */
  @SuppressWarnings("unchecked")
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      // If a back up store is used, reset it
      ((ReduceContext.ValueIterator)
          (context.getValues().iterator())).resetBackupStore();
    }
    cleanup(context);
  }
}
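To make the call order in run concrete, here is a minimal sketch that needs no Hadoop dependency. SketchReducer and LifecycleDemo are hypothetical stand-ins written for illustration, not Hadoop classes; the Context is replaced by a plain map of already grouped values, and only the sequence of calls is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Hadoop's Reducer: it models only the order
// in which run() invokes setup, reduce, and cleanup.
abstract class SketchReducer {
    final List<String> log = new ArrayList<>();

    protected void setup() { log.add("setup"); }

    protected abstract void reduce(String key, Iterable<Integer> values);

    protected void cleanup() { log.add("cleanup"); }

    // Same shape as run() above: setup once, reduce once per key, cleanup once.
    public void run(Map<String, List<Integer>> grouped) {
        setup();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            reduce(e.getKey(), e.getValue());
        }
        cleanup();
    }
}

class LifecycleDemo {
    // Runs a reducer over two keys and returns the recorded call order.
    static List<String> trace() {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("a", Arrays.asList(1, 2));
        grouped.put("b", Arrays.asList(3));

        SketchReducer r = new SketchReducer() {
            @Override
            protected void reduce(String key, Iterable<Integer> values) {
                log.add("reduce(" + key + ")");
            }
        };
        r.run(grouped);
        return r.log;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [setup, reduce(a), reduce(b), cleanup]
    }
}
```

The point of the sketch is that setup and cleanup bracket the whole task rather than each individual reduce call.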

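As the code above shows, run first calls setup once, then calls reduce once for each key, and finally calls cleanup once. By default, setup and cleanup do nothing. So when overriding reduce alone does not meet an application's needs, try adding logic to setup and cleanup, for example to initialize state before the first key or to emit a summary after the last one.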
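As an illustration of that idea, the sketch below sums the values of each key and uses setup and cleanup to maintain a task-wide total. It is a simplified model, not Hadoop code: MiniReducer and TotalingReducer are hypothetical names, and output goes into a list rather than through context.write. In a real job the class would instead extend org.apache.hadoop.mapreduce.Reducer and receive a Context in each method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Hadoop's Reducer and Context;
// records are collected in a list instead of being written to a Context.
abstract class MiniReducer {
    protected final List<String> output = new ArrayList<>();

    protected void setup() { /* NOTHING by default */ }

    protected abstract void reduce(String key, Iterable<Integer> values);

    protected void cleanup() { /* NOTHING by default */ }

    public final void run(Map<String, List<Integer>> grouped) {
        setup();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            reduce(e.getKey(), e.getValue());
        }
        cleanup();
    }
}

// Sums the values of each key and keeps task-wide state across keys.
class TotalingReducer extends MiniReducer {
    private long grandTotal;

    @Override
    protected void setup() {
        grandTotal = 0; // initialize per-task state once, before the first key
    }

    @Override
    protected void reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        grandTotal += sum;
        output.add(key + "=" + sum);
    }

    @Override
    protected void cleanup() {
        // emit one summary record after the last key has been reduced
        output.add("TOTAL=" + grandTotal);
    }

    // Runs the reducer over two sample keys and returns what it emitted.
    static List<String> demo() {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("apple", Arrays.asList(1, 2, 3));
        grouped.put("pear", Arrays.asList(4));
        TotalingReducer r = new TotalingReducer();
        r.run(grouped);
        return r.output;
    }
}
```

The grand total cannot be produced inside reduce, because reduce sees only one key at a time; cleanup is the first point at which every key has been processed.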
