setPartitionerClass, setOutputKeyComparatorClass and setOutputValueGroupingComparator

from http://autofei.wordpress.com/2012/10/18/setpartitionerclass-setoutputkeycomparatorclass-and-setoutputvaluegroupingcomparator/

The Partitioner decides which reducer each mapper output record goes to, based on the mapper output key. Normally, each distinct key forms its own group (one Iterator on the reducer side). But sometimes we want different keys to end up in the same group. That is the job of the Output Value Grouping Comparator, which groups the mapper output on the reducer side. For easy understanding, think of it as the GROUP BY condition in SQL. I will give a detailed example for time series analysis later. The Output Key Comparator is used during the sort stage to order the mapper output keys.
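
To make the partitioning step concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It assumes a composite key of (year, number) and partitions on the year only, reusing the formula of Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks. The class name PartitionSketch, the method partitionFor and the sample records are made up for illustration; they are not taken from the downloadable example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {

    // Hypothetical helper: picks a reducer from the year part of the key only,
    // so that every record of one year lands on the same reducer.
    // The masking with Integer.MAX_VALUE keeps the result non-negative.
    static int partitionFor(int year, int numReducers) {
        return (Integer.hashCode(year) & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Made-up (year, number) pairs standing in for mapper output keys.
        int[][] records = {{1950, 22}, {1949, 111}, {1950, 0}, {1949, 78}, {1951, 5}};
        int numReducers = 2;

        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (int[] r : records) {
            int p = partitionFor(r[0], numReducers);
            byReducer.computeIfAbsent(p, k -> new ArrayList<>()).add(r[0] + ":" + r[1]);
        }
        byReducer.forEach((p, recs) -> System.out.println("reducer " + p + " <- " + recs));
    }
}
```

Partitioning on the whole (year, number) key instead would scatter one year across several reducers, and no grouping comparator could bring those records back together.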

The above looks pretty straightforward. But there is one thing to remember: if you use setOutputValueGroupingComparator, the reducer is called once per group and receives a single key for it (the first key of the group in sort order), even though the keys were different in the mapper output.
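
The effect can be simulated in plain Java, without Hadoop. The sketch below is only an illustration under stated assumptions: Pair stands in for the composite mapper output key, SORT plays the role of the output key comparator (year ascending, number descending), and GROUP plays the role of the grouping comparator (year only). All of these names, and the sample data, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupingSketch {

    // Stand-in for the composite mapper output key (year, number).
    static final class Pair {
        final int year;
        final int temp;
        Pair(int year, int temp) { this.year = year; this.temp = temp; }
    }

    // Sort order: year ascending, then number descending.
    static final Comparator<Pair> SORT = Comparator
            .comparingInt((Pair p) -> p.year)
            .thenComparing(Comparator.comparingInt((Pair p) -> p.temp).reversed());

    // Grouping: two keys belong to the same group when their years match.
    static final Comparator<Pair> GROUP = Comparator.comparingInt((Pair p) -> p.year);

    // Returns, per year, the number carried by the first key of each group,
    // i.e. the key a reduce call would be handed for that group.
    static Map<Integer, Integer> firstKeyPerGroup(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        Map<Integer, Integer> result = new LinkedHashMap<>();
        Pair groupKey = null;
        for (Pair p : sorted) {
            if (groupKey == null || GROUP.compare(groupKey, p) != 0) {
                groupKey = p; // a new group starts here
                result.put(p.year, p.temp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = Arrays.asList(
                new Pair(1950, 22), new Pair(1949, 111),
                new Pair(1950, 0), new Pair(1949, 78));
        System.out.println(firstKeyPerGroup(mapOutput)); // prints {1949=111, 1950=22}
    }
}
```

Because the sort puts the largest number first within each year, the first key of every group already holds the maximum, so the reducer has nothing left to compute.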

You can download the example from: https://www.assembla.com/spaces/autofei_public/documents

  • record.txt is the input (three fields: year, a random number, place)
  • MaxTemperatureUsingSecondarySort.java is the main Hadoop code
  • IntPair.java is the mapper output key object
  • output.txt is the output

You will notice that, in the output, the number is now the same for every record of the same year: it is the maximum one for that year.

Note: the code is modified from the book “Hadoop: The Definitive Guide”.

Note: in the new MapReduce API, setOutputKeyComparatorClass corresponds to setSortComparatorClass, and setOutputValueGroupingComparator corresponds to setGroupingComparatorClass.

