Introduction to the phases of Hadoop MapReduce

Source: Internet  Published: 4fang软件论坛  Editor: 程序博客网  Time: 2024/04/29 02:36
1 Hadoop MapReduce phases
    1)mapper: maps input key/value pairs to a set of intermediate key/value pairs.
2 reducer: reduces a set of intermediate values which share a key to a smaller set of values.
    1)shuffle: the input to the reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
    2)sort: in this stage the framework groups reducer inputs by key (since different mappers may have output the same key).
 The shuffle and sort phases occur simultaneously; map outputs are merged while they are being fetched.
  3)secondary sort: if the equivalence rules used to group intermediate keys into a single reduce call need to differ from those used to sort keys before reduction, one may specify a Comparator via JobConf.setOutputValueGroupingComparator(Class). Since JobConf.setOutputKeyComparatorClass(Class) controls how intermediate keys are sorted, the two can be used in conjunction to simulate a secondary sort on values.
  4)reduce: in this phase the reduce method is called once for each <key, (list of values)> pair in the grouped inputs.
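
The phases above can be sketched in plain Java, without the Hadoop libraries. This is only an illustrative simulation, not how the framework is implemented: the class MapReducePhasesSketch, the Pair composite key, the SORT and GROUP comparators, and the "name value" input format are all assumptions made for this example. SORT plays the role of the comparator set with JobConf.setOutputKeyComparatorClass(Class), and GROUP plays the role of the one set with JobConf.setOutputValueGroupingComparator(Class).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory simulation of the map, shuffle, sort and reduce phases.
class MapReducePhasesSketch {

    /** Composite intermediate key: the natural key plus the value to sort on. */
    static final class Pair {
        final String name;
        final int value;
        Pair(String name, int value) { this.name = name; this.value = value; }
    }

    // Sort comparator: orders by the whole composite key (name, then value).
    static final Comparator<Pair> SORT =
            Comparator.comparing((Pair p) -> p.name).thenComparingInt(p -> p.value);

    // Grouping comparator: only the natural key decides which pairs share a reduce call.
    static final Comparator<Pair> GROUP = Comparator.comparing((Pair p) -> p.name);

    // Map phase: one "name value" input line becomes one intermediate pair.
    static Pair map(String line) {
        String[] parts = line.trim().split("\\s+");
        return new Pair(parts[0], Integer.parseInt(parts[1]));
    }

    // Partitioning: chosen from the natural key so equal names reach the same reducer.
    static int partition(Pair key, int numReducers) {
        return (key.name.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: values arrive already sorted, so min and max are the two ends.
    static List<Integer> reduce(String name, List<Integer> values) {
        return Arrays.asList(values.get(0), values.get(values.size() - 1));
    }

    // Runs all phases for one reducer; each inner list stands for one mapper's input.
    static Map<String, List<Integer>> runReducer(
            List<List<String>> mapperInputs, int reducerId, int numReducers) {
        // Shuffle: collect this reducer's partition from every mapper's output.
        List<Pair> fetched = new ArrayList<>();
        for (List<String> split : mapperInputs) {
            for (String line : split) {
                Pair p = map(line);
                if (partition(p, numReducers) == reducerId) {
                    fetched.add(p);
                }
            }
        }
        // Sort: order the fetched pairs by the full composite key.
        fetched.sort(SORT);
        // Group and reduce: one reduce call per run of keys that GROUP treats as equal.
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        int i = 0;
        while (i < fetched.size()) {
            Pair first = fetched.get(i);
            List<Integer> values = new ArrayList<>();
            while (i < fetched.size() && GROUP.compare(first, fetched.get(i)) == 0) {
                values.add(fetched.get(i).value);
                i++;
            }
            out.put(first.name, reduce(first.name, values));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = Arrays.asList(
                Arrays.asList("b 5", "a 3"),
                Arrays.asList("a 1", "b 2", "a 7"));
        // A single reducer (id 0 of 1) receives every partition.
        System.out.println(runReducer(inputs, 0, 1)); // prints {a=[1, 7], b=[2, 5]}
    }
}
```

Because the sort uses the whole composite key while grouping uses only the name, each reduce call sees its values in ascending order, which is the effect a secondary sort is meant to provide. In a real job the partitioner would also need to look at the natural key only, as partition does here.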