Introduction to the Phases of Hadoop MapReduce

Source: Internet. Published by: 4fang软件论坛. Editor: 程序博客网. Time: 2024/04/29 02:36

1 Introduction to the Hadoop MapReduce phases
1) Mapper: maps input key/value pairs to a set of intermediate key/value pairs.
2) Reducer: reduces a set of intermediate values which share a key to a smaller set of values. Its phases are listed below:
1) Shuffle: the input to the reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers via HTTP.
2) Sort: in this stage the framework groups reducer inputs by key (since different mappers may have output the same key).
The shuffle and sort phases occur simultaneously: while map outputs are being fetched, they are merged.
3) Secondary sort: if the rules that decide which intermediate keys are grouped into a single reduce call need to differ from the rules used to sort those keys before reduction, one may specify a Comparator via JobConf.setOutputValueGroupingComparator(Class). Since JobConf.setOutputKeyComparatorClass(Class) can be used to control how intermediate keys are sorted, the two can be used in conjunction to simulate a secondary sort on values.
4) Reduce: in this phase the reduce method is called for each <key, (list of values)> pair in the grouped inputs.
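
The phases above can be traced in a small in-memory sketch. It uses plain Java collections rather than the Hadoop API, so the class name MapReducePhasesSketch, the word-count job, and the helper names map, shuffleAndSort, reduce and run are all assumptions made for illustration. In a real job the framework fetches map output partitions over HTTP and merges them; here a simple loop stands in for that step.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative only: plain Java collections stand in for the framework.
class MapReducePhasesSketch {

    // Map: one input record becomes a set of intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle + sort: merge the outputs of all mappers and group values by key.
    // The TreeMap keeps the keys in sorted order.
    static SortedMap<String, List<Integer>> shuffleAndSort(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapper : mapOutputs) {
            for (Map.Entry<String, Integer> kv : oneMapper) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        return grouped;
    }

    // Reduce: called once per <key, (list of values)> pair.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps in order; each line plays the role of one mapper's input.
    static SortedMap<String, Integer> run(List<String> lines) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String line : lines) {
            mapOutputs.add(map(line));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffleAndSort(mapOutputs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

Running main prints {a=2, b=2, c=1}: the key "a" is emitted twice by the first mapper, "b" once by each mapper, and the grouping step brings the values for each key together before reduce is called.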

0 0