HDPCD-Java-复习笔记(4)

来源:互联网 发布:win7删除windows.old 编辑:程序博客网 时间:2024/06/07 04:54

Map Aggregation


Aggregation

The term refers to a Mapper combining its <key, value> pairs, with the goal of reducing the amount of network traffic between the Mapper and the Reducer.

There are two ways to perform Map Aggregation in Hadoop:

Combiners --- The MapReduce framework has the concept of a Combiner, where you write a class that defines the aggregation, and the framework decides when to perform the aggregation.

In-map Aggregation --- The Mapper contains logic that aggregates records, typically accomplished by buffering records in memory prior to writing them out.


Overview of Combiners


The < key ,value > records output by the Mapper are serialized, so the Combiner has to deserialize them.

A Combiner only aggregates data on one node. It does not combine the output of multiple Mappers.


Reduce-side Combining


The Combiner is also used in the reduce phase if the intermediate <key,value> pairs from Mappers are spilled to disk. 

The fact that the Reducer uses the Combiner behind-the-scenes to improve file I/O.


Counters

The pre-defined counters include usefulinformation,like the number of map input records, or the amount of byteswritten to HDFS.

The Hadoop counters are global -they are asummation of events that occurs across the entire cluster.

 

User-defined Counters

Two ways to define your own counter in Hadoop:

1.Use an enum to define a group,and the elements in the enum become the counter names.

2.Use strings for the group name and counter name.



Combiner Example


public class WordCountCombiner   extends Reducer<Text, IntWritable, Text, IntWritable> {    private IntWritable outputValue = new IntWritable();    @Override    protected void reduce(Text key,             Iterable<IntWritable> values, Context context)              throws IOException, InterruptedException {        int sum = 0;        for(IntWritable count : values) {            sum += count.get();        }        outputValue.set(sum);        context.write(key, outputValue);    }}


In-Map Aggregation



In-Map Aggregation Example

    public class TopResultsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {    private ArrayList<Word> words = new ArrayList<Word>();    private PriorityQueue<Word> queue;    private int maxResults;    @Override    protected void setup(Context context)        throws IOException, InterruptedException {        maxResults = Integer.parseInt(context.getConfiguration()                                             .get("maxResults"));    }    @Override    protected void map(LongWritable key, Text value, Context context)        throws IOException, InterruptedException {        String[] input = StringUtils.split(value.toString(), '\\', ' ');        for (String word : input) {            Word currentWord = new Word(word, 1);            if (words.contains(currentWord)) {                //increment the existing Word's frequency                for (Word w : words) {                    if (w.equals(currentWord)) {                        w.frequency++;                        break;                    }                }            } else {                words.add(currentWord);            }        }    }    @Override    protected void cleanup(Context context)        throws IOException, InterruptedException {        Text outputKey = new Text();        IntWritable outputValue = new IntWritable();        queue = new PriorityQueue<Word>(words.size());        queue.addAll(words);        for (int i = 1; i <= maxResults; i++) {            Word tail = queue.poll();            if (tail != null) {                outputKey.set(tail.value);                outputValue.set(tail.frequency);                context.write(outputKey, outputValue);            }        }    }}

public class Word implements Comparable<Word> {    public String value;    public int frequency;    public Word(String value, int frequency) {        this.value = value;        this.frequency = frequency;    }    @Override    public boolean equals(Object obj) {        if (obj instanceof Word) {            return value.equalsIgnoreCase(((Word) obj).value);        } else {            return false;        }    }    @Override    public int compareTo(Word w) {        return w.frequency - this.frequency;    }}


public enum MyCounters {     GOOD_RECORDS, BAD_RECORDS}context.getCounter(MyCounters.GOOD_RECORDS).increment(1);
 
原创粉丝点击