HDPCD-Java Review Notes (4)
Source: Internet  Editor: 程序博客网  Time: 2024/06/07 04:54
Map Aggregation
Aggregation
Map aggregation refers to a Mapper combining its <key, value> pairs before they are sent on, with the goal of reducing the amount of network traffic between the Mapper and the Reducer.
There are two ways to perform Map Aggregation in Hadoop:
Combiners --- The MapReduce framework has the concept of a Combiner, where you write a class that defines the aggregation, and the framework decides when to perform the aggregation.
In-map Aggregation --- The Mapper contains logic that aggregates records, typically accomplished by buffering records in memory prior to writing them out.
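To make the traffic saving concrete, here is a minimal plain-Java sketch of what map-side aggregation does to a word-count Mapper's output. It deliberately uses no Hadoop classes; the class name `MapAggregationSketch` and its methods are hypothetical and exist only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, not the Hadoop API.
class MapAggregationSketch {

    // Without aggregation: one <word, 1> record per token.
    static List<Map.Entry<String, Integer>> emitRaw(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // With aggregation: one <word, count> record per distinct word.
    static Map<String, Integer> emitAggregated(String line) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : emitRaw(line)) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "to be or not to be";
        System.out.println("raw records: " + emitRaw(line).size());
        System.out.println("aggregated records: " + emitAggregated(line).size());
        System.out.println(emitAggregated(line));
    }
}
```

For the input "to be or not to be", the raw form has 6 records and the aggregated form has 4, so fewer records would have to cross the network.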
Overview of Combiners
The <key, value> records output by the Mapper are serialized, so the Combiner has to deserialize them.
A Combiner only aggregates data on one node. It does not combine the output of multiple Mappers.
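Since the framework decides when the Combiner runs, the final result has to be the same whether it runs or not. The following plain-Java sketch (no Hadoop classes) illustrates this for a sum, and also shows that combining happens per Mapper output, never across Mappers. `CombinerSketch`, `rec`, `combine`, and `reduce` are hypothetical names chosen for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, not the Hadoop API.
class CombinerSketch {

    static Map.Entry<String, Integer> rec(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Combiner role: sum the values per key within ONE mapper's output.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    // Reducer role: sum the values per key over the output of ALL mappers.
    static Map<String, Integer> reduce(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> r : output) {
                totals.merge(r.getKey(), r.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapperA = Arrays.asList(rec("a", 1), rec("b", 1), rec("a", 1));
        List<Map.Entry<String, Integer>> mapperB = Arrays.asList(rec("a", 1), rec("c", 1));

        // Combiner never runs, runs once per mapper, or runs twice on one mapper.
        Map<String, Integer> plain = reduce(Arrays.asList(mapperA, mapperB));
        Map<String, Integer> once = reduce(Arrays.asList(combine(mapperA), combine(mapperB)));
        Map<String, Integer> twice = reduce(Arrays.asList(combine(combine(mapperA)), combine(mapperB)));

        System.out.println(plain);
        System.out.println(plain.equals(once) && plain.equals(twice));
        // Key "a" still arrives from both mappers: each output is combined separately.
        System.out.println(combine(mapperA).size() + combine(mapperB).size());
    }
}
```

In this sketch the Reducer sees 4 records instead of 5 when the Combiner runs once per Mapper, and the totals are identical in all three cases.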
Reduce-side Combining
The Combiner is also used in the reduce phase if the intermediate <key,value> pairs from Mappers are spilled to disk.
In that case the Reducer uses the Combiner behind the scenes to reduce file I/O.
Counters
The pre-defined counters include useful information, like the number of map input records or the number of bytes written to HDFS.
The Hadoop counters are global: they are a summation of events that occur across the entire cluster.
User-defined Counters
There are two ways to define your own counter in Hadoop:
1. Use an enum to define a group, and the elements in the enum become the counter names.
2. Use strings for the group name and counter name.
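The sketch below simulates, in plain Java, how counters behave as global sums: each task keeps its own counts, and the job-level value is the total over all tasks. It is not the Hadoop counter API; `CounterSketch`, `RecordCounters`, the `Custom` group, and `TOTAL_CHARS` are hypothetical names, and treating an empty string as a bad record is an assumption made only for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, not the Hadoop API.
class CounterSketch {

    // Way 1: the enum is the group, its constants are the counter names.
    enum RecordCounters { GOOD_RECORDS, BAD_RECORDS }

    // Local counters of one task, keyed by "group:name".
    static Map<String, Long> runTask(List<String> records) {
        Map<String, Long> counters = new HashMap<>();
        for (String record : records) {
            // Assumption for this sketch: an empty record counts as bad.
            RecordCounters c = record.isEmpty()
                    ? RecordCounters.BAD_RECORDS
                    : RecordCounters.GOOD_RECORDS;
            counters.merge(RecordCounters.class.getSimpleName() + ":" + c.name(), 1L, Long::sum);
            // Way 2: plain strings for the group name and the counter name.
            counters.merge("Custom" + ":" + "TOTAL_CHARS", (long) record.length(), Long::sum);
        }
        return counters;
    }

    // Global view: each counter is the sum of its values over all tasks.
    static Map<String, Long> aggregate(List<Map<String, Long>> perTask) {
        Map<String, Long> global = new TreeMap<>();
        for (Map<String, Long> task : perTask) {
            task.forEach((name, value) -> global.merge(name, value, Long::sum));
        }
        return global;
    }

    public static void main(String[] args) {
        Map<String, Long> task1 = runTask(Arrays.asList("alpha", "", "beta"));
        Map<String, Long> task2 = runTask(Arrays.asList("gamma", ""));
        System.out.println(aggregate(Arrays.asList(task1, task2)));
    }
}
```

Running `main` prints `{Custom:TOTAL_CHARS=14, RecordCounters:BAD_RECORDS=2, RecordCounters:GOOD_RECORDS=3}`: neither task alone holds these values, only the aggregate does.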
Combiner Example
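import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A Combiner is written as a Reducer whose input and output types
// both match the Mapper's output types.
public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable outputValue = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the partial counts for this key.
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        outputValue.set(sum);
        context.write(key, outputValue);
    }
}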
In-Map Aggregation
In-Map Aggregation Example
import java.io.IOException;
import java.util.ArrayList;
import java.util.PriorityQueue;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

public class TopResultsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Words seen by this Mapper, buffered in memory until cleanup().
    private ArrayList<Word> words = new ArrayList<Word>();
    private PriorityQueue<Word> queue;
    private int maxResults;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        maxResults = Integer.parseInt(context.getConfiguration().get("maxResults"));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split on spaces, with backslash as the escape character.
        String[] input = StringUtils.split(value.toString(), '\\', ' ');
        for (String word : input) {
            Word currentWord = new Word(word, 1);
            int index = words.indexOf(currentWord);
            if (index >= 0) {
                // Increment the existing Word's frequency.
                words.get(index).frequency++;
            } else {
                words.add(currentWord);
            }
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        Text outputKey = new Text();
        IntWritable outputValue = new IntWritable();
        // Word.compareTo sorts by descending frequency, so poll() returns the
        // most frequent word first. Building the queue from the list avoids
        // the IllegalArgumentException that an initial capacity of 0 would throw.
        queue = new PriorityQueue<Word>(words);
        for (int i = 1; i <= maxResults; i++) {
            Word tail = queue.poll();
            if (tail != null) {
                outputKey.set(tail.value);
                outputValue.set(tail.frequency);
                context.write(outputKey, outputValue);
            }
        }
    }
}
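public class Word implements Comparable<Word> {
    public String value;
    public int frequency;

    public Word(String value, int frequency) {
        this.value = value;
        this.frequency = frequency;
    }

    // Two Words are equal when their text matches, ignoring case.
    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Word) {
            return value.equalsIgnoreCase(((Word) obj).value);
        } else {
            return false;
        }
    }

    // Kept consistent with equals(): hash the lower-cased text.
    @Override
    public int hashCode() {
        return value.toLowerCase().hashCode();
    }

    // Descending frequency; Integer.compare avoids subtraction overflow.
    @Override
    public int compareTo(Word w) {
        return Integer.compare(w.frequency, this.frequency);
    }
}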
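The ordering trick used in cleanup() can be tried in isolation. Below is a minimal plain-Java sketch, with no Hadoop classes, showing that a `PriorityQueue` whose elements compare by descending frequency hands back the most frequent entries first. `TopNSketch`, `ScoredWord`, and `topN` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: plain Java, not the Hadoop API.
class TopNSketch {

    static final class ScoredWord implements Comparable<ScoredWord> {
        final String value;
        final int frequency;

        ScoredWord(String value, int frequency) {
            this.value = value;
            this.frequency = frequency;
        }

        // Reversed comparison: higher frequency sorts first.
        @Override
        public int compareTo(ScoredWord other) {
            return Integer.compare(other.frequency, this.frequency);
        }
    }

    // Returns up to n entries as "word=frequency", most frequent first.
    static List<String> topN(List<ScoredWord> words, int n) {
        PriorityQueue<ScoredWord> queue = new PriorityQueue<>(words);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ScoredWord head = queue.poll();
            if (head == null) {
                break; // fewer than n words available
            }
            result.add(head.value + "=" + head.frequency);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ScoredWord> words = Arrays.asList(
                new ScoredWord("hadoop", 5),
                new ScoredWord("map", 2),
                new ScoredWord("reduce", 7),
                new ScoredWord("yarn", 1));
        System.out.println(topN(words, 2));
    }
}
```

With the sample data, `main` prints `[reduce=7, hadoop=5]`. An empty input list yields an empty result rather than an exception.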
User-defined Counters Example
public enum MyCounters {
    GOOD_RECORDS,
    BAD_RECORDS
}

// Inside a Mapper or Reducer method:
context.getCounter(MyCounters.GOOD_RECORDS).increment(1);