How does the Word Count example in Hadoop actually work?

Source: Internet | Editor: 程序博客网 | Date: 2024/04/28 07:24

1. Whether abroad or in China, the first program mentioned as a Hadoop starter is always WordCount: given a pile of files, count how many times each word appears in them.

2. The program usually given looks like this:

Map:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  // Reused across calls to avoid allocating new objects for every record.
  private final IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  // The parameter types must match the Mapper's type arguments:
  // the key is the byte offset of the line, the value is the line itself.
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line.toLowerCase());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      // Emit (word, 1) for every token; the framework groups these by word.
      output.collect(word, one);
    }
  }
}

Reduce:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    // Every value for this key is a 1 emitted by the mapper; summing them
    // gives the total number of occurrences of the word.
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
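The Map and Reduce classes above still need a driver to wire them into a job. A minimal sketch using the same old `org.apache.hadoop.mapred` API might look like this (the class name `WordCount` and the use of command-line arguments for the paths are my own choices, not part of the original article):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCount {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // Output types of the job; by default these also apply to the map output.
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(WordCountMapper.class);
    conf.setReducerClass(WordCountReducer.class);

    // args[0] = input directory on HDFS, args[1] = output directory
    // (the output directory must not exist yet).
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
```

You would then run it with something like `hadoop jar wordcount.jar WordCount /input /output`.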
3. But they never explain why the Map phase emits an IntWritable with value 1. After googling for a while, I found the two figures below, which explain this question well.
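The idea behind the 1 can also be shown without Hadoop at all. The sketch below (my own illustration, plain Java collections instead of Writables) simulates the three phases: map emits a (word, 1) pair per token, shuffle groups the 1s by word, and reduce sums each group:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountFlow {
  public static Map<String, Integer> count(List<String> lines) {
    // Map phase: emit (word, 1) for every token, like the Mapper does.
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (String line : lines) {
      for (String token : line.toLowerCase().split("\\s+")) {
        if (!token.isEmpty()) {
          pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
        }
      }
    }
    // Shuffle phase: group the 1s by key; the framework does this
    // automatically between map and reduce.
    Map<String, List<Integer>> grouped = new TreeMap<>();
    for (Map.Entry<String, Integer> p : pairs) {
      grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
    }
    // Reduce phase: sum the list of 1s per word, like the Reducer does.
    Map<String, Integer> counts = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
      int sum = 0;
      for (int v : e.getValue()) {
        sum += v;
      }
      counts.put(e.getKey(), sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(count(Arrays.asList("Hello world", "hello Hadoop")));
    // prints {hadoop=1, hello=2, world=1}
  }
}
```

So the 1 is simply each mapper's vote of "I saw this word once"; the reducer's only job is to add the votes up.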

[Figure: The MapReduce workflow]

[Figure: The word count flow]

refs:

http://blog.gopivotal.com/pivotal/products/hadoop-101-programming-mapreduce-with-native-libraries-hive-pig-and-cascading

http://kickstarthadoop.blogspot.de/2011/04/word-count-hadoop-map-reduce-example.html

https://developer.yahoo.com/hadoop/tutorial/

