Word Count (Map Reduce version)

Source: Internet  Editor: 程序博客网  Date: 2024/06/07 15:00

Description:

Use map reduce to count how often each word occurs.

https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Example%3A+WordCount+v1.0

Example:

chunk1: "Google Bye GoodBye Hadoop code"
chunk2: "lintcode code Bye"

Get MapReduce result:
    Bye: 2
    GoodBye: 1
    Google: 1
    Hadoop: 1
    code: 2
    lintcode: 1

Analysis:

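The program itself is simple; the point is to understand map reduce. map processes the input and builds key/value pairs: here it splits each chunk into words and emits (word, 1) for every word. reduce processes the data passed on from map: it receives all the values emitted for one key, sums them, and returns the result.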

import java.util.Iterator;

/**
 * Definition of OutputCollector:
 * class OutputCollector<K, V> {
 *     public void collect(K key, V value);
 *         // Adds a key/value pair to the output buffer
 * }
 */
public class WordCount {

    public static class Map {
        public void map(String key, String value,
                        OutputCollector<String, Integer> output) {
            // Split the chunk on runs of whitespace and emit (word, 1) per word.
            // Splitting on "\\s+" and skipping empty tokens avoids counting
            // empty strings when the text has leading or repeated spaces.
            String[] words = value.trim().split("\\s+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    output.collect(word, 1);
                }
            }
        }
    }

    public static class Reduce {
        public void reduce(String key, Iterator<Integer> values,
                           OutputCollector<String, Integer> output) {
            // Sum every count emitted for this word and output the total.
            int count = 0;
            while (values.hasNext()) {
                count += values.next();
            }
            output.collect(key, count);
        }
    }
}
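
The solution above only runs inside the judge, which supplies OutputCollector and the grouping step between map and reduce. To try the same flow locally, here is a minimal single-process sketch. The class name LocalWordCount, the run method, and the OutputCollector interface declared here are assumptions made for illustration; they are not part of Hadoop or of the original problem, and the TreeMap-based grouping only imitates what a real framework does during the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {

    // Stand-in for the OutputCollector from the problem statement (assumed shape).
    interface OutputCollector<K, V> {
        void collect(K key, V value);
    }

    // Map phase: emit (word, 1) for every whitespace-separated word.
    static void map(String key, String value, OutputCollector<String, Integer> output) {
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, 1);
            }
        }
    }

    // Reduce phase: sum all counts emitted for one word.
    static void reduce(String key, Iterator<Integer> values,
                       OutputCollector<String, Integer> output) {
        int count = 0;
        while (values.hasNext()) {
            count += values.next();
        }
        output.collect(key, count);
    }

    // Hypothetical driver: map every chunk, group values by key, then reduce.
    static Map<String, Integer> run(List<String> chunks) {
        // Imitates the shuffle: all values for the same key end up in one list,
        // and TreeMap keeps the keys in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < chunks.size(); i++) {
            map("chunk" + (i + 1), chunks.get(i),
                (word, one) -> grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(one));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            reduce(entry.getKey(), entry.getValue().iterator(), result::put);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList(
            "Google Bye GoodBye Hadoop code",
            "lintcode code Bye");
        // Prints {Bye=2, GoodBye=1, Google=1, Hadoop=1, code=2, lintcode=1}
        System.out.println(run(chunks));
    }
}
```

Running main on the two sample chunks gives the same counts as the example. In a real cluster the map calls run in parallel on different machines and the framework performs the grouping, but the contract of each phase is the same as in this sketch.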




