Hadoop 2.0 MapReduce Example: A WordCount Walkthrough
Source: Internet · 程序博客网 · 2024/05/16
In Hadoop 2.0, a MapReduce program customizes its map/reduce behavior by extending the two base classes org.apache.hadoop.mapreduce.Mapper and org.apache.hadoop.mapreduce.Reducer. The key methods from their source code are shown below.
Mapper.java
public void run(Context context) throws IOException, InterruptedException {
  setup(context);   // Called once at the beginning of the task.
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context); // Called once at the end of the task.
}

/**
 * Called once for each key/value pair in the input split. Most applications
 * should override this, but the default is the identity function.
 */
protected void map(KEYIN key, VALUEIN value, Context context)
    throws IOException, InterruptedException {
  context.write((KEYOUT) key, (VALUEOUT) value);
}

Reducer.java
public void run(Context context) throws IOException, InterruptedException {
  setup(context);   // Called once at the beginning of the task.
  while (context.nextKey()) {
    reduce(context.getCurrentKey(), context.getValues(), context);
  }
  cleanup(context); // Called once at the end of the task.
}

/**
 * This method is called once for each key. Most applications will define
 * their reduce class by overriding this method. The default implementation
 * is an identity function.
 */
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
    throws IOException, InterruptedException {
  for (VALUEIN value : values) {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}
Both Mapper and Reducer provide a run() method that repeatedly pulls (key, value) pairs from the input and hands them to map() and reduce(); in most cases we only need to override map and reduce. In MapReduce, only serializable classes can serve as keys or values, and keys must additionally be comparable: a key type must implement the WritableComparable interface, while a value type only needs to implement Writable.
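WritableComparable is a Hadoop interface, but its contract is easy to illustrate with the JDK alone: a key type must serialize itself to a DataOutput, deserialize from a DataInput, and define an ordering. The sketch below (a hypothetical WordKey class, not from this article) implements that trio against java.io.DataOutput/DataInput, the same stdlib interfaces Hadoop's write()/readFields() methods take:

```java
import java.io.*;

// Hypothetical key type demonstrating the Writable/WritableComparable contract:
// serialize with write(), deserialize in place with readFields(), order with compareTo().
public class WordKey implements Comparable<WordKey> {
    private String word = "";

    public WordKey() {}                       // no-arg constructor, required for deserialization
    public WordKey(String word) { this.word = word; }

    public String get() { return word; }

    // Shape of Hadoop's Writable.write(DataOutput)
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
    }

    // Shape of Hadoop's Writable.readFields(DataInput): fills in the same object
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
    }

    // WritableComparable adds ordering so keys can be sorted during the shuffle
    @Override
    public int compareTo(WordKey other) {
        return word.compareTo(other.word);
    }

    // Round-trip demo: serialize, deserialize, compare
    public static void main(String[] args) throws IOException {
        WordKey k1 = new WordKey("hadoop");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        k1.write(new DataOutputStream(bytes));

        WordKey k2 = new WordKey();
        k2.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(k2.get());               // hadoop
        System.out.println(k1.compareTo(k2) == 0);  // true
    }
}
```

A real Hadoop key would declare `implements WritableComparable<WordKey>` instead; the method bodies stay the same.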
Below is my own MyWordCount.java, written with the source code above as a reference:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWordCount {

    public static class WordCountMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer words = new StringTokenizer(line);
            while (words.hasMoreTokens()) {
                word.set(words.nextToken());
                context.write(word, one);          // emit (word, 1) for every token
            }
        }
    }

    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable totalNum = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            Iterator<IntWritable> it = values.iterator();
            while (it.hasNext()) {
                sum += it.next().get();
            }
            totalNum.set(sum);
            context.write(key, totalNum);          // emit (word, total count)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "MyWordCount");  // new Job(conf, name) is deprecated in 2.x
        job.setJarByClass(MyWordCount.class);            // class used to locate the job jar
        job.setMapperClass(WordCountMapper.class);       // set mapper / combiner / reducer classes
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);               // set the output key/value types
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // set input/output paths
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
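Setting the Hadoop plumbing aside, the counting logic itself is just tokenize-then-sum. As a sanity check that needs no cluster, here is a plain-JDK sketch (a hypothetical LocalWordCount helper, not part of the job above) that reproduces what WordCountMapper and WordCountReducer compute together:

```java
import java.util.*;

public class LocalWordCount {

    // Mimics the map phase (tokenize each line, emit (word, 1)) and the
    // reduce phase (sum the 1s per word) in a single in-memory pass.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();          // sorted keys, like MapReduce output
        for (String line : lines) {
            StringTokenizer words = new StringTokenizer(line);  // same tokenizer as the mapper
            while (words.hasMoreTokens()) {
                counts.merge(words.nextToken(), 1, Integer::sum); // the "reduce": sum per key
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = count(Arrays.asList("hello world", "hello hadoop"));
        System.out.println(result);   // {hadoop=1, hello=2, world=1}
    }
}
```

Running the real job on the same two lines should produce the same pairs, one per line, in the part file under args[1].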