hadoop wordcount
Source: Internet | Editor: 程序博客网 | Date: 2024/05/16 20:30
The text content on HDFS is as follows:
hello world hello java
hello c
hello hadoop map reduce
Below is my own summary of this process.
The MapReduce execution flow:

input <k1,v1> -> map -> <k2,v2> -> shuffle -> <k2,list<v2>> -> reduce -> <k3,v3> -> output

The concrete steps:

map input (the key is the byte offset of each line in the file):
<0, hello world hello java>
<22, hello c>
<29, hello hadoop map reduce>

map output:
<hello,1> <world,1> <hello,1> <java,1> <hello,1> <c,1> <hello,1> <hadoop,1> <map,1> <reduce,1>

shuffle sorts and groups the map output by key. After shuffle:
<hello, <1,1,1,1>> <world, <1>> <java, <1>> <c, <1>> ...

reduce receives the shuffled key/value pairs and, for each key, iterates over the value list and sums it. Result:
<hello, 4> <world, 1> ...

The result is then output to HDFS.
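The flow above can be sketched without a Hadoop cluster by simulating the map, shuffle, and reduce phases in plain Java (the class and method names here are illustrative, not part of the Hadoop API):

```java
import java.util.*;

public class WordCountFlow {

    public static Map<String, Integer> count(List<String> lines) {
        // map phase: for each line, emit a <word, 1> pair per token
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                mapped.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }

        // shuffle phase: sort and group the pairs by key
        // (a TreeMap keeps the keys in sorted order, like the shuffle sort)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        }

        // reduce phase: sum each key's value list
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hello world hello java",
                "hello c",
                "hello hadoop map reduce");
        // prints {c=1, hadoop=1, hello=4, java=1, map=1, reduce=1, world=1}
        System.out.println(count(lines));
    }
}
```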
The code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    /**
     * map:
     * splits a line on whitespace and emits <word, 1> for each token
     */
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    /**
     * reduce:
     * receives the map output (after the shuffle phase in between)
     * and sums the counts for each word
     */
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        // set the main job class
        job.setJarByClass(WordCount.class);
        // set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // set the map and reduce classes
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        // set the output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //job.setCombinerClass(IntSumReducer.class);
        // run the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
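To run the job, the class is typically compiled and packaged into a jar, then submitted with the hadoop CLI. This is a sketch assuming a working Hadoop installation; the jar name and the HDFS input/output paths are placeholders:

```shell
# compile and package the class (assumes HADOOP_CLASSPATH is configured)
hadoop com.sun.tools.javac.Main WordCount.java
jar cf wc.jar WordCount*.class

# submit the job: args[0] is the input path, args[1] the output path
# (the output directory must not exist yet)
hadoop jar wc.jar WordCount /user/hadoop/input /user/hadoop/output

# inspect the result written by the reducer
hdfs dfs -cat /user/hadoop/output/part-r-00000
```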