Getting Started with MapReduce: WordCount
Source: Internet · Editor: 程序博客网 · Date: 2024/04/28 13:00
Let's get straight to the code; the implementation is explained below.
```java
package com.whomai.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: split each input line into tokens and emit (word, 1).
    public static class mapWordCount extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                word.set(st.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sum the counts for each word.
    public static class reduceWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the Job constructor, which is deprecated in Hadoop 2.x.
        Job job = Job.getInstance(conf, "word");
        job.setJarByClass(WordCount.class);

        String wordCountInput = "hdfs://192.168.248.133:9000/wordCountInput";
        String wordCountOut = "hdfs://192.168.248.133:9000/WordCountOutPath";

        job.setMapperClass(mapWordCount.class);
        job.setCombinerClass(reduceWordCount.class);
        job.setReducerClass(reduceWordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(wordCountInput));
        FileOutputFormat.setOutputPath(job, new Path(wordCountOut));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
Hadoop is built on the distributed file system HDFS and the MapReduce processing model. As the name suggests, MapReduce consists of a Map phase and a Reduce phase.

The Map phase splits the work: a large job is divided into pieces that are distributed across the cluster's nodes. The Reduce phase then aggregates the Map outputs into the final result.

In the code above, map and reduce are implemented as static nested classes. mapWordCount extends the Hadoop 2.x Mapper class, which takes four type parameters: the key and value types of the input records, and the key and value types the map phase emits.
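The data flow can be sketched without a cluster. The following plain-Java sketch (an illustration of the flow, not Hadoop API code) simulates the map step (tokenize and emit (word, 1) pairs), the shuffle (group by key), and the reduce step (sum each group):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit a (word, 1) pair for every token in every line.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                pairs.add(Map.entry(st.nextToken(), 1));
            }
        }
        // Shuffle + Reduce phase: group the pairs by word and sum the 1s.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello world", "hello hadoop");
        System.out.println(wordCount(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

In the real job, Hadoop performs the shuffle between the mapper and reducer automatically; only the two phase functions are written by the user.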
That is the basic idea of MapReduce, and many other problems follow the same pattern. Deduplication, for example, works by making each record the map output key: after the shuffle, the reducer receives each distinct record exactly once and simply writes the key out, ignoring the values.
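To make the deduplication idea concrete, here is a plain-Java sketch of the reduce-side logic (again an illustration of the data flow, not Hadoop API code): identical records are grouped under one key, and each key is emitted once.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LocalDedup {

    // Reduce-side logic of a dedup job: after the shuffle groups identical
    // records under the same key, the reducer outputs each key exactly once.
    public static List<String> dedup(List<String> records) {
        Set<String> seen = new LinkedHashSet<>(records); // grouping by key
        return List.copyOf(seen);
    }

    public static void main(String[] args) {
        List<String> in = List.of("a", "b", "a", "c", "b");
        System.out.println(dedup(in)); // prints [a, b, c]
    }
}
```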