MapReduce Example: WordCount
Source: Internet  Published by: 重庆市软件行业协会  Editor: 程序博客网  Time: 2024/05/16 12:38
1. Write the program in Eclipse
1.1 Import the Hadoop JAR files
1.2 Project structure
WordCountMapper.java
package com.matrix.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Called repeatedly: once for each line of the input split.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on single spaces and emit (word, 1) for each token.
        String[] words = value.toString().split(" ");
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
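One detail of the mapper above is worth noting: `split(" ")` splits on every single space, so a line with consecutive spaces produces empty strings, and each of those would be emitted as a "word". The plain-Java sketch below (no Hadoop needed) contrasts the two behaviors. The class `TokenizeSketch` and its method names are hypothetical, introduced only for illustration, and trimming before splitting on `\\s+` is just one possible alternative, not the original author's approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only; not part of the original project.
final class TokenizeSketch {

    // Same tokenization as the mapper above: split on every single space.
    static List<String> splitOnSingleSpace(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Assumed alternative: trim the line, then split on runs of whitespace.
    static List<String> splitOnWhitespace(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // a blank line yields no words
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String line = "hello  world"; // two spaces between the words
        System.out.println(splitOnSingleSpace(line)); // prints [hello, , world]
        System.out.println(splitOnWhitespace(line));  // prints [hello, world]
    }
}
```

For the sample input used in this post, where words are separated by exactly one space, both variants give the same tokens.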
WordCountReduce.java
package com.matrix.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // reduce is also called repeatedly: once for each group.
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Values are grouped by key, so summing them gives the count for this word.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
RunJob.java
package com.matrix.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RunJob {

    public static void main(String[] args) {
        // Load the configuration files.
        Configuration conf = new Configuration();
        try {
            FileSystem fs = FileSystem.get(conf);
            Job job = Job.getInstance(conf);

            // Set the job name.
            job.setJobName("wc");
            // Set the job's entry class.
            job.setJarByClass(RunJob.class);

            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            // Set the input data directory.
            FileInputFormat.addInputPath(job, new Path("/usr/matrix/input/wc"));

            // Check the output directory and delete it if it already exists.
            Path outdir = new Path("/usr/matrix/output/wc");
            if (fs.exists(outdir)) {
                fs.delete(outdir, true);
            }
            // Set the output data directory.
            // The path must be a directory, and it must not already exist.
            FileOutputFormat.setOutputPath(job, outdir);

            boolean f = job.waitForCompletion(true);
            if (f) {
                System.out.println("WordCount job completed successfully!");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
wc.java
hadoop hello world
hello hive
hadoop world
hello hbase
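To see what the job should compute for this sample data, the map, shuffle, and reduce steps can be imitated in plain Java without a cluster. This is only a rough local sketch. It assumes the input consists of the four lines `hadoop hello world`, `hello hive`, `hadoop world`, and `hello hbase`. The class `WordCountSimulation` is hypothetical and not part of the original project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation, for illustration only; it does not use Hadoop.
final class WordCountSimulation {

    static Map<String, Integer> countWords(List<String> lines) {
        // "Map" and "shuffle": emit (word, 1) for each token and group the 1s by word.
        // A TreeMap keeps the keys sorted, much as a reducer receives its keys in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // "Reduce": sum the values of each group, one call per key.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int one : group.getValue()) {
                sum += one;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input: the four lines listed in the lead-in.
        List<String> input = Arrays.asList(
                "hadoop hello world",
                "hello hive",
                "hadoop world",
                "hello hbase");
        // Print each word and its count separated by a tab.
        countWords(input).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Under that assumption the counts come out as hadoop 2, hbase 1, hello 3, hive 1, and world 2, which is what the real job's output file would be expected to contain.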
2. Export the JAR file
3. Upload the JAR file to the /opt/modules/hadoop-2.5.1 directory on host node1
4. Run the JAR file from the command line
[root@node1 hadoop-2.5.1]# hadoop jar wc.jar com.matrix.mr.RunJob