Hadoop Example: WordCount
The code is as follows:
package hadopp_wordCount;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

    // Mapper: for every input line, emit (word, 1) for each whitespace-separated token
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer iter = new StringTokenizer(value.toString());
            while (iter.hasMoreTokens()) {
                word.set(iter.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts emitted for each word
    public static class reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configure the job and submit it to the cluster
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "wordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(reduce.class);
        job.setReducerClass(reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // All arguments except the last are input paths; the last one is the output path
        for (int i = 0; i < otherArgs.length - 1; i++) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The code is fairly simple and is explained in many places online, so this article will not walk through it in detail.
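For a concrete sense of what the job computes, here is a minimal plain-Java sketch (no Hadoop involved; the class name LocalWordCountSketch and the sample lines are made up for illustration) that mirrors the same map-then-reduce logic: tokenize each line, then sum a count per word.

import java.util.HashMap;
import java.util.StringTokenizer;

public class LocalWordCountSketch {
    public static void main(String[] args) {
        // Stand-in for the lines the Mapper would receive from the input files
        String[] lines = { "hello hadoop", "hello world" };
        HashMap<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Same tokenization as the Mapper: split on whitespace
            StringTokenizer iter = new StringTokenizer(line);
            while (iter.hasMoreTokens()) {
                // Same accumulation as the Reducer: sum the 1s per word
                counts.merge(iter.nextToken(), 1, Integer::sum);
            }
        }
        // Roughly the (word <TAB> count) lines the job writes to part-r-00000
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}

For these two sample lines the printed counts are hello 2, hadoop 1, world 1 (the order may differ, since a HashMap is unordered; the real job's output is sorted by key).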
One thing worth noting is the package (namespace) of the class.
If you run WordCount as follows, it fails with an error:
root@node1:/usr/local/hadoop/hadoop-2.5.2/myJar# hadoop jar WordCount.jar WordCount /usr/local/hadooptempdata/input/wc /usr/local/hadooptempdata/output/wc
Exception in thread "main" java.lang.ClassNotFoundException: WordCount
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
The cause is that the command above looks for WordCount in the default package, while in this article the class is declared in package hadopp_wordCount; so it must be referenced by its fully qualified name.
Running it as follows works fine:
hadoop jar WordCount.jar hadopp_wordCount.WordCount /usr/local/hadooptempdata/input/wc /usr/local/hadooptempdata/output/wc
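If you are unsure which qualified name to pass, listing the jar contents shows the package path of the compiled classes (a quick check; the exact entries depend on how the jar was built):

jar tf WordCount.jar | grep WordCount
hadopp_wordCount/WordCount.class
hadopp_wordCount/WordCount$Map.class
hadopp_wordCount/WordCount$reduce.class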