Hadoop Learning 1: Implementing WordCount
To understand the basics of how Hadoop works, the place to start is WordCount. Now let's go.
1. The Mapper class
The Mapper takes four type parameters: the key/value types of the input passed to the map method, and the key/value types of the output it emits toward reduce.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapperClass extends Mapper<Object, Text, Text, IntWritable> {
    // Text is roughly Hadoop's String; IntWritable is roughly Hadoop's int
    public Text keyText = new Text("key");
    public IntWritable intValue = new IntWritable(1);

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // step 1: get the line as a plain String
        String str = value.toString();
        // step 2: split on whitespace (StringTokenizer's default delimiters)
        StringTokenizer stringTokenizer = new StringTokenizer(str);
        while (stringTokenizer.hasMoreTokens()) {
            keyText.set(stringTokenizer.nextToken()); // the word becomes the key
            context.write(keyText, intValue);         // emit e.g. ("My", 1)
        }
    }
}

2. The Reducer class
The reduce method also has four type parameters: the key/value types that map sends to reduce, and the key/value types that reduce writes out.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ReduceClass extends Reducer<Text, IntWritable, Text, IntWritable> {
    public IntWritable intValue = new IntWritable(0);

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        // e.g. key = "name", values = [1, 1]
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        intValue.set(sum);
        context.write(key, intValue);
    }
}

3. The WordCount driver class
The WordCount class wires the Mapper and Reducer together, configures the job, and sets the input and output paths.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.out.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MapperClass.class);
        job.setReducerClass(ReduceClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
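Before running on a cluster, the map/reduce logic above can be sanity-checked as a plain Java program with no Hadoop dependency. The LocalWordCount class below is a hypothetical helper, not part of the original post: it tokenizes with the same StringTokenizer defaults and sums counts per word in a TreeMap, which mirrors what the Mapper/Reducer pair computes together.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Counts words the same way the MapperClass/ReduceClass pair does:
    // split on whitespace, emit (word, 1) per token, then sum the 1s per word.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like the job output
        StringTokenizer tokenizer = new StringTokenizer(text); // default: whitespace
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();  // the "map" step: one token at a time
            counts.merge(word, 1, Integer::sum);  // the "reduce" step: sum per key
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count("ps input key/value pairs to a set of intermediate key/value pairs.");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue()); // e.g. "key/value 2"
        }
    }
}
```

Because StringTokenizer splits only on whitespace here too, this local version reproduces the same quirks as the job, such as counting "pairs" and "pairs." separately.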
How to run:
At the shell, enter:
$ hadoop jar WordCount.jar com.itcast.hadoop.mapreduce.WordCount /user/hadoop/WordCount/WC.txt /user/hadoop/WordCount/output
Input file:
ps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks which transform input records into a intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
Output file:
A 1
Maps 1
The 1
a 2
are 1
as 1
be 1
given 1
individual 1
input 4
intermediate 3
into 1
key/value 2
many 1
map 1
may 1
need 1
not 1
of 2
or 1
output 1
pair 1
pairs 1
pairs. 2
ps 1
records 2
records. 2
same 1
set 1
tasks 1
the 3
to 2
transform 1
transformed 1
type 1
which 1
zero 1

Shortcoming:
Because the map method relies on StringTokenizer's default whitespace delimiters, the program does not treat a token like "pairs." as "pairs"; the two are counted separately in the output above.
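One possible fix, sketched here and not part of the original post, is to replace StringTokenizer with a regex split whose delimiter set drops punctuation such as the trailing period but keeps "/" so that "key/value" remains one token. The class and method names below are made up for illustration; in the real job this split would go inside the map method.

```java
import java.util.Arrays;

public class TokenizeDemo {
    // Hypothetical replacement for the StringTokenizer step in MapperClass.map:
    // split on any run of characters that is neither a word character nor '/',
    // so "pairs." becomes the token "pairs" while "key/value" stays intact.
    public static String[] tokens(String line) {
        String[] parts = line.split("[^\\w/]+");
        // a leading delimiter yields one empty leading element; filter it out
        return Arrays.stream(parts).filter(s -> !s.isEmpty()).toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("intermediate key/value pairs.")));
        // prints [intermediate, key/value, pairs]
    }
}
```

The delimiter set is a design choice: a plain "\\W+" split would also break "key/value" into two words, which would change the counts shown above.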