Hadoop: Hands-On MapReduce WordCount with Eclipse (1)


(1) Download Eclipse from the official site: http://www.eclipse.org/


(2) Get the Hadoop dependencies from Maven: http://www.mvnrepository.com

Select the artifacts matching your Hadoop version, then copy the corresponding dependency declarations:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.4.1</version>
</dependency>
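If you build the project with Maven rather than adding jars by hand, a minimal pom.xml wrapping these dependencies might look like the sketch below (the groupId and artifactId for the demo project are assumptions, not taken from the original post):

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <!-- hypothetical coordinates for the demo project -->
    <groupId>com.hlx</groupId>
    <artifactId>wordcount-demo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <!-- the four Hadoop 2.4.1 dependencies listed above go here -->
    </dependencies>
</project>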

(3) Extract the downloaded Eclipse

    (3.1) Right-click the project | Build Path | Configure Build Path


   (3.2) Install the Hadoop Eclipse plugin

  

   (a) Close Eclipse, then copy the Hadoop Eclipse plugin jar into the D:\eclipse-jee-mars-2-win32\eclipse\plugins directory

    

   (b) Start Eclipse and open Window | Preferences (note: use Browse... to select your local Hadoop installation directory)

    

    (c) Window | Show View | Other

     

     (d) Hadoop must be started on the Linux side first:

          [hadoop@master-hadoop hadoop-2.4.1]$ sbin/start-dfs.sh

          [hadoop@master-hadoop hadoop-2.4.1]$ sbin/start-yarn.sh
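          Before returning to Eclipse, it is worth confirming that the daemons actually came up. Running jps on the master should list processes such as NameNode and ResourceManager, and, depending on where the slaves run, DataNode and NodeManager; the exact set depends on your cluster layout:

          [hadoop@master-hadoop hadoop-2.4.1]$ jps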

     (e) Click the small elephant icon at the bottom right and configure a New Hadoop Location....
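        The DFS Master host and port entered here must match fs.defaultFS in the cluster's core-site.xml. Given the HDFS URI used by the driver code in section (3.4), that configuration presumably contains something like this sketch (a reconstruction, not shown in the original post):

        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master-hadoop.dragon.org:9000</value>
        </property>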

        


     (f) The connection result is displayed in the DFS Locations view

       

       (3.3) Supply the native Windows binaries missing from Hadoop's bin directory

     

        (a) Copy the winutils.exe file into the C:\hadoop-2.4.1\hadoop-2.4.1\bin directory

        (b) Copy the hadoop.dll file into the C:\Windows\System32 directory
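        If the job launched from Eclipse still fails with a "Could not locate executable ... winutils.exe"-style error, a common workaround (an assumption on my part, not covered in the original post) is to point hadoop.home.dir at the local installation at the very start of main():

        // Hypothetical workaround: tell the Hadoop client where the Windows
        // native binaries from step (3.3) live, before any Configuration is created.
        System.setProperty("hadoop.home.dir", "C:\\hadoop-2.4.1\\hadoop-2.4.1");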

 

    (3.4) Write the source code


  The WordCountMapper class:

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Extends Mapper. Type mapping: LongWritable ==> long, Text ==> String,
 * IntWritable ==> int.
 *
 * @author Administrator
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    /**
     * Override the map method.
     */
    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // 1) Get one line of input, e.g. "hello hadoop"
        String line = value.toString();

        // 2) Split the line into words: "hello", "hadoop"
        String[] splits = line.split(" ");

        // 3) Emit each word with a count of 1:
        //    hello  1
        //    hadoop 1
        for (String str : splits) {
            // Write (key, value) to the context: each word is emitted once
            context.write(new Text(str), new IntWritable(1));
        }
    }
}

The WordCountReduce class:

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Extends Reducer. Type mapping: Text ==> String, IntWritable ==> int.
 * (input (key, value), output (key, value))
 *
 * @author Administrator
 */
public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // a 1
    // b 1
    // c 1
    // hello {1,1,1} ==> hello {3}  ==> the grouped counts are the values
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        int count = 0; // running total
        // Sum the counts for this word
        for (IntWritable value : values) {
            count += value.get();
        }
        // Write the result to the context
        context.write(key, new IntWritable(count));
    }
}
 

The WordCountMapReduce driver class:

package com.hlx.mapreduce.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Driver class.
 *
 * @author Administrator
 */
public class WordCountMapReduce {

    public static void main(String[] args) throws Exception {
        // Create the configuration object
        Configuration conf = new Configuration();

        // Create the job object
        Job job = Job.getInstance(conf, "wordcount0");

        // Set the main class
        job.setJarByClass(WordCountMapReduce.class);

        // Set the mapper class
        job.setMapperClass(WordCountMapper.class);

        // Set the reducer class
        job.setReducerClass(WordCountReduce.class);

        // Set the map output (key, value) types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set the reduce output (key, value) types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set the input and output paths: /words is the input file, /out3 is the output directory
        FileInputFormat.setInputPaths(job, new Path("hdfs://master-hadoop.dragon.org:9000/words"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://master-hadoop.dragon.org:9000/out3"));

        // Submit the job and wait for completion
        boolean flag = job.waitForCompletion(true);
        if (!flag) {
            System.out.println("the task has failed!");
        }
    }
}

    (3.5) Run the job


The output is as follows:
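The output can also be verified from the Linux shell; the part file name below assumes the job's default single reducer:

[hadoop@master-hadoop hadoop-2.4.1]$ bin/hdfs dfs -cat /out3/part-r-00000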

Note: the source code above can actually still be optimized!
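One well-known optimization, sketched here as an illustration rather than taken from the original post, is to reuse the output Writable objects instead of allocating a new Text and IntWritable for every word:

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Reworked mapper: reuses output objects across calls to map()
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text word = new Text();                       // reused key object
    private static final IntWritable ONE = new IntWritable(1);  // constant value

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String str : value.toString().split(" ")) {
            word.set(str);             // overwrite instead of new Text(str)
            context.write(word, ONE);  // safe: write() serializes immediately
        }
    }
}

Registering the reducer as a combiner in the driver, job.setCombinerClass(WordCountReduce.class);, would further reduce the data shuffled to the reduce phase.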
