Hadoop: Hands-On MapReduce WordCount with Eclipse (1)


(1) Download Eclipse from the official site: http://www.eclipse.org/


(2) Get the Hadoop dependencies from Maven: http://www.mvnrepository.com

Select the artifacts matching your Hadoop version, then copy the corresponding dependency declarations:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.4.1</version>
</dependency>
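If you build the project with Maven rather than adding jars by hand, a minimal pom.xml wrapping these dependencies might look like the sketch below (the groupId and artifactId for the demo project are assumptions, not taken from the original post):

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <!-- hypothetical coordinates for the demo project -->
    <groupId>com.hlx</groupId>
    <artifactId>wordcount-demo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <!-- the four Hadoop 2.4.1 dependencies listed above go here -->
    </dependencies>
</project>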

(3) Extract the downloaded Eclipse

    (3.1) Right-click the project | Build Path | Configure Build Path


   (3.2) Install the Hadoop Eclipse plugin

  

   (a) Close Eclipse, then copy the Hadoop Eclipse plugin jar into the D:\eclipse-jee-mars-2-win32\eclipse\plugins directory

    

   (b) Start Eclipse and open Window | Preferences (note: use Browse... to select your local Hadoop installation directory)

    

    (c) Window | Show View | Other

     

     (d) Hadoop must be started on the Linux side first:

          [hadoop@master-hadoop hadoop-2.4.1]$ sbin/start-dfs.sh

          [hadoop@master-hadoop hadoop-2.4.1]$ sbin/start-yarn.sh
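          Before returning to Eclipse, it is worth confirming that the daemons actually came up. Running jps on the master should list processes such as NameNode and ResourceManager, and, depending on where the slaves run, DataNode and NodeManager; the exact set depends on your cluster layout:

          [hadoop@master-hadoop hadoop-2.4.1]$ jps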

     (e) Click the small elephant icon at the bottom right and configure a New Hadoop Location....
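        The DFS Master host and port entered here must match fs.defaultFS in the cluster's core-site.xml. Given the HDFS URI used by the driver code in section (3.4), that configuration presumably contains something like this sketch (a reconstruction, not shown in the original post):

        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master-hadoop.dragon.org:9000</value>
        </property>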

        


     (f) The connection result is displayed in the DFS Locations view

       

       (3.3) Supply the native Windows binaries missing from Hadoop's bin directory

     

        (a) Copy the winutils.exe file into the C:\hadoop-2.4.1\hadoop-2.4.1\bin directory

        (b) Copy the hadoop.dll file into the C:\Windows\System32 directory
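        If the job launched from Eclipse still fails with a "Could not locate executable ... winutils.exe"-style error, a common workaround (an assumption on my part, not covered in the original post) is to point hadoop.home.dir at the local installation at the very start of main():

        // Hypothetical workaround: tell the Hadoop client where the Windows
        // native binaries from step (3.3) live, before any Configuration is created.
        System.setProperty("hadoop.home.dir", "C:\\hadoop-2.4.1\\hadoop-2.4.1");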

 

    (3.4) Write the source code


  The WordCountMapper class:

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Extends Mapper. Type mapping: LongWritable ==> long, Text ==> String,
 * IntWritable ==> int.
 *
 * @author Administrator
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    /**
     * Override the map method.
     */
    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // 1) Get one line of input, e.g. "hello hadoop"
        String line = value.toString();

        // 2) Split the line into words: "hello", "hadoop"
        String[] splits = line.split(" ");

        // 3) Emit each word with a count of 1:
        //    hello  1
        //    hadoop 1
        for (String str : splits) {
            // Write (key, value) to the context: each word is emitted once
            context.write(new Text(str), new IntWritable(1));
        }
    }
}

The WordCountReduce class:

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Extends Reducer. Type mapping: Text ==> String, IntWritable ==> int.
 * (input (key, value), output (key, value))
 *
 * @author Administrator
 */
public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // a 1
    // b 1
    // c 1
    // hello {1,1,1} ==> hello {3}  ==> the grouped counts are the values
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        int count = 0; // running total
        // Sum the counts for this word
        for (IntWritable value : values) {
            count += value.get();
        }
        // Write the result to the context
        context.write(key, new IntWritable(count));
    }
}
 

The WordCountMapReduce driver class:

package com.hlx.mapreduce.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Driver class.
 *
 * @author Administrator
 */
public class WordCountMapReduce {

    public static void main(String[] args) throws Exception {
        // Create the configuration object
        Configuration conf = new Configuration();

        // Create the job object
        Job job = Job.getInstance(conf, "wordcount0");

        // Set the main class
        job.setJarByClass(WordCountMapReduce.class);

        // Set the mapper class
        job.setMapperClass(WordCountMapper.class);

        // Set the reducer class
        job.setReducerClass(WordCountReduce.class);

        // Set the map output (key, value) types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set the reduce output (key, value) types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set the input and output paths: /words is the input file, /out3 is the output directory
        FileInputFormat.setInputPaths(job, new Path("hdfs://master-hadoop.dragon.org:9000/words"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://master-hadoop.dragon.org:9000/out3"));

        // Submit the job and wait for completion
        boolean flag = job.waitForCompletion(true);
        if (!flag) {
            System.out.println("the task has failed!");
        }
    }
}

    (3.5) Run the job


The output is as follows:
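The output can also be verified from the Linux shell; the part file name below assumes the job's default single reducer:

[hadoop@master-hadoop hadoop-2.4.1]$ bin/hdfs dfs -cat /out3/part-r-00000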

Note: the source code above can actually still be optimized!
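One well-known optimization, sketched here as an illustration rather than taken from the original post, is to reuse the output Writable objects instead of allocating a new Text and IntWritable for every word:

package com.hlx.mapreduce.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Reworked mapper: reuses output objects across calls to map()
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text word = new Text();                       // reused key object
    private static final IntWritable ONE = new IntWritable(1);  // constant value

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String str : value.toString().split(" ")) {
            word.set(str);             // overwrite instead of new Text(str)
            context.write(word, ONE);  // safe: write() serializes immediately
        }
    }
}

Registering the reducer as a combiner in the driver, job.setCombinerClass(WordCountReduce.class);, would further reduce the data shuffled to the reduce phase.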
