Configuring the hadoop-eclipse plugin and developing MapReduce programs in Eclipse

For setting up the Hadoop cluster itself, see the following two articles: the first covers Red Hat and the second Ubuntu, though the Red Hat steps work on Ubuntu as well.

http://blog.csdn.net/haojun186/article/details/7466207
http://lingyibin.iteye.com/blog/875535

I spent the past few days setting up an environment for developing Hadoop programs in Eclipse. The plugin caused all kinds of errors and frustration, but today everything finally works, so I am sharing the lessons learned here. Leave a comment if you run into problems.

1. Download the hadoop-eclipse-1.0.1 plugin

http://download.csdn.net/detail/shuangtaqibing/4461472

In my testing, this plugin works with both Eclipse 3.7 (Indigo) and Eclipse 3.8 (Juno).

2. Install the plugin: copy the plugin JAR into Eclipse's plugins directory and restart Eclipse. After restarting you should see the Map/Reduce view; if it is not visible, open it via Window -> Show View.

3. Configure the plugin

I tested this on Ubuntu 10.04 running inside VMware, using the Juno release of Eclipse.

In the host field, enter the address where your Hadoop cluster runs. This depends on how you configured the cluster — some people use localhost, others 127.0.0.1 — and it must match what you put in core-site.xml and the related configuration files when you set up the cluster.
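For reference, the addresses the plugin must match usually live in core-site.xml and mapred-site.xml. A minimal pseudo-distributed Hadoop 1.x example might look like the fragment below — the host name and ports here are illustrative, so substitute the values from your own cluster:

```xml
<!-- core-site.xml: the NameNode address (the plugin's "DFS Master" must match this) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml: the JobTracker address (the plugin's "Map/Reduce Master" must match this) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```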

Once the settings are done, start the cluster with start-dfs.sh and start-mapred.sh (HDFS first, then MapReduce).

4. Create a Map/Reduce project

1. Create a new project and choose Map/Reduce Project.

2. Name the project WordCount.

Note: hadoop install directory must point to the directory where you unpacked Hadoop; that is how Eclipse finds the JARs the project needs, so the project can be created successfully.

Then create the following three Java files:

MyDriver.java is as follows:

package org;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Hello Hadoop");
        job.setJarByClass(MyDriver.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(MyMap.class);
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // args[0] is the input directory, args[1] the (not yet existing) output directory.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}


 

MyMap.java is as follows:

package org;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace and emit (token, 1) for each token.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
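One subtlety worth noting: StringTokenizer's default delimiters are whitespace only, so punctuation stays attached to the surrounding word — "hadoop,this" in the test data below is a single token. A standalone sketch (the class name TokenizerDemo is mine, not part of the project):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Split a line the same way MyMap does: on whitespace only.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(line);
        while (tok.hasMoreTokens()) {
            out.add(tok.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Punctuation is NOT a delimiter, so "hadoop,this" comes out as one token.
        System.out.println(tokens("hello hadoop,this is lingyibin"));
    }
}
```

If you want "hadoop" and "this" counted separately, pass an explicit delimiter set (or pre-clean the line) instead of relying on the default.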

MyReduce.java is as follows:

package org;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the counts emitted for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}


After creating these three files, create a directory named input under the WordCount project folder to serve as the MapReduce input directory, and put two files in it, testFile1.txt and testFile2.txt, with the following contents:

testFile1.txt:

hello hadoop,this is lingyibin

testFile2.txt:

this is the world of lingyibin.wellcome hadoop.
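Given these two files, you can predict the job's output by hand: the mapper emits 11 tokens in total, and only "is" occurs twice, so the reducer produces 10 distinct keys — consistent with the "Map output records=11" and "Reduce output records=10" counters in the console output further down. A plain-Java sketch of the same computation (the class name LocalWordCount is mine, for illustration only):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Simulate map (tokenize, emit 1) plus reduce (sum per key) in memory.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tok = new StringTokenizer(line);
            while (tok.hasMoreTokens()) {
                counts.merge(tok.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(
                "hello hadoop,this is lingyibin",
                "this is the world of lingyibin.wellcome hadoop.");
        counts.forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```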

 

Finally, set MyDriver's run-time arguments in the Run Configurations dialog: the program arguments are the input directory and an output directory, e.g. input output2 (the output directory must not exist yet).

 

Then click Apply and choose Run on Hadoop. The results are as follows:

 

 

Console output:

12/07/31 00:34:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/07/31 00:34:51 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/31 00:34:51 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
****file:/home/malipeng/workspace_hadoop1/WordCount/input
12/07/31 00:34:51 INFO input.FileInputFormat: Total input paths to process : 2
12/07/31 00:34:51 INFO mapred.JobClient: Running job: job_local_0001
12/07/31 00:34:51 INFO util.ProcessTree: setsid exited with exit code 0
12/07/31 00:34:51 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@62937c
12/07/31 00:34:51 INFO mapred.MapTask: io.sort.mb = 100
12/07/31 00:34:51 INFO mapred.MapTask: data buffer = 79691776/99614720
12/07/31 00:34:51 INFO mapred.MapTask: record buffer = 262144/327680
12/07/31 00:34:51 INFO mapred.MapTask: Starting flush of map output
12/07/31 00:34:52 INFO mapred.MapTask: Finished spill 0
12/07/31 00:34:52 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/07/31 00:34:52 INFO mapred.JobClient:  map 0% reduce 0%
12/07/31 00:34:54 INFO mapred.LocalJobRunner:
12/07/31 00:34:54 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/07/31 00:34:54 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c695a6
12/07/31 00:34:54 INFO mapred.MapTask: io.sort.mb = 100
12/07/31 00:34:54 INFO mapred.MapTask: data buffer = 79691776/99614720
12/07/31 00:34:54 INFO mapred.MapTask: record buffer = 262144/327680
12/07/31 00:34:54 INFO mapred.MapTask: Starting flush of map output
12/07/31 00:34:54 INFO mapred.MapTask: Finished spill 0
12/07/31 00:34:54 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/07/31 00:34:55 INFO mapred.JobClient:  map 100% reduce 0%
12/07/31 00:34:57 INFO mapred.LocalJobRunner:
12/07/31 00:34:57 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/07/31 00:34:57 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@73a7ab
12/07/31 00:34:57 INFO mapred.LocalJobRunner:
12/07/31 00:34:57 INFO mapred.Merger: Merging 2 sorted segments
12/07/31 00:34:57 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 143 bytes
12/07/31 00:34:57 INFO mapred.LocalJobRunner:
12/07/31 00:34:57 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/07/31 00:34:57 INFO mapred.LocalJobRunner:
12/07/31 00:34:57 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/07/31 00:34:57 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output2
12/07/31 00:35:00 INFO mapred.LocalJobRunner: reduce > reduce
12/07/31 00:35:00 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/07/31 00:35:01 INFO mapred.JobClient:  map 100% reduce 100%
12/07/31 00:35:01 INFO mapred.JobClient: Job complete: job_local_0001
12/07/31 00:35:01 INFO mapred.JobClient: Counters: 20
12/07/31 00:35:01 INFO mapred.JobClient:   File Output Format Counters
12/07/31 00:35:01 INFO mapred.JobClient:     Bytes Written=102
12/07/31 00:35:01 INFO mapred.JobClient:   FileSystemCounters
12/07/31 00:35:01 INFO mapred.JobClient:     FILE_BYTES_READ=1904
12/07/31 00:35:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=99030
12/07/31 00:35:01 INFO mapred.JobClient:   File Input Format Counters
12/07/31 00:35:01 INFO mapred.JobClient:     Bytes Read=73
12/07/31 00:35:01 INFO mapred.JobClient:   Map-Reduce Framework
12/07/31 00:35:01 INFO mapred.JobClient:     Map output materialized bytes=151
12/07/31 00:35:01 INFO mapred.JobClient:     Map input records=2
12/07/31 00:35:01 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/07/31 00:35:01 INFO mapred.JobClient:     Spilled Records=22
12/07/31 00:35:01 INFO mapred.JobClient:     Map output bytes=117
12/07/31 00:35:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=481505280
12/07/31 00:35:01 INFO mapred.JobClient:     CPU time spent (ms)=0
12/07/31 00:35:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=264
12/07/31 00:35:01 INFO mapred.JobClient:     Combine input records=11
12/07/31 00:35:01 INFO mapred.JobClient:     Reduce input records=11
12/07/31 00:35:01 INFO mapred.JobClient:     Reduce input groups=10
12/07/31 00:35:01 INFO mapred.JobClient:     Combine output records=11
12/07/31 00:35:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/07/31 00:35:01 INFO mapred.JobClient:     Reduce output records=10
12/07/31 00:35:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/07/31 00:35:01 INFO mapred.JobClient:     Map output records=11