Setting Up an Eclipse Development Environment for MapReduce on Linux

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 15:53

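Tips: This article targets a distributed Hadoop cluster. Before reading, make sure the Hadoop cluster is installed on your machines and running normally. Because Eclipse requires a graphical interface, you can install VNC if you need to operate a remote machine graphically. (For VNC installation, see the VNC installation and usage guide on this blog.)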

1. Preparation

Download Eclipse from the official site: http://www.eclipse.org/downloads/ (any version will do, for example eclipse-SDK-4.5M1-linux-gtk.tar.gz)

Download hadoop-eclipse-plugin from: http://download.csdn.net/detail/jxmykl/7786833

2. Installation

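• Copy eclipse-SDK-4.5M1-linux-gtk.tar.gz (or the archive of whichever version you downloaded) to the /home/hadoop folder

• Change to the hadoop directory: cd /home/hadoop

• Install Eclipse by unpacking the archive: tar -zxvf eclipse-SDK-4.5M1-linux-gtk.tar.gz

• Copy the downloaded hadoop-eclipse-plugin jar into the eclipse/plugins/ folder

• Start Eclipse (this requires a graphical session; click the eclipse icon inside the eclipse folder)

• Once Eclipse is running, open Window > Preferences from the menu bar and select the Hadoop Map/Reduce option

• You can now create a new project and start coding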
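If the Hadoop Map/Reduce option does not show up in Preferences, a common cause is that the plugin jar did not land in the plugins folder. The following is a minimal sketch of a check for that. It assumes Eclipse was unpacked to /home/hadoop/eclipse as in the steps above, and that the plugin's file name starts with hadoop-eclipse-plugin; the class name PluginCheck is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Eclipse or Hadoop.
final class PluginCheck {

    // Returns the names of jars in pluginsDir whose name starts with the
    // assumed prefix "hadoop-eclipse-plugin".
    static List<String> findPluginJars(Path pluginsDir) throws IOException {
        List<String> found = new ArrayList<>();
        try (DirectoryStream<Path> jars =
                 Files.newDirectoryStream(pluginsDir, "hadoop-eclipse-plugin*.jar")) {
            for (Path jar : jars) {
                found.add(jar.getFileName().toString());
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Default location follows the steps above; pass another path to override it.
        Path pluginsDir = Paths.get(
            args.length > 0 ? args[0] : "/home/hadoop/eclipse/plugins");
        if (!Files.isDirectory(pluginsDir)) {
            System.err.println("Not a directory: " + pluginsDir);
            System.exit(1);
        }
        List<String> found = findPluginJars(pluginsDir);
        System.out.println(found.isEmpty()
            ? "No hadoop-eclipse-plugin jar found in " + pluginsDir
            : "Found: " + found);
    }
}
```

If the jar is present but the option is still missing, restarting Eclipse is usually the next thing to try, since plugins are picked up at startup.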

3. Example:

New Project > select Map/Reduce Project (the steps here are the same as on Windows)

Use the following example to test whether the environment works:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options; what remains should be <in> and <out>.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // This constructor is deprecated in Hadoop 2.x, where
        // Job.getInstance(conf, "wordcount") is preferred.
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

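Right-click the finished .java file, choose Run As > Run Configurations, and on the Arguments tab enter the input path and the output path as the two program arguments.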

When the configuration is complete, click Apply and then Run; the console shows the program's progress as it runs.

The results can be found in the output folder you configured.

4. Packaging

When creating a new class, be sure to specify a package name.

Right-click the .java file > Export > select JAR file > check the files to include (select all in the right-hand pane) and set the output path.

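Click Next through the remaining pages, and on the last one select the Main class (the name of the program's main class, which must include the package name).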

Once packaged, the program can be run on the Hadoop cluster.
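Before copying the jar to the cluster, it can be worth confirming that the export wizard actually recorded the main class. The sketch below reads the Main-Class attribute from a jar's manifest using the standard java.util.jar API. The class name MainClassCheck is made up for illustration, and the default file name wordcount.jar is only an assumption about what you named the export.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Eclipse or Hadoop.
final class MainClassCheck {

    // Returns the Main-Class manifest attribute, or null if the jar has
    // no manifest or the manifest has no such entry.
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default jar name; pass the real path as the first argument.
        String jarPath = args.length > 0 ? args[0] : "wordcount.jar";
        String mainClass = mainClassOf(jarPath);
        System.out.println(mainClass == null
            ? "No Main-Class entry in " + jarPath
            : "Main-Class: " + mainClass);
    }
}
```

If no Main-Class entry is reported, the jar can still be run, but the fully qualified class name then has to be given on the command line after the jar name.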

Usage:

1) Create a directory on HDFS

$ hadoop fs -mkdir /input

2) Upload a file to HDFS

$ cat in1.txt

Hello World , Hello China, Hello Shanghai

I love China

How are you

$ hadoop fs -put in1.txt /input

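3) Run the packaged jar

Change to the folder that contains the jar

$ cd /usr/hadoop/ (adjust to wherever you placed the jar)

$ hadoop jar wordcount.jar /input /output (do not create /output yourself; Hadoop creates it, and the job fails if it already exists)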

4) View the results

$ hadoop fs -cat /output/part-r-00000
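To know what to expect in part-r-00000, the job's logic can be reproduced locally without Hadoop. The following is a rough sketch rather than the job itself: it tokenizes each line with StringTokenizer as the mapper does and sums the counts per word as the reducer does. The class name LocalWordCount is made up for illustration, and the sketch assumes the job ran with a single reducer, so that all keys end up in one output file.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the WordCount job; it does not use Hadoop.
final class LocalWordCount {

    // Mirrors the job's logic: split on whitespace (map), then sum per word (reduce).
    static Map<String, Integer> count(String... lines) {
        // TreeMap keeps keys sorted; for ASCII words this matches the
        // ordering of Hadoop Text keys.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three lines of in1.txt from step 2.
        Map<String, Integer> counts = count(
            "Hello World , Hello China, Hello Shanghai",
            "I love China",
            "How are you");
        // Hadoop's default text output separates key and value with a tab.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Because StringTokenizer splits only on whitespace, punctuation stays attached to words: for in1.txt, "Hello" is counted 3 times, while "China" and "China," are counted once each as separate keys, and the lone "," is a key of its own.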

Done!

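This post draws on material shared by many experts online, and the images come from the Internet. Thanks to everyone for their contributions!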

0 0