Your First MapReduce Program on a Hadoop Distributed Cluster: WordCount


       Setting up the Hadoop distributed cluster itself is covered in my earlier post on building a fully distributed Hadoop cluster on Ubuntu; see http://blog.csdn.net/luoluowushengmimi/article/details/17264129 for the details.


       I. Installing Spring Tool Suite (STS) on Linux

        First download Spring Tool Suite from http://eclipse.org/downloads/?osType=linux (this article uses the 3.4.0 tar.gz release). Extract the downloaded archive, change into the sts-3.4.0.RELEASE directory inside it, and launch the STS executable.
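        A minimal sketch of those steps from the shell, assuming a 3.4.0 Linux tar.gz archive (the exact file name depends on the build you downloaded, so substitute yours):

tar -xzf spring-tool-suite-3.4.0.RELEASE-*.tar.gz   # archive name is an assumption
cd sts-3.4.0.RELEASE                                # directory created by the archive
./STS &                                             # launch the STS executable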

        

      II. Building the Hadoop Eclipse Plugin on Linux

      Since the 1.0 release, Hadoop no longer ships a ready-made Eclipse plugin jar, so we have to build it from source ourselves; building on Linux is recommended. The setup here is CentOS 6.4 with Hadoop 1.2.1, with Hadoop unpacked under /root/gy/hadoop-1.2.1 and STS under /root/gy/springsource. Building the Eclipse plugin comes down to the following four steps:

     1) Edit build.xml

      Go into /root/gy/hadoop-1.2.1/src/contrib/eclipse-plugin and edit build.xml: set the Eclipse (STS) root directory and the Hadoop version number, add the Hadoop jars the plugin references, and add includeantruntime="on" to the javac task. The resulting file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><!--   Licensed to the Apache Software Foundation (ASF) under one or more   contributor license agreements.  See the NOTICE file distributed with   this work for additional information regarding copyright ownership.   The ASF licenses this file to You under the Apache License, Version 2.0   (the "License"); you may not use this file except in compliance with   the License.  You may obtain a copy of the License at       http://www.apache.org/licenses/LICENSE-2.0   Unless required by applicable law or agreed to in writing, software   distributed under the License is distributed on an "AS IS" BASIS,   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.   See the License for the specific language governing permissions and   limitations under the License.-->             <!-- build.xml --><project default="jar" name="eclipse-plugin">  <import file="../build-contrib.xml"/>   <property name="hadoop.dir" value="/root/gy/hadoop-1.2.1">  <property name="eclipse.home" location="/root/gy/springsource"/>  <property name="version" value="1.2.1"/>    <path id="eclipse-sdk-jars">    <fileset dir="${eclipse.home}/plugins/">      <include name="org.eclipse.ui*.jar"/>      <include name="org.eclipse.jdt*.jar"/>      <include name="org.eclipse.core*.jar"/>      <include name="org.eclipse.equinox*.jar"/>      <include name="org.eclipse.debug*.jar"/>      <include name="org.eclipse.osgi*.jar"/>      <include name="org.eclipse.swt*.jar"/>      <include name="org.eclipse.jface*.jar"/>      <include name="org.eclipse.team.cvs.ssh2*.jar"/>      <include name="com.jcraft.jsch*.jar"/>    </fileset>   </path>  <!-- Override classpath to include Eclipse SDK jars -->  <path id="classpath">    <pathelement location="${build.classes}"/>    <pathelement location="${hadoop.root}/build/classes"/> <fileset dir="${hadoop.root}">        <include name="**/*.jar" />    </fileset>    <path refid="eclipse-sdk-jars"/>  </path>  <!-- Skip building if eclipse.home is unset. 
-->  <target name="check-contrib" unless="eclipse.home">    <property name="skip.contrib" value="yes"/>    <echo message="eclipse.home unset: skipping eclipse plugin"/>  </target> <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">    <echo message="contrib: ${name}"/>    <javac     encoding="${build.encoding}"     srcdir="${src.dir}"     includes="**/*.java"     destdir="${build.classes}"     debug="${javac.debug}"     deprecation="${javac.deprecation}" includeantruntime="on">     <classpath refid="classpath"/>    </javac>  </target>  <!-- Override jar target to specify manifest -->  <target name="jar" depends="compile" unless="skip.contrib">    <mkdir dir="${build.dir}/lib"/>  <copy file="${hadoop.root}/hadoop-core-${version}.jar" tofile="${build.dir}/lib/hadoop-core.jar" verbose="true"/><copy file="${hadoop.root}/lib/commons-cli-${commons-cli.version}.jar"  tofile="${build.dir}/lib/commons-cli.jar" verbose="true"/><copy file="${hadoop.root}/lib/commons-configuration-1.6.jar"  tofile="${build.dir}/lib/commons-configuration.jar" verbose="true"/><copy file="${hadoop.root}/lib/commons-httpclient-3.0.1.jar"  tofile="${build.dir}/lib/commons-httpclient.jar" verbose="true"/><copy file="${hadoop.root}/lib/commons-lang-2.4.jar"  tofile="${build.dir}/lib/commons-lang.jar" verbose="true"/><copy file="${hadoop.root}/lib/jackson-core-asl-1.8.8.jar"  tofile="${build.dir}/lib/jackson-core-asl.jar" verbose="true"/><copy file="${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar"  tofile="${build.dir}/lib/jackson-mapper-asl.jar" verbose="true"/>  <jar      jarfile="${build.dir}/hadoop-${name}-${version}.jar"      manifest="${root}/META-INF/MANIFEST.MF">      <fileset dir="${build.dir}" includes="classes/ lib/"/>      <fileset dir="${root}" includes="resources/ plugin.xml"/>    </jar>  </target></project>

     2) Edit build-contrib.xml

cd /root/gy/hadoop-1.2.1/src/contrib
vi build-contrib.xml

Set (or add) the following properties:

<property name="hadoop.root" location="/root/gy/hadoop-1.2.1"/>
<property name="eclipse.home" location="/root/gy/springsource"/>
<property name="javac.deprecation" value="on"/>
  

    3) Edit MANIFEST.MF, setting its Bundle-ClassPath entry to the jars that the jar target above copies into lib/:

Bundle-ClassPath: classes/,
 lib/commons-cli.jar,
 lib/commons-httpclient.jar,
 lib/hadoop-core.jar,
 lib/jackson-mapper-asl.jar,
 lib/commons-configuration.jar,
 lib/commons-lang.jar,
 lib/jackson-core-asl.jar

  4) From a shell, change into /root/gy/hadoop-1.2.1/src/contrib/eclipse-plugin and run ant to build the plugin, as shown below.
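cd /root/gy/hadoop-1.2.1/src/contrib/eclipse-plugin
ant
# The default target of the build.xml above is "jar"; contrib builds put their
# output under the Hadoop build tree, so on success the plugin jar should appear as
# /root/gy/hadoop-1.2.1/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-1.2.1.jar
# (hadoop-${name}-${version}.jar, per the jar target).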

  III. Creating the Hadoop Project in Eclipse

     1. Copy the hadoop-eclipse-plugin-1.2.1.jar produced by ant into the plugins/ directory of your Eclipse (STS) installation.

     2. Restart Eclipse and configure the Hadoop installation directory.
If the plugin installed correctly, Window --> Preferences now shows a Hadoop Map/Reduce entry; set the Hadoop installation directory there and close the dialog.

      

     3. Configure Map/Reduce Locations.
Open the Map/Reduce Locations view via Window --> Show View.
In that view, create a new Hadoop Location: right-click --> New Hadoop Location. In the dialog, set a Location name (e.g. Hadoop) and fill in the Map/Reduce Master and DFS Master. Their Host and Port must match the addresses and ports you configured in mapred-site.xml and core-site.xml respectively, as sketched below.
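For reference, a minimal sketch of the two configuration entries involved, assuming a master host named master and the commonly used Hadoop 1.x ports 9000/9001 (use whatever your cluster actually has):

<!-- core-site.xml: the DFS Master host/port come from fs.default.name -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>

<!-- mapred-site.xml: the Map/Reduce Master host/port come from mapred.job.tracker -->
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>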

          

     4. Create the project.

        File --> New --> Other --> Map/Reduce Project. Any project name will do, e.g. WordCount.


      5. The WordCount source (it uses the old-style org.apache.hadoop.mapred API):

package com.test.word;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {

    // Mapper: tokenizes each input line and emits (word, 1) for every token.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums all the 1s emitted for a given word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // JobConf(WordCount.class) also tells Hadoop which jar to ship to the cluster.
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        // The reduce function is associative and commutative, so it doubles as a combiner.
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path("/root/gy/testData/input.txt"));
        FileOutputFormat.setOutputPath(conf, new Path("/root/gy/testData/output"));

        JobClient.runJob(conf);
    }
}
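The input file must exist at the path hard-coded in main() before the job starts. A sketch of staging it and checking the result from the shell, using the paths above (adjust to your cluster):

# Put the sample input onto HDFS at the path the job expects.
hadoop fs -mkdir /root/gy/testData
hadoop fs -put input.txt /root/gy/testData/input.txt
# The output directory must not already exist; Hadoop refuses to overwrite it.

# Run the job from Eclipse (the plugin adds a Run As --> Run on Hadoop entry),
# then inspect the word counts written to the output directory:
hadoop fs -cat /root/gy/testData/output/part-00000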
    6. Finally, the job runs to completion; the console output of the run confirms success.


 

  The hadoop-eclipse-plugin jar I produced with ant can be used directly; it is available at http://download.csdn.net/detail/luoluowushengmimi/6869717
