Configuring Eclipse to run MapReduce programs

Source: Internet  Editor: 程序博客网  Time: 2024/05/20 09:44

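This article has three parts:

I. Compiling the Hadoop plugin

   You can simply download a prebuilt hadoop-eclipse-plugin.jar, as long as its version is compatible. I had in fact connected successfully on the first try, but an exception (described below) misled me into thinking the downloaded version was at fault, so I decided to compile the plugin myself. If you use a prebuilt jar, skip Part 1.

II. Connecting to the Hadoop cluster from Eclipse

III. Running the WordCount example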


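Part 1. Compiling the Hadoop plugin (Hadoop 2.7.4, 64-bit)

I first tried compiling inside a Linux virtual machine, but the errors there (jar files not being found) were tedious to diagnose, so I ended up compiling on Windows.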

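1. Download Ant and set its environment variables

Run ant -version to check that the setup works, as shown in the figure below.

2. Download hadoop2x-eclipse-plugin-master.zip

3. Compile the plugin (2.7.4)

Hadoop is at version 2.7.4, and the plugin version must match. My locally compiled Hadoop 2.7.4 is in the E:\Hadoop folder. Some configuration needs to be changed before compiling.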

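3.1 Update the jar versions

The versions of the jars in the Hadoop directory E:\Hadoop\hadoop-2.7.4\share\hadoop\common\lib must match the versions listed in the plugin's configuration file (hadoop2x-eclipse-plugin-master\ivy\libraries.properties).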

Here is my libraries.properties:

#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
#This properties file lists the versions of the various artifacts used by hadoop and components.
#It drives ivy and the generation of a maven POM
# This is the version of hadoop we are generating
hadoop.version=2.7.4
hadoop-gpl-compression.version=0.1.0
#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.0
ant-task.version=2.0.10
asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11
checkstyle.version=4.2
commons-cli.version=1.2
commons-codec.version=1.4
commons-collections.version=3.2.2
commons-configuration.version=1.6
commons-daemon.version=1.0.13
commons-httpclient.version=3.1
commons-lang.version=2.6
commons-logging.version=1.1.3
commons-logging-api.version=1.1.3
commons-math.version=3.1.1
commons-el.version=1.0
commons-fileupload.version=1.2
commons-io.version=2.4
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2
hsqldb.version=1.8.0.10
htrace.version=3.1.0-incubating
ivy.version=2.1.0
jasper.version=5.5.12
jackson.version=1.9.13
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
jets3t.version=0.6.1
jetty.version=6.1.26
jetty-util.version=6.1.26
jersey-core.version=1.8
jersey-json.version=1.8
jersey-server.version=1.8
junit.version=4.5
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0
kfs.version=0.1
log4j.version=1.2.17
lucene-core.version=2.3.1
mockito-all.version=1.8.5
jsch.version=0.1.42
oro.version=2.0.8
rats-lib.version=0.5.1
servlet.version=4.0.6
servlet-api.version=2.5
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10
wagon-http.version=1.0-beta-2
xmlenc.version=0.52
xerces.version=1.4.4
protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final
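The version match required in 3.1 can be checked mechanically rather than by eye. The following is a minimal Python sketch, not part of the plugin: the function names and the ARTIFACTS table are made up for illustration, and the table simply mirrors the jar names that the `<copy>` elements of build.xml expect (for example `protobuf-java-${protobuf.version}.jar`). It assumes the jars follow the `name-version.jar` naming pattern.

```python
import re
from pathlib import Path

# Jar base name -> property key in libraries.properties (without ".version").
# Assumption: this mirrors the <copy> elements in the plugin's build.xml.
ARTIFACTS = {
    "htrace-core": "htrace",
    "servlet-api": "servlet-api",
    "commons-io": "commons-io",
    "protobuf-java": "protobuf",
    "log4j": "log4j",
    "commons-cli": "commons-cli",
    "commons-configuration": "commons-configuration",
    "commons-lang": "commons-lang",
    "commons-collections": "commons-collections",
    "jackson-core-asl": "jackson",
    "jackson-mapper-asl": "jackson",
    "slf4j-log4j12": "slf4j-log4j12",
    "slf4j-api": "slf4j-api",
    "guava": "guava",
    "hadoop-auth": "hadoop",
    "netty": "netty",
}


def parse_versions(text):
    """Parse 'name.version=value' lines into {name: value}; a later duplicate wins."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.endswith(".version"):
            versions[key[: -len(".version")]] = value.strip()
    return versions


def jar_names_in(lib_dir):
    """Return the set of jar file names directly inside lib_dir."""
    return {p.name for p in Path(lib_dir).glob("*.jar")}


def find_mismatches(versions, jar_names, artifacts=ARTIFACTS):
    """Return {artifact: (expected_jar, jars_found)} for every expected jar that is absent."""
    mismatches = {}
    for artifact, prop in artifacts.items():
        expected = f"{artifact}-{versions.get(prop, '?')}.jar"
        if expected not in jar_names:
            pattern = re.escape(artifact) + r"-\d.*\.jar"
            found = sorted(n for n in jar_names if re.fullmatch(pattern, n))
            mismatches[artifact] = (expected, found)
    return mismatches


# Small in-memory demonstration: the properties file says 2.4, the lib folder has 2.5.
sample = "hadoop.version=2.7.4\ncommons-io.version=2.4\n"
print(find_mismatches(parse_versions(sample), {"commons-io-2.5.jar"},
                      {"commons-io": "commons-io"}))
# prints {'commons-io': ('commons-io-2.4.jar', ['commons-io-2.5.jar'])}
```

Against a real installation, the properties text would come from reading ivy\libraries.properties and the jar names from `jar_names_in(r"E:\Hadoop\hadoop-2.7.4\share\hadoop\common\lib")`; any entry in the result points at a version line that probably needs editing.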



3.2 Edit \hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\build.xml

  3.2.1 Remove the ivy-retrieve-common dependency; otherwise the build hangs, as shown in the figure below.


Before the change:

<target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">
  <echo message="contrib: ${name}"/>
  <javac
    encoding="${build.encoding}"
    srcdir="${src.dir}"
    includes="**/*.java"
    destdir="${build.classes}"
    debug="${javac.debug}"
    deprecation="${javac.deprecation}">
    <classpath refid="classpath"/>
  </javac>
</target>

After the change, with depends="init, ivy-retrieve-common" removed:

<target name="compile" unless="skip.contrib">
  <echo message="contrib: ${name}"/>
  <javac
    encoding="${build.encoding}"
    srcdir="${src.dir}"
    includes="**/*.java"
    destdir="${build.classes}"
    debug="${javac.debug}"
    deprecation="${javac.deprecation}">
    <classpath refid="classpath"/>
  </javac>
</target>



  3.2.2 Add file copies

In the jar target, next to the existing <copy> elements, add:

<copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>


Then find <attribute name="Bundle-ClassPath" and append the following to its value:

 lib/servlet-api-${servlet-api.version}.jar, lib/commons-io-${commons-io.version}.jar


The complete \hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\build.xml file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<project default="jar" name="eclipse-plugin">

  <import file="../build-contrib.xml"/>

  <path id="eclipse-sdk-jars">
    <fileset dir="${eclipse.home}/plugins/">
      <include name="org.eclipse.ui*.jar"/>
      <include name="org.eclipse.jdt*.jar"/>
      <include name="org.eclipse.core*.jar"/>
      <include name="org.eclipse.equinox*.jar"/>
      <include name="org.eclipse.debug*.jar"/>
      <include name="org.eclipse.osgi*.jar"/>
      <include name="org.eclipse.swt*.jar"/>
      <include name="org.eclipse.jface*.jar"/>
      <include name="org.eclipse.team.cvs.ssh2*.jar"/>
      <include name="com.jcraft.jsch*.jar"/>
    </fileset>
  </path>

  <path id="hadoop-sdk-jars">
    <fileset dir="${hadoop.home}/share/hadoop/mapreduce">
      <include name="hadoop*.jar"/>
    </fileset>
    <fileset dir="${hadoop.home}/share/hadoop/hdfs">
      <include name="hadoop*.jar"/>
    </fileset>
    <fileset dir="${hadoop.home}/share/hadoop/common">
      <include name="hadoop*.jar"/>
    </fileset>
  </path>

  <!-- Override classpath to include Eclipse SDK jars -->
  <path id="classpath">
    <pathelement location="${build.classes}"/>
    <!--pathelement location="${hadoop.root}/build/classes"/-->
    <path refid="eclipse-sdk-jars"/>
    <path refid="hadoop-sdk-jars"/>
  </path>

  <!-- Skip building if eclipse.home is unset. -->
  <target name="check-contrib" unless="eclipse.home">
    <property name="skip.contrib" value="yes"/>
    <echo message="eclipse.home unset: skipping eclipse plugin"/>
  </target>

  <target name="compile" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac
      encoding="${build.encoding}"
      srcdir="${src.dir}"
      includes="**/*.java"
      destdir="${build.classes}"
      debug="${javac.debug}"
      deprecation="${javac.deprecation}">
      <classpath refid="classpath"/>
    </javac>
  </target>

  <!-- Override jar target to specify manifest -->
  <target name="jar" depends="compile" unless="skip.contrib">
    <mkdir dir="${build.dir}/lib"/>
    <copy todir="${build.dir}/lib/" verbose="true">
      <fileset dir="${hadoop.home}/share/hadoop/mapreduce">
        <include name="hadoop*.jar"/>
      </fileset>
    </copy>
    <copy todir="${build.dir}/lib/" verbose="true">
      <fileset dir="${hadoop.home}/share/hadoop/common">
        <include name="hadoop*.jar"/>
      </fileset>
    </copy>
    <copy todir="${build.dir}/lib/" verbose="true">
      <fileset dir="${hadoop.home}/share/hadoop/hdfs">
        <include name="hadoop*.jar"/>
      </fileset>
    </copy>
    <copy todir="${build.dir}/lib/" verbose="true">
      <fileset dir="${hadoop.home}/share/hadoop/yarn">
        <include name="hadoop*.jar"/>
      </fileset>
    </copy>
    <copy todir="${build.dir}/classes" verbose="true">
      <fileset dir="${root}/src/java">
        <include name="*.xml"/>
      </fileset>
    </copy>

    <copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/protobuf-java-${protobuf.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/log4j-${log4j.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-configuration-${commons-configuration.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-lang-${commons-lang.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-collections-${commons-collections.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-core-asl-${jackson.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-mapper-asl-${jackson.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-log4j12-${slf4j-log4j12.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-api-${slf4j-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/guava-${guava.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/hadoop-auth-${hadoop.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/netty-${netty.version}.jar" todir="${build.dir}/lib" verbose="true"/>

    <jar
      jarfile="${build.dir}/hadoop-${name}-${hadoop.version}.jar"
      manifest="${root}/META-INF/MANIFEST.MF">
      <manifest>
        <attribute name="Bundle-ClassPath"
          value="classes/,
 lib/hadoop-mapreduce-client-core-${hadoop.version}.jar,
 lib/hadoop-mapreduce-client-common-${hadoop.version}.jar,
 lib/hadoop-mapreduce-client-jobclient-${hadoop.version}.jar,
 lib/hadoop-auth-${hadoop.version}.jar,
 lib/hadoop-common-${hadoop.version}.jar,
 lib/hadoop-hdfs-${hadoop.version}.jar,
 lib/protobuf-java-${protobuf.version}.jar,
 lib/log4j-${log4j.version}.jar,
 lib/commons-cli-${commons-cli.version}.jar,
 lib/commons-configuration-${commons-configuration.version}.jar,
 lib/commons-httpclient-${commons-httpclient.version}.jar,
 lib/commons-lang-${commons-lang.version}.jar,
 lib/commons-collections-${commons-collections.version}.jar,
 lib/jackson-core-asl-${jackson.version}.jar,
 lib/jackson-mapper-asl-${jackson.version}.jar,
 lib/slf4j-log4j12-${slf4j-log4j12.version}.jar,
 lib/slf4j-api-${slf4j-api.version}.jar,
 lib/guava-${guava.version}.jar,
 lib/netty-${netty.version}.jar,
 lib/htrace-core-${htrace.version}.jar,
 lib/servlet-api-${servlet-api.version}.jar,
 lib/commons-io-${commons-io.version}.jar"/>
      </manifest>
      <fileset dir="${build.dir}" includes="classes/ lib/"/>
      <!--fileset dir="${build.dir}" includes="*.xml"/-->
      <fileset dir="${root}" includes="resources/ plugin.xml"/>
    </jar>
  </target>

</project>



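 3.3 cd to the \hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin directory

Run the following command; output like the figure below means the build succeeded:

ant jar -Dversion=2.7.4 -Dhadoop.version=2.7.4 -Declipse.home=E:\IDE\eclipse -Dhadoop.home=E:\Hadoop\hadoop-2.7.4

Notes:

    1. 2.7.4 is the Hadoop version number.

    2. -Declipse.home is the Eclipse installation directory.

    3. -Dhadoop.home is the location of the local Hadoop.

After a successful build, hadoop-eclipse-plugin-2.7.4.jar appears in the \hadoop2x-eclipse-plugin-master\build\contrib\eclipse-plugin folder.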
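To make the role of each -D flag in step 3.3 explicit, here is a small Python sketch that assembles the same Ant command from its three inputs and derives the jar name the build should produce. The helper names are hypothetical; the jar name follows the `hadoop-${name}-${hadoop.version}.jar` pattern in build.xml, with `${name}` being eclipse-plugin.

```python
def ant_jar_command(hadoop_version, eclipse_home, hadoop_home):
    """Build the argument list for the plugin build; each -D flag sets one Ant property."""
    return [
        "ant", "jar",
        f"-Dversion={hadoop_version}",
        f"-Dhadoop.version={hadoop_version}",   # must match the local Hadoop
        f"-Declipse.home={eclipse_home}",       # Eclipse installation directory
        f"-Dhadoop.home={hadoop_home}",         # local Hadoop directory
    ]


def expected_plugin_jar(hadoop_version):
    """Name of the jar the build writes under build\\contrib\\eclipse-plugin."""
    return f"hadoop-eclipse-plugin-{hadoop_version}.jar"


cmd = ant_jar_command("2.7.4", r"E:\IDE\eclipse", r"E:\Hadoop\hadoop-2.7.4")
print(" ".join(cmd))
print(expected_plugin_jar("2.7.4"))
```

The printed command is what gets typed in the eclipse-plugin directory. Changing the Hadoop version means changing both -Dversion and -Dhadoop.version, along with the matching entries in libraries.properties.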


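Part 2. Connecting to the Hadoop cluster from Eclipse

1. Set the Hadoop path in Eclipse, as shown in the figure below.

Window-->Preferences

2. Copy hadoop-eclipse-plugin-2.7.4.jar into the plugins folder of the Eclipse installation and restart Eclipse.

3. In Eclipse, click Window-->Show View-->Other, as shown in the figure below. Make the selection and click OK.

The Map/Reduce Locations tab now appears at the bottom of Eclipse, as shown in the figure below.

Right-click in that tab and choose New Hadoop location. My configuration is shown in the figure below.

When the configuration is complete, click Finish.

Note: at this point, clicking the arrow shown in the figure below produces an error.

Several fixes for this are suggested online, but the problem remained after I tried them. I struggled with it for a long time, believing the connection had failed, when in fact everything displays normally in the Project Explorer on the left, as shown in the figure below.

Seeing the view above means Eclipse has connected to Hadoop successfully. I still do not know why the exception keeps being thrown.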


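Part 3. Running the WordCount example

 1. Create the MapReduce program

 File-->New-->Other

Click Next, enter a Project name, and click Finish.

Under src, create a package named org.apache.hadoop.examples.

Copy the WordCount source code straight into the new package. It is in the Hadoop source tree at \hadoop-2.7.4-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples\WordCount.java.

Note: hadoop-2.7.4-src is the folder obtained by extracting the Hadoop source archive, before compilation.

2. Before running, set up console log output; otherwise exceptions will not be visible in the console.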

Add a log4j.properties file under the project's src directory with the following configuration:

### set log levels ###
log4j.rootLogger = INFO , C , D , E

### console ###
log4j.appender.C = org.apache.log4j.ConsoleAppender
log4j.appender.C.Target = System.out
log4j.appender.C.layout = org.apache.log4j.PatternLayout
log4j.appender.C.layout.ConversionPattern = [Hadoop][%p] [%-d{yyyy-MM-dd HH:mm:ss}] %C.%M(%L) | %m%n

### log file ###
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = ../logs/Hadoop_info.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = INFO
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = [Hadoop][%p] [%-d{yyyy-MM-dd HH:mm:ss}] %C.%M(%L) | %m%n

### exception ###
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File = ../logs/Hadoop_info.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = [Hadoop][%p] [%-d{yyyy-MM-dd HH:mm:ss}] %C.%M(%L) | %m%n

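The project layout is shown in the screenshot below.

3. Right-click WordCount.java and choose Run As -> Run Configurations. Under Arguments, set the WordCount program arguments: the HDFS input file and the HDFS output path, in that order, as shown in the figure below.

4. Download hadoop.dll and winutils.exe (the hadoop.dll must be the 64-bit build here). Put winutils.exe in the bin directory of the Hadoop installation on Windows, and copy hadoop.dll to C:\Windows\System32.

5. On Windows, add two system environment variables and add MAVEN_HOME to PATH, as shown in the screenshot.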
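As a rough way to verify steps 4 and 5 before launching the job, the following Python sketch checks that the two native files are where the text says they should be. It rests on an assumption: that one of the environment variables is HADOOP_HOME, pointing at the local Hadoop directory, which is one of the places Hadoop on Windows looks for bin\winutils.exe. The function name is hypothetical.

```python
import os
from pathlib import Path


def check_hadoop_windows_setup(env, system32=r"C:\Windows\System32"):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # Assumption: HADOOP_HOME is the variable that points at the Hadoop directory.
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    elif not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append(f"winutils.exe not found in {Path(hadoop_home) / 'bin'}")

    # Step 4 copies hadoop.dll into the Windows system directory.
    if not (Path(system32) / "hadoop.dll").is_file():
        problems.append(f"hadoop.dll not found in {system32}")

    return problems


# Report against the current process environment.
for problem in check_hadoop_windows_setup(os.environ):
    print("problem:", problem)
```

Environment variables are read when a process starts, so an Eclipse instance that was already open will not see new values; this is why step 6 restarts Eclipse.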


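6. After configuring, restart Eclipse and run WordCount.java

Right-click WordCount.java and choose Run As-->Run On Hadoop. Output like that in the figure below means the run succeeded.

Note: other errors may come up at run time. Searching for the exception message printed in the console resolves most of them, which is why the logging should be set up first.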
