The First MapReduce Program on a Hadoop Distributed Cluster: WordCount
I have already covered how to set up a fully distributed Hadoop cluster in my post on building a fully distributed Hadoop cluster on Ubuntu; see http://blog.csdn.net/luoluowushengmimi/article/details/17264129 for details.
I. Installing Spring Tool Suite (STS) on Linux
First download Spring Tool Suite from http://eclipse.org/downloads/?osType=linux (this article uses the 3.4.0 tar.gz release). Extract the archive, go into the resulting sts-3.4.0.RELEASE directory, and launch the STS executable.
II. Building the Eclipse Plugin on Linux
Since version 1.0, Hadoop no longer ships with an Eclipse plugin, so we have to build it from source ourselves; building on Linux is recommended. The environment here is CentOS 6.4 with Hadoop 1.2.1, with Hadoop installed under /root/gy/hadoop-1.2.1 and STS under /root/gy/springsource. Building the Eclipse plugin comes down to the following four steps:
1. Edit build.xml
Go to /root/gy/hadoop-1.2.1/src/contrib/eclipse-plugin and edit build.xml: set the Eclipse root directory, the Hadoop version, the Hadoop jars that the build references, and add includeantruntime="on" to the javac task.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more
     contributor license agreements. See the NOTICE file distributed with this
     work for additional information regarding copyright ownership. The ASF
     licenses this file to You under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance with the
     License. You may obtain a copy of the License at
     http://www.apache.org/licenses/LICENSE-2.0
     Unless required by applicable law or agreed to in writing, software
     distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
     WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
     License for the specific language governing permissions and limitations
     under the License. -->
<!-- build.xml -->
<project default="jar" name="eclipse-plugin">
  <import file="../build-contrib.xml"/>
  <property name="hadoop.dir" value="/root/gy/hadoop-1.2.1"/>
  <property name="eclipse.home" location="/root/gy/springsource"/>
  <property name="version" value="1.2.1"/>

  <path id="eclipse-sdk-jars">
    <fileset dir="${eclipse.home}/plugins/">
      <include name="org.eclipse.ui*.jar"/>
      <include name="org.eclipse.jdt*.jar"/>
      <include name="org.eclipse.core*.jar"/>
      <include name="org.eclipse.equinox*.jar"/>
      <include name="org.eclipse.debug*.jar"/>
      <include name="org.eclipse.osgi*.jar"/>
      <include name="org.eclipse.swt*.jar"/>
      <include name="org.eclipse.jface*.jar"/>
      <include name="org.eclipse.team.cvs.ssh2*.jar"/>
      <include name="com.jcraft.jsch*.jar"/>
    </fileset>
  </path>

  <!-- Override classpath to include Eclipse SDK jars -->
  <path id="classpath">
    <pathelement location="${build.classes}"/>
    <pathelement location="${hadoop.root}/build/classes"/>
    <fileset dir="${hadoop.root}">
      <include name="**/*.jar"/>
    </fileset>
    <path refid="eclipse-sdk-jars"/>
  </path>

  <!-- Skip building if eclipse.home is unset. -->
  <target name="check-contrib" unless="eclipse.home">
    <property name="skip.contrib" value="yes"/>
    <echo message="eclipse.home unset: skipping eclipse plugin"/>
  </target>

  <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac encoding="${build.encoding}"
           srcdir="${src.dir}"
           includes="**/*.java"
           destdir="${build.classes}"
           debug="${javac.debug}"
           deprecation="${javac.deprecation}"
           includeantruntime="on">
      <classpath refid="classpath"/>
    </javac>
  </target>

  <!-- Override jar target to specify manifest -->
  <target name="jar" depends="compile" unless="skip.contrib">
    <mkdir dir="${build.dir}/lib"/>
    <copy file="${hadoop.root}/hadoop-core-${version}.jar" tofile="${build.dir}/lib/hadoop-core.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-cli-${commons-cli.version}.jar" tofile="${build.dir}/lib/commons-cli.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-configuration-1.6.jar" tofile="${build.dir}/lib/commons-configuration.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-httpclient-3.0.1.jar" tofile="${build.dir}/lib/commons-httpclient.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-lang-2.4.jar" tofile="${build.dir}/lib/commons-lang.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-core-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-core-asl.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar" tofile="${build.dir}/lib/jackson-mapper-asl.jar" verbose="true"/>
    <jar jarfile="${build.dir}/hadoop-${name}-${version}.jar" manifest="${root}/META-INF/MANIFEST.MF">
      <fileset dir="${build.dir}" includes="classes/ lib/"/>
      <fileset dir="${root}" includes="resources/ plugin.xml"/>
    </jar>
  </target>
</project>
2. Edit build-contrib.xml
cd /root/gy/hadoop-1.2.1/src/contrib
vi build-contrib.xml

Add or adjust the following properties:

<property name="hadoop.root" location="/root/gy/hadoop-1.2.1"/>
<property name="eclipse.home" location="/root/gy/springsource"/>
<property name="javac.deprecation" value="on"/>
3. Edit META-INF/MANIFEST.MF
Bundle-ClassPath: classes/,
 lib/commons-cli.jar,
 lib/commons-httpclient.jar,
 lib/hadoop-core.jar,
 lib/jackson-mapper-asl.jar,
 lib/commons-configuration.jar,
 lib/commons-lang.jar,
 lib/jackson-core-asl.jar
4. In a shell, change to /root/gy/hadoop-1.2.1/src/contrib/eclipse-plugin and run ant to build the plugin.
III. Building the Hadoop Project in Eclipse
1. Copy the hadoop-1.2.1-eclipse-plugin.jar generated by ant into the plugins/ directory of the Eclipse (STS) installation.
2. Restart Eclipse and configure the Hadoop installation directory.
If the plugin was installed successfully, opening Window --> Preferences will show a Hadoop Map/Reduce entry; configure the Hadoop installation directory there, then close the dialog.
3. Configure Map/Reduce Locations.
Open the Map/Reduce Locations view via Window --> Show View.
In that view, right-click --> New Hadoop Location to create a new location. In the dialog, set a Location name (e.g. Hadoop) plus the Map/Reduce Master and DFS Master. Their Host and Port must match the address and port you configured in mapred-site.xml and core-site.xml respectively.
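For reference, in Hadoop 1.x these two addresses come from entries like the following (the host name master and ports 9000/9001 below are placeholders, not values from this article; use whatever your own cluster's configuration files actually contain):

```xml
<!-- core-site.xml: this is the DFS Master host/port -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>

<!-- mapred-site.xml: this is the Map/Reduce Master host/port -->
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>
```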
4. Create a new project
File --> New --> Other --> Map/Reduce Project. The project name is arbitrary, e.g. WordCount.
5. WordCount source code
package com.test.word;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Split each input line into whitespace-separated tokens
            // and emit (word, 1) for every token.
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Sum all the 1s emitted for this word.
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJarByClass(com.test.word.WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path("/root/gy/testData/input.txt"));
        FileOutputFormat.setOutputPath(conf, new Path("/root/gy/testData/output"));

        JobClient.runJob(conf);
    }
}

6. The job runs successfully. (Console log output not reproduced here.)
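The tokenize-then-sum logic above can be sanity-checked locally without a cluster. The sketch below (the class WordCountLocal and its sample input are made up for illustration, not part of the original program) replays the same StringTokenizer map step and per-word summing reduce step on plain strings:

```java
import java.util.*;

// Standalone sketch of the WordCount logic: tokenize each line (map step),
// then sum the 1s emitted per word (reduce step). No Hadoop required.
public class WordCountLocal {
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Same whitespace tokenization the Mapper uses.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // merge() plays the role of the Reducer's running sum.
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        System.out.println(count(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

This is only a behavioral check of the counting logic; on the cluster the grouping and summing are of course done by the Hadoop framework between the map and reduce phases.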
For convenience, here is the hadoop-eclipse-plugin jar produced by my ant build; it can be used directly: http://download.csdn.net/detail/luoluowushengmimi/6869717