Hadoop 0.20.0 + RHEL 5 + Eclipse Plugin + Makefile: Setting Up a Development Environment
The Hadoop framework already encapsulates the MapReduce execution machinery, so in day-to-day development a programmer can concentrate on the business logic and does not need to spend time working out how tasks are scheduled and executed; the MapReduce framework takes care of that. This lowers the difficulty of writing MapReduce programs considerably.
Here we set up a Hadoop + Eclipse plugin development environment on Linux and put it to practical use. The concrete configuration and hands-on steps are described below.
RHEL 5 Environment Configuration
(1) Passwordless SSH configuration
Run the following commands:
[shirdrn@localhost .ssh]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[shirdrn@localhost .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[shirdrn@localhost .ssh]$ ssh localhost
If you can ssh to localhost without being asked for a password, the configuration succeeded.
Otherwise, if you are still prompted for a password, check the permissions on your .ssh directory, including whether authorized_keys and known_hosts inside it are readable (r) by the owner; if that is the problem, fixing the permissions resolves it.
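As a minimal sketch of that permission fix (sshd also rejects keys whose files are readable by other users, so the usual modes are 700 for the directory and 600 for the key files):

```shell
# Ensure the directory and files exist, then set the modes sshd expects:
# 700 on ~/.ssh (owner-only access) and 600 on the key files.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys ~/.ssh/known_hosts
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys ~/.ssh/known_hosts
```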
(2) Environment variables
Edit the .bashrc file to configure the environment variables:
[shirdrn@localhost ~]$ vi .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
	. /etc/bashrc
fi

# User specific aliases and functions
export JAVA_HOME=/usr/java/jdk1.6.0_16
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/*.jar:$JAVA_HOME/jre/lib/*.jar
export HADOOP_HOME=/home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0
export PATH=$PATH:$HADOOP_HOME/bin
Preparing Hadoop
1. Download the hadoop-0.20.0.tar.gz archive and extract it under /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/:
[shirdrn@localhost hadoop]$ tar -xzvf hadoop-0.20.0.tar.gz
2. Configure Hadoop
(1) Edit hadoop-0.20.0/conf/hadoop-env.sh and add the following three lines:
export JAVA_HOME=/usr/java/jdk1.6.0_16
export HADOOP_HOME=/home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0
export PATH=$PATH:$HADOOP_HOME/bin
(2) Edit hadoop-0.20.0/conf/core-site.xml as shown below:
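The XML did not survive in this copy of the article. The fragment below is the standard single-node (pseudo-distributed) setting from the Hadoop 0.20 documentation, which a setup like this one would normally use; the host and port are the conventional defaults, not necessarily the author's originals.

```xml
<configuration>
  <property>
    <!-- URI of the default filesystem: the local NameNode -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```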
(3) Edit hadoop-0.20.0/conf/hdfs-site.xml as shown below:
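Again the XML is missing here; the standard pseudo-distributed setting from the Hadoop 0.20 documentation is shown below as a placeholder (a single node cannot hold more than one replica):

```xml
<configuration>
  <property>
    <!-- Keep one copy of each block; there is only one DataNode -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```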
(4) Edit hadoop-0.20.0/conf/mapred-site.xml as shown below:
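This file's content is also missing from this copy; the conventional single-node setting from the Hadoop 0.20 documentation, offered here as an assumed stand-in, is:

```xml
<configuration>
  <property>
    <!-- Host and port of the local JobTracker -->
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```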
Eclipse Plugin Configuration
This step is comparatively easy:
Extract eclipse-SDK-3.5.2-linux-gtk.tar.gz under /home/shirdrn/eclipse/, then copy the plugin hadoop-0.20.0/contrib/eclipse-plugin/hadoop-0.20.0-eclipse-plugin.jar into /home/shirdrn/eclipse/eclipse-3.5.2/eclipse/plugins/:
[shirdrn@localhost ~]$ cp /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/contrib/eclipse-plugin/hadoop-0.20.0-eclipse-plugin.jar /home/shirdrn/eclipse/eclipse-3.5.2/eclipse/plugins/
You can then start the Eclipse IDE on RHEL 5.
Testing in Practice
1. Start Hadoop from the shell
(1) Format HDFS
[shirdrn@localhost hadoop-0.20.0]$ bin/hadoop namenode -format
Output of the format command:
10/10/08 08:21:28 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
************************************************************/
10/10/08 08:21:28 INFO namenode.FSNamesystem: fsOwner=shirdrn,shirdrn
10/10/08 08:21:28 INFO namenode.FSNamesystem: supergroup=supergroup
10/10/08 08:21:28 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/10/08 08:21:28 INFO common.Storage: Image file of size 97 saved in 0 seconds.
10/10/08 08:21:28 INFO common.Storage: Storage directory /tmp/hadoop/hadoop-shirdrn/dfs/name has been successfully formatted.
10/10/08 08:21:28 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
(2) Start the Hadoop daemons
[shirdrn@localhost hadoop-0.20.0]$ bin/start-all.sh
Output:
starting namenode, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-namenode-localhost.out
localhost: starting datanode, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-datanode-localhost.out
localhost: starting secondarynamenode, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-secondarynamenode-localhost.out
starting jobtracker, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-jobtracker-localhost.out
localhost: starting tasktracker, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-tasktracker-localhost.out
(3) Confirm that all processes have started
[shirdrn@localhost hadoop-0.20.0]$ jps
8100 DataNode
8398 TaskTracker
8230 SecondaryNameNode
7994 NameNode
8301 JobTracker
8459 Jps
All five daemons are present, so the startup succeeded.
2. Prepare test data
Upload the test data with:
[shirdrn@localhost hadoop-0.20.0]$ bin/hadoop fs -put conf/ input
If no errors are reported, the upload succeeded.
You can verify it with the following command:
[shirdrn@localhost hadoop-0.20.0]$ bin/hadoop fs -ls /user/shirdrn/input
Found 13 items
-rw-r--r--   1 shirdrn supergroup       6275 2010-10-08 08:24 /user/shirdrn/input/capacity-scheduler.xml
-rw-r--r--   1 shirdrn supergroup        535 2010-10-08 08:24 /user/shirdrn/input/configuration.xsl
-rw-r--r--   1 shirdrn supergroup        388 2010-10-08 08:24 /user/shirdrn/input/core-site.xml
-rw-r--r--   1 shirdrn supergroup       2396 2010-10-08 08:24 /user/shirdrn/input/hadoop-env.sh
-rw-r--r--   1 shirdrn supergroup       1245 2010-10-08 08:24 /user/shirdrn/input/hadoop-metrics.properties
-rw-r--r--   1 shirdrn supergroup       4190 2010-10-08 08:24 /user/shirdrn/input/hadoop-policy.xml
-rw-r--r--   1 shirdrn supergroup        259 2010-10-08 08:24 /user/shirdrn/input/hdfs-site.xml
-rw-r--r--   1 shirdrn supergroup       2815 2010-10-08 08:24 /user/shirdrn/input/log4j.properties
-rw-r--r--   1 shirdrn supergroup        275 2010-10-08 08:24 /user/shirdrn/input/mapred-site.xml
-rw-r--r--   1 shirdrn supergroup         10 2010-10-08 08:24 /user/shirdrn/input/masters
-rw-r--r--   1 shirdrn supergroup         10 2010-10-08 08:24 /user/shirdrn/input/slaves
-rw-r--r--   1 shirdrn supergroup       1243 2010-10-08 08:24 /user/shirdrn/input/ssl-client.xml.example
-rw-r--r--   1 shirdrn supergroup       1195 2010-10-08 08:24 /user/shirdrn/input/ssl-server.xml.example
3. Developing in Eclipse
(1) Start Eclipse 3.5.2 and set the workspace to /home/shirdrn/eclipse/eclipse-3.5.2/workspace.
Now use Open Perspective to find the Map/Reduce perspective and switch to it; DFS Locations appears in the Project Explorer on the left side of the Eclipse IDE. Later, once we create a Map/Reduce project, DFS Locations will show the corresponding resource directories on HDFS.
(2) Create and configure a Map/Reduce project
Create a Map/Reduce project named hadoop. On the wizard page, open the "Configure Hadoop install directory..." link and set it to the $HADOOP_HOME directory specified earlier, i.e. /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0.
Click Next and then Finish. Expanding the project in the Project Explorer on the left of the Eclipse IDE shows, besides the src source folder, many Hadoop-related jar files.
Select the hadoop project and, in the package org.shirdrn.hadoop, create the source files of the WordCount example that ships with the Hadoop distribution, split here into separate classes, as shown below:
The Mapper class, TokenizerMapper.java:
The Reducer class, IntSumReducer.java:
The MapReduce driver class, WordCount.java:
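The three source listings did not survive in this copy of the article. The code below reconstructs them from the WordCount example bundled with the Hadoop 0.20.0 distribution, which is what the text says the author split apart; treat it as a sketch rather than the author's exact files. The three classes are shown in one listing but belong in three separate files under org.shirdrn.hadoop, each with its own package declaration.

```java
// TokenizerMapper.java -- emits (word, 1) for every whitespace-separated token.
package org.shirdrn.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

// IntSumReducer.java -- sums the counts for each word; also usable as a combiner.
package org.shirdrn.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

// WordCount.java -- the driver: wires up mapper, combiner, reducer and the I/O paths.
package org.shirdrn.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Compiling these requires hadoop-0.20.0-core.jar on the classpath, which the Map/Reduce project wizard adds automatically.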
4. Packaging and running with a Makefile
Because this version of the Eclipse plugin cannot "Run on Hadoop" directly, the workaround is to write our own Makefile and package and run the Hadoop program manually.
For the example above, write the following Makefile (note that each recipe line must begin with a tab character):
JarFile="WordCount-V0.1.jar"
MainFunc="org.shirdrn.hadoop.WordCount"
LocalOutDir="/tmp/output"

all: help

jar:
	jar -cvf ${JarFile} -C bin/ .

run:
	hadoop jar ${JarFile} ${MainFunc} input output

clean:
	hadoop fs -rmr output

output:
	rm -rf ${LocalOutDir}
	hadoop fs -get output ${LocalOutDir}
	cat ${LocalOutDir}/part-r-00000

help:
	@echo "Usage:"
	@echo "  make jar    - Build Jar File."
	@echo "  make clean  - Clean up Output directory on HDFS."
	@echo "  make run    - Run your MapReduce code on Hadoop."
	@echo "  make output - Download and show output file"
	@echo "  make help   - Show Makefile options."
	@echo " "
	@echo "Example:"
	@echo "  make jar; make run; make output; make clean"
(1) Build the jar file
[shirdrn@localhost hadoop]$ make jar
jar -cvf "WordCount-V0.1.jar" -C bin/ .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/shirdrn/(in = 0) (out= 0)(stored 0%)
adding: org/shirdrn/hadoop/(in = 0) (out= 0)(stored 0%)
adding: org/shirdrn/hadoop/IntSumReducer.class(in = 2320) (out= 901)(deflated 61%)
adding: org/shirdrn/hadoop/WordCount.class(in = 2022) (out= 1066)(deflated 47%)
adding: org/shirdrn/hadoop/TokenizerMapper.class(in = 2232) (out= 887)(deflated 60%)
(2) Run the program
[shirdrn@localhost hadoop]$ make run
hadoop jar "WordCount-V0.1.jar" "org.shirdrn.hadoop.WordCount" input output
10/10/08 08:46:54 INFO input.FileInputFormat: Total input paths to process : 13
10/10/08 08:46:55 INFO mapred.JobClient: Running job: job_201010080822_0001
10/10/08 08:46:56 INFO mapred.JobClient:  map 0% reduce 0%
10/10/08 08:47:40 INFO mapred.JobClient:  map 15% reduce 0%
10/10/08 08:47:59 INFO mapred.JobClient:  map 30% reduce 0%
10/10/08 08:48:18 INFO mapred.JobClient:  map 46% reduce 10%
10/10/08 08:48:24 INFO mapred.JobClient:  map 61% reduce 15%
10/10/08 08:48:30 INFO mapred.JobClient:  map 76% reduce 15%
10/10/08 08:48:33 INFO mapred.JobClient:  map 76% reduce 20%
10/10/08 08:48:36 INFO mapred.JobClient:  map 92% reduce 20%
10/10/08 08:48:44 INFO mapred.JobClient:  map 100% reduce 25%
10/10/08 08:48:47 INFO mapred.JobClient:  map 100% reduce 30%
10/10/08 08:48:55 INFO mapred.JobClient:  map 100% reduce 100%
10/10/08 08:48:58 INFO mapred.JobClient: Job complete: job_201010080822_0001
10/10/08 08:48:58 INFO mapred.JobClient: Counters: 17
10/10/08 08:48:58 INFO mapred.JobClient:   Job Counters 
10/10/08 08:48:58 INFO mapred.JobClient:     Launched reduce tasks=1
10/10/08 08:48:58 INFO mapred.JobClient:     Launched map tasks=13
10/10/08 08:48:58 INFO mapred.JobClient:     Data-local map tasks=13
10/10/08 08:48:58 INFO mapred.JobClient:   FileSystemCounters
10/10/08 08:48:58 INFO mapred.JobClient:     FILE_BYTES_READ=17108
10/10/08 08:48:58 INFO mapred.JobClient:     HDFS_BYTES_READ=20836
10/10/08 08:48:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=34704
10/10/08 08:48:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=11807
10/10/08 08:48:58 INFO mapred.JobClient:   Map-Reduce Framework
10/10/08 08:48:58 INFO mapred.JobClient:     Reduce input groups=0
10/10/08 08:48:58 INFO mapred.JobClient:     Combine output records=832
10/10/08 08:48:58 INFO mapred.JobClient:     Map input records=624
10/10/08 08:48:58 INFO mapred.JobClient:     Reduce shuffle bytes=17180
10/10/08 08:48:58 INFO mapred.JobClient:     Reduce output records=0
10/10/08 08:48:58 INFO mapred.JobClient:     Spilled Records=1664
10/10/08 08:48:58 INFO mapred.JobClient:     Map output bytes=27728
10/10/08 08:48:58 INFO mapred.JobClient:     Combine input records=2010
10/10/08 08:48:58 INFO mapred.JobClient:     Map output records=2010
10/10/08 08:48:58 INFO mapred.JobClient:     Reduce input records=832
(3) View the results
[shirdrn@localhost hadoop]$ make output
version="1.0"> 1
version="1.0"?> 8
via 2
virtual 3
want 1
when 1
where 2
where, 1
whether 1
which 8
who 1
will 8
with 5
worker 1
would 5
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
Only part of the output is shown above.
References
This article follows the approach given by the author of the article below; it is recorded here as personal study notes, with thanks.
1. 《hadoop 0.20 程式開發》, http://trac.nchc.org.tw/cloud/wiki/waue/2009/0617