Hadoop Pseudo-Distributed Deployment and Testing


Installing and running pseudo-distributed Hadoop on Ubuntu 10.04 (using version 0.20.2 as the example).

Download Hadoop: go to http://www.apache.org/dyn/closer.cgi/hadoop/common/, pick a mirror, and choose a version.

All of the commands below are run from the Hadoop installation's home directory.

Preparation

1.   Install the JDK.

2.   Unpack the downloaded Hadoop distribution. Edit conf/hadoop-env.sh and, at a minimum, set JAVA_HOME to the root of your Java installation.
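
For example, if the JDK is installed under /usr/lib/jvm/java-6-sun (an example path; substitute your own installation root), the relevant line in conf/hadoop-env.sh would be:

export JAVA_HOME=/usr/lib/jvm/java-6-sun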

3.   Install SSH. Open a terminal and enter the following command:

sudo apt-get install openssh-server

The following steps generate an SSH public/private key pair:

cd ~

ssh-keygen -t rsa    (answer the prompts; an empty passphrase gives fully passwordless login)

cd .ssh

cat id_rsa.pub >> authorized_keys    (this step enables passwordless login)
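
If SSH still prompts for a password afterwards, the usual culprit is file permissions; OpenSSH refuses keys whose files are too permissive, so tightening them is a safe extra step:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys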

 

If you hit the following SSH error during setup:

Agent admitted failure to sign using the key

run this command:

ssh-add ~/.ssh/id_rsa

and passwordless login should then work.

Note:

If running ssh-add gives the error "Could not open a connection to your authentication agent", first run the following command and then retry ssh-add:

ssh-agent bash
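
At this point you can verify the whole SSH setup; logging in to the local machine should succeed without a password prompt:

ssh localhost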

 

Pseudo-Distributed Operation

Hadoop can run on a single node in so-called pseudo-distributed mode, in which each Hadoop daemon runs as a separate Java process.


Configure Hadoop with the following files:

conf/core-site.xml:

<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.0.101:9000</value>
 </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
</configuration>

(The property name is dfs.replication, and it is set to 1 because a pseudo-distributed cluster has only a single DataNode.)

conf/mapred-site.xml:

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>192.168.0.101:9001</value>
 </property>
</configuration>
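
One optional addition, not part of the original setup: with the settings above, HDFS data and NameNode metadata end up under /tmp/hadoop-root (as the format output below shows), and /tmp is cleared on reboot on many systems. Pointing hadoop.tmp.dir at a persistent directory in conf/core-site.xml avoids losing the file system; the path here is only an example:

 <property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmp</value>
 </property>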


First, ask the NameNode to format the DFS file system. This step may already have been done during installation, but it is useful to know how to generate a clean file system.

bin/hadoop namenode -format

Output:

11/11/30 09:53:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu1/192.168.0.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/11/30 09:53:56 INFO namenode.FSNamesystem: fsOwner=root,root
11/11/30 09:53:56 INFO namenode.FSNamesystem: supergroup=supergroup
11/11/30 09:53:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/11/30 09:53:56 INFO common.Storage: Image file of size 94 saved in 0 seconds.
11/11/30 09:53:57 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
11/11/30 09:53:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu1/192.168.0.101
************************************************************/


Run: bin/start-all.sh

Output:

starting namenode, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-ubuntu1.out
localhost: starting datanode, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-ubuntu1.out
localhost: starting secondarynamenode, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-ubuntu1.out
starting jobtracker, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-ubuntu1.out
localhost: starting tasktracker, logging to /usr/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-ubuntu1.out
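
To confirm that all five daemons are up, you can run the JDK's jps tool (assuming the JDK's bin directory is on your PATH); besides Jps itself, it should list NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:

jps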

 

Check HDFS:

bin/hadoop fs -ls /

If it lists the directory contents, HDFS is working correctly.


Hadoop file system operations (relative paths such as test resolve under the current user's HDFS home directory, /user/root in this setup):

bin/hadoop fs -mkdir test

bin/hadoop fs -ls test

bin/hadoop fs -rmr test


Testing Hadoop:

bin/hadoop fs -mkdir input

Create two text files, file1 and file2, and place them in /opt/hadoop/sourcedata.
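
For example (the contents are arbitrary; any small text files will do, and the counter values below depend on what the files contain):

mkdir -p /opt/hadoop/sourcedata
echo "hello hadoop" > /opt/hadoop/sourcedata/file1
echo "hadoop is a distributed system" > /opt/hadoop/sourcedata/file2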

Run: bin/hadoop fs -put /opt/hadoop/sourcedata/file* input

Run: bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output

Output:

11/11/30 10:15:38 INFO input.FileInputFormat: Total input paths to process: 2
11/11/30 10:15:52 INFO mapred.JobClient: Running job: job_201111301005_0001
11/11/30 10:15:53 INFO mapred.JobClient:  map 0% reduce 0%
11/11/30 10:19:07 INFO mapred.JobClient:  map 50% reduce 0%
11/11/30 10:19:14 INFO mapred.JobClient:  map 100% reduce 0%
11/11/30 10:19:46 INFO mapred.JobClient:  map 100% reduce 100%
11/11/30 10:19:54 INFO mapred.JobClient: Job complete: job_201111301005_0001
11/11/30 10:19:59 INFO mapred.JobClient: Counters: 17
11/11/30 10:19:59 INFO mapred.JobClient:   Job Counters
11/11/30 10:19:59 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/30 10:19:59 INFO mapred.JobClient:     Launched map tasks=2
11/11/30 10:19:59 INFO mapred.JobClient:     Data-local map tasks=2
11/11/30 10:19:59 INFO mapred.JobClient:   FileSystemCounters
11/11/30 10:19:59 INFO mapred.JobClient:     FILE_BYTES_READ=146
11/11/30 10:19:59 INFO mapred.JobClient:     HDFS_BYTES_READ=64
11/11/30 10:19:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=362
11/11/30 10:19:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=60
11/11/30 10:19:59 INFO mapred.JobClient:   Map-Reduce Framework
11/11/30 10:19:59 INFO mapred.JobClient:     Reduce input groups=9
11/11/30 10:19:59 INFO mapred.JobClient:     Combine output records=13
11/11/30 10:19:59 INFO mapred.JobClient:     Map input records=2
11/11/30 10:19:59 INFO mapred.JobClient:     Reduce shuffle bytes=102
11/11/30 10:19:59 INFO mapred.JobClient:     Reduce output records=9
11/11/30 10:19:59 INFO mapred.JobClient:     Spilled Records=26
11/11/30 10:19:59 INFO mapred.JobClient:     Map output bytes=120
11/11/30 10:19:59 INFO mapred.JobClient:     Combine input records=14
11/11/30 10:19:59 INFO mapred.JobClient:     Map output records=14
11/11/30 10:19:59 INFO mapred.JobClient:     Reduce input records=13

The job ran successfully!

Other commands for inspecting the results:

bin/hadoop fs -ls /user/root/output

bin/hadoop fs -cat output/part-r-00000

bin/hadoop fs -cat output/part-r-00000 | head -13

bin/hadoop fs -get output/part-r-00000 output.txt
cat output.txt | head -5

bin/hadoop fs -rmr output

You can also view the cluster in a browser:

http://192.168.0.101:50030 (the MapReduce web UI)
http://192.168.0.101:50070 (the HDFS web UI)

Next, run the grep MapReduce job (its last argument, 'hadoop', is the regular expression whose matches are counted):

Run: bin/hadoop fs -rmr output

Run: bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'hadoop'

Output:

11/11/30 10:28:37 INFO mapred.FileInputFormat: Total input paths to process: 2
11/11/30 10:28:40 INFO mapred.JobClient: Running job: job_201111301005_0002
11/11/30 10:28:41 INFO mapred.JobClient:  map 0% reduce 0%
11/11/30 10:34:16 INFO mapred.JobClient:  map 66% reduce 0%
11/11/30 10:37:40 INFO mapred.JobClient:  map 100% reduce 11%
11/11/30 10:37:50 INFO mapred.JobClient:  map 100% reduce 22%
11/11/30 10:37:54 INFO mapred.JobClient:  map 100% reduce 66%
11/11/30 10:38:15 INFO mapred.JobClient:  map 100% reduce 100%
11/11/30 10:38:30 INFO mapred.JobClient: Job complete: job_201111301005_0002
11/11/30 10:38:32 INFO mapred.JobClient: Counters: 18
11/11/30 10:38:32 INFO mapred.JobClient:   Job Counters
11/11/30 10:38:32 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/30 10:38:32 INFO mapred.JobClient:     Launched map tasks=3
11/11/30 10:38:32 INFO mapred.JobClient:     Data-local map tasks=3
11/11/30 10:38:32 INFO mapred.JobClient:   FileSystemCounters
11/11/30 10:38:32 INFO mapred.JobClient:     FILE_BYTES_READ=40
11/11/30 10:38:32 INFO mapred.JobClient:     HDFS_BYTES_READ=77
11/11/30 10:38:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=188
11/11/30 10:38:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=109
11/11/30 10:38:32 INFO mapred.JobClient:   Map-Reduce Framework
11/11/30 10:38:32 INFO mapred.JobClient:     Reduce input groups=1
11/11/30 10:38:32 INFO mapred.JobClient:     Combine output records=2
11/11/30 10:38:32 INFO mapred.JobClient:     Map input records=2
11/11/30 10:38:32 INFO mapred.JobClient:     Reduce shuffle bytes=46
11/11/30 10:38:32 INFO mapred.JobClient:     Reduce output records=1
11/11/30 10:38:32 INFO mapred.JobClient:     Spilled Records=4
11/11/30 10:38:32 INFO mapred.JobClient:     Map output bytes=30
11/11/30 10:38:32 INFO mapred.JobClient:     Map input bytes=64
11/11/30 10:38:32 INFO mapred.JobClient:     Combine input records=2
11/11/30 10:38:32 INFO mapred.JobClient:     Map output records=2
11/11/30 10:38:32 INFO mapred.JobClient:     Reduce input records=2
11/11/30 10:38:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

Run: bin/hadoop fs -cat output/part-00000

Output: 2	hadoop    (the match count, then the pattern, tab-separated)
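
When you are done experimenting, the daemons can be stopped with the companion script to start-all.sh:

bin/stop-all.sh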


Reposted from http://m.blog.csdn.net/blog/rjhym/8269977, with corrections and additions.
