Installing and Testing Hadoop

Source: Internet. Published by: 厦门理工软件学院 (Xiamen University of Technology, School of Software). Editor: 程序博客网. Time: 2024/04/30 18:43

Operating system

The operating system is Ubuntu 11.04 Desktop. Set a root password: sudo passwd root

Download the JDK: jdk-6u27-linux-i586.bin

Download Hadoop: hadoop-0.20.204.0.tar.gz

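Installing the JDK

To install Java 6, copy the JDK installer to /usr/local and run it from that directory:

sudo sh jdk-6u27-linux-i586.bin

Set the JDK environment variables:

sudo gedit /etc/environment

Append the JDK's bin directory to PATH, and add JAVA_HOME and CLASSPATH entries that point to the JDK installation.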
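As an illustration, the entries might look like the following. The directory /usr/local/jdk1.6.0_27 is an assumption (it is the name this installer normally unpacks to), so check the actual directory name first. /etc/environment is a list of KEY=value lines rather than a shell script, and variables such as $JAVA_HOME are normally not expanded there, so each path is written out in full.

```shell
# Sketch of /etc/environment entries; the JDK path below is an assumption.
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/jdk1.6.0_27/bin"
JAVA_HOME="/usr/local/jdk1.6.0_27"
CLASSPATH=".:/usr/local/jdk1.6.0_27/lib"
```

Log out and back in for the change to take effect, then confirm it with java -version.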

Creating the group and user

Create a group named hadoop and a user named hadoop that belongs to it:

sudo addgroup hadoop

sudo adduser --ingroup hadoop hadoop

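Configuring SSH

Install the SSH server: sudo apt-get install openssh-server

Generate an SSH key:

1. Switch to the hadoop user: su hadoop

2. Generate an SSH key pair with an empty passphrase: ssh-keygen -t rsa -P ""

When prompted for a file name, just press Enter; the key files are created in the .ssh directory.

When done, test the connection: ssh localhost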
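ssh localhost only logs in without a password prompt once the new public key is listed in the hadoop user's authorized_keys file, a step the text above does not show. A minimal sketch, assuming the default key path ~/.ssh/id_rsa.pub that ssh-keygen proposes:

```shell
# Run as the hadoop user. Paths assume the ssh-keygen defaults.
SSH_DIR="$HOME/.ssh"
PUB_KEY="$SSH_DIR/id_rsa.pub"
AUTH_KEYS="$SSH_DIR/authorized_keys"

# sshd ignores key files whose permissions are too open.
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
touch "$AUTH_KEYS" && chmod 600 "$AUTH_KEYS"

if [ -f "$PUB_KEY" ]; then
  # Append the key only if it is not already listed, so reruns are harmless.
  grep -qxF "$(cat "$PUB_KEY")" "$AUTH_KEYS" || cat "$PUB_KEY" >> "$AUTH_KEYS"
else
  echo "No public key at $PUB_KEY; run ssh-keygen first." >&2
fi
```

The first ssh localhost will still ask to confirm the host fingerprint; answer yes once.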

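Installing Hadoop

Copy the Hadoop archive to /usr/local:

sudo cp hadoop-0.20.204.0.tar.gz /usr/local

Change to /usr/local and unpack the archive:

cd /usr/local

sudo tar xzf hadoop-0.20.204.0.tar.gz

Unpacking creates the directory hadoop-0.20.204.0; rename it to hadoop for convenience:

sudo mv hadoop-0.20.204.0 hadoop

Make the hadoop user and group the owner of the hadoop directory:

sudo chown -R hadoop:hadoop hadoop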

Configuring Hadoop

Open hadoop/conf/core-site.xml:

sudo gedit hadoop/conf/core-site.xml

Add the following properties inside the <configuration> element:

1. The directory for temporary data. It is best placed under the hadoop user's home directory; if it is under /usr/local, Hadoop will not have permission to create the temporary directory at run time.

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hadoop-datastore</value>
  <description>A base for other temporary directories.</description>
</property>

2. The NameNode address:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

Open hadoop/conf/mapred-site.xml and add the host and port on which the MapReduce job tracker runs:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

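Open hadoop/conf/hdfs-site.xml and set the block replication factor. On a single-node setup there is only one DataNode, so the usual value is 1:

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>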
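The fragments above only take effect inside each file's <configuration> root element. The sketch below writes three complete files to a scratch directory so they can be reviewed and then copied into hadoop/conf. CONF_DIR and the directory name conf-sketch are hypothetical, the descriptions are left out for brevity, and the dfs.replication value of 1 is an assumption suited to a single node.

```shell
# Hypothetical scratch directory; copy the files into hadoop/conf after review.
CONF_DIR="${CONF_DIR:-./conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-datastore</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Assumed value: 1, because a single-node setup has only one DataNode. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

ls "$CONF_DIR"
```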

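Format the NameNode (run this and the following commands from the hadoop directory, as the hadoop user):

bin/hadoop namenode -format

Start HDFS and MapReduce: bin/start-all.sh

Stop the services: bin/stop-all.sh

Use the jps command to list the running Hadoop processes.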
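On a single node, start-all.sh in this Hadoop generation is expected to launch five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The following sketch checks jps output against that list; missing_daemons is a hypothetical helper name, not part of Hadoop.

```shell
# Daemons that start-all.sh is expected to launch on a single node (Hadoop 0.20.x).
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Hypothetical helper: takes jps-style text ("<pid> <name>" per line)
# and prints the expected daemons that do not appear in it.
missing_daemons() {
  result=""
  for daemon in $EXPECTED_DAEMONS; do
    printf '%s\n' "$1" | awk '{print $2}' | grep -qx "$daemon" || result="$result $daemon"
  done
  echo "${result# }"
}

# Only query jps when it is installed (it ships with the JDK).
if command -v jps >/dev/null 2>&1; then
  absent=$(missing_daemons "$(jps)")
  if [ -n "$absent" ]; then
    echo "Not running: $absent"
  else
    echo "All expected daemons are running."
  fi
fi
```

If a daemon is reported as not running, the files under the hadoop logs directory are the first place to look.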

Check the cluster status: bin/hadoop dfsadmin -report

The web interfaces are also available:

1. HDFS web page: http://localhost:50070

2. MapReduce web page: http://localhost:50030

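Testing Hadoop

Use the wordcount example that ships with Hadoop to count how many times each word occurs in a set of files.

Create an input directory under /home/hadoop containing two text files, file01.txt and file02.txt, each holding a different set of words.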
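The two files can be created from the command line, for example as follows. The sentences are arbitrary sample text, and INPUT_DIR is a hypothetical variable that defaults to the input directory in the current user's home (which is /home/hadoop when run as the hadoop user).

```shell
# Defaults to ~/input; override INPUT_DIR to write somewhere else.
INPUT_DIR="${INPUT_DIR:-$HOME/input}"
mkdir -p "$INPUT_DIR"

# Arbitrary sample sentences with some repeated words.
printf 'Hello World Bye World\n' > "$INPUT_DIR/file01.txt"
printf 'Hello Hadoop Goodbye Hadoop\n' > "$INPUT_DIR/file02.txt"

ls -l "$INPUT_DIR"
```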

If an earlier run already created the input and output directories in HDFS, delete them before running the job; otherwise it will fail.
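One way to clear the old directories is the fs shell's recursive delete, -rmr. The sketch wraps it in a hypothetical helper, clean_job_dirs, and reads the launcher path from a hypothetical HADOOP_CMD variable so that it also works outside the hadoop directory.

```shell
# Path to the hadoop launcher; bin/hadoop assumes the current directory is the hadoop directory.
HADOOP_CMD="${HADOOP_CMD:-bin/hadoop}"

# Hypothetical helper: remove leftovers from a previous wordcount run.
# An error about a missing path only means there was nothing to clean.
clean_job_dirs() {
  "$HADOOP_CMD" fs -rmr input output || true
}
```

Call clean_job_dirs before creating the input directory again.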

Create the input directory in HDFS: bin/hadoop fs -mkdir input

Upload the two local files to HDFS: bin/hadoop fs -copyFromLocal /home/hadoop/input/file0*.txt input

Run the wordcount example: bin/hadoop jar hadoop-examples-0.20.204.0.jar wordcount input output

When the job finishes, list the output directory in HDFS: bin/hadoop fs -ls output. It should contain three entries: _SUCCESS, _logs and part-r-00000.

This indicates that the example ran successfully. View the result: bin/hadoop fs -cat output/part-r-00000
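To double-check the result, the same counts can be reproduced locally with standard Unix tools. This is a sketch under the assumption that the example splits words on whitespace and writes tab-separated word and count pairs sorted by key; local_wordcount is a hypothetical helper name.

```shell
# Hypothetical helper: print "word<TAB>count" for the whitespace-separated
# words in the given files, sorted in byte order.
local_wordcount() {
  cat "$@" \
    | tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{print $2 "\t" $1}'
}

# Example use, once the input files exist:
#   local_wordcount /home/hadoop/input/file0*.txt > /tmp/expected.txt
#   bin/hadoop fs -cat output/part-r-00000 > /tmp/actual.txt
#   diff /tmp/expected.txt /tmp/actual.txt
```

An empty diff means the local count and the Hadoop output agree.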