Hadoop single-node installation (pseudo-distributed mode)

Source: Internet | Published by: Alibaba Cloud | Edited by: 程序博客网 | Date: 2024/06/05 20:29

I. Environment

1. Operating system: Red Hat Linux

2. JDK: 1.6.0

3. Hadoop: 1.2.1

II. Installation

1. Create a dedicated user and group for Hadoop on Linux.

# Add the group
groupadd hadoop
# Add the user
useradd -g hadoop hduser

2. Configure SSH

Run the following commands as the hduser user.

ssh-keygen -t rsa -P ""

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
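Running the `cat ... >>` line twice leaves duplicate entries, and sshd in its default strict mode can refuse key login when `~/.ssh` or `authorized_keys` is too permissive. The sketch below is one way to make the step repeatable. The function name `authorize_key` is hypothetical, and both paths are passed as parameters so it can be tried on scratch files first.

```shell
# Hypothetical helper: append a public key to an authorized_keys file only
# if that exact line is not already present, then tighten permissions.
authorize_key() {
    pub_file=$1     # e.g. ~/.ssh/id_rsa.pub
    auth_file=$2    # e.g. ~/.ssh/authorized_keys
    auth_dir=$(dirname "$auth_file")

    mkdir -p "$auth_dir"
    touch "$auth_file"
    # -x whole-line match, -F fixed string, -f read the pattern from the key file
    if ! grep -qxFf "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    chmod 700 "$auth_dir"
    chmod 600 "$auth_file"
}
```

Usage as hduser: `authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys`, then confirm with `ssh localhost` that no password prompt appears.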

3. Extract the Hadoop tarball into /usr/local/

tar -zxvf /opt/hadoop/hadoop-1.2.1.tar.gz -C /usr/local/
mv /usr/local/hadoop-1.2.1 /usr/local/hadoop

Run these commands as root, because hduser has no write access to /usr/local.

4. Change the owner of the extracted directory /usr/local/hadoop

chown -R hduser:hadoop /usr/local/hadoop

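5. Edit hduser's ~/.bash_profile or ~/.bashrc

export JAVA_HOME=/usr/java/jdk1.6.0_41
PATH=$JAVA_HOME/bin:$PATH
export PATH

In addition, edit $HADOOP_HOME/conf/hadoop-env.sh in the Hadoop directory: remove the leading "#" from the line below and set the path to your JDK directory (here /usr/java/jdk1.6.0_41).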

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
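The edit to hadoop-env.sh can also be scripted. This is a minimal sketch, assuming the file contains a single `export JAVA_HOME=` line, commented or not, and that the JDK path contains no `|` or `&` characters. The function name `set_java_home` is hypothetical.

```shell
# Hypothetical helper: uncomment the JAVA_HOME line in hadoop-env.sh and
# point it at the given JDK directory.
set_java_home() {
    env_file=$1     # path to hadoop-env.sh
    java_home=$2    # JDK installation directory
    tmp_file="$env_file.tmp"
    # Match the export line whether or not it starts with "#".
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$tmp_file" &&
        cat "$tmp_file" > "$env_file" &&   # overwrite in place, keeping file permissions
        rm -f "$tmp_file"
}
```

Usage: `set_java_home /usr/local/hadoop/conf/hadoop-env.sh /usr/java/jdk1.6.0_41`.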

6. Edit the Hadoop configuration files

conf/core-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

Note: the /opt/hadoop/tmp directory must be created manually.

conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>
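Before formatting HDFS it can help to read the values back from the three files and catch typos. The sketch below is a rough text-based lookup, not a real XML parser. It assumes each `<name>` is directly followed by its `<value>`, as in the files above, and the function name `get_property` is hypothetical.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file. Works on single-line and multi-line files.
get_property() {
    conf_file=$1
    prop_name=$2
    # Join all lines so name and value can be matched in one pass.
    tr -d '\n' < "$conf_file" |
        sed -n "s|.*<name>$prop_name</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}
```

Usage from the Hadoop home directory: `get_property conf/core-site.xml fs.default.name` should print the URI configured above, and an empty result suggests the property is missing or misspelled.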

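7. Format the file system

Once configuration is complete, the HDFS file system must be formatted. Change to the bin directory under the Hadoop home directory and run: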

./hadoop namenode -format

8. After formatting, start Hadoop

./start-all.sh

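9. Once startup has finished, use the jps command to check whether the services are running

[hduser@localhost bin]$ jps
1442 TaskTracker
1304 JobTracker
2146 Jps
1216 SecondaryNameNode
933 NameNode
1063 DataNode

If all six entries are listed (the five Hadoop daemons plus Jps itself), startup succeeded.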
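The jps check can be automated with a small filter over the jps output. This is a sketch only: the function name `missing_daemons` is hypothetical, and the daemon list is taken from the listing above.

```shell
# Hypothetical helper: read jps output on stdin and print the name of each
# expected Hadoop 1.x daemon that is not listed.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the whole second column so that "NameNode" does not
        # match the "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Usage: `jps | missing_daemons`. No output means all five daemons are up. Otherwise, the log files under the Hadoop logs directory for the named daemon are the place to look.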

10. To test the setup, first copy local files to HDFS.

./hadoop fs -put /opt/hadoop/sampledata /opt/hadoop/input

Here, sampledata is a local directory that contains the test data files. input is a directory on HDFS; it does not exist yet and is created automatically when the files are uploaded.

[hduser@localhost bin]$ ./hadoop dfs -ls /opt/hadoop/input
Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r--   1 hduser supergroup     674570 2013-12-05 10:29 /opt/hadoop/input/pg20417.txt
-rw-r--r--   1 hduser supergroup    1573150 2013-12-05 10:29 /opt/hadoop/input/pg4300.txt
-rw-r--r--   1 hduser supergroup    1423803 2013-12-05 10:29 /opt/hadoop/input/pg5000.txt

If the three files are listed, the upload succeeded. These three files are my own test files.

11. Finally, run the test program

./hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar wordcount /opt/hadoop/input /opt/hadoop/output

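If this step completes without errors, the environment has been set up successfully.

Notes:

If the hostname was changed when Linux was installed, the following exception may occur: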
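To look at the result rather than only the exit status, the word counts can be read back from HDFS and sorted. This sketch assumes the job writes tab-separated `word count` lines into files whose names begin with `part-` under the output directory. The function name `top_words` is hypothetical.

```shell
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the
# N most frequent words (default 10), highest count first.
top_words() {
    limit=${1:-10}
    # Sort numerically on the second column in descending order.
    sort -k2,2nr | head -n "$limit"
}
```

Usage from the bin directory: `./hadoop fs -cat '/opt/hadoop/output/part-*' | top_words 5`. Note that the job fails if /opt/hadoop/output already exists on HDFS, so the directory has to be removed before a rerun.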

Error getting localhost name
java.net.UnknownHostException: ubuntu: ubuntu: Name or service not known

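Here, ubuntu is the machine's hostname.

This happens because the configuration files use localhost, while the machine's actual hostname cannot be resolved.

To fix it, add the machine's hostname as an additional alias on the 127.0.0.1 line in /etc/hosts.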
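Whether the alias is already in place can be checked before editing the file. A minimal sketch, with the hypothetical function name `hosts_has_name`. It only looks for the name among the aliases of any uncommented entry and does not verify which address the name maps to.

```shell
# Hypothetical helper: succeed if the given hostname appears as a name or
# alias on any uncommented line of a hosts file.
hosts_has_name() {
    hosts_file=$1
    host_name=$2
    awk -v h="$host_name" '
        { sub(/#.*/, "") }                                    # drop comments
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }  # field 1 is the address
        END { exit !found }
    ' "$hosts_file"
}
```

Usage: `hosts_has_name /etc/hosts "$(hostname)" || echo "hostname is missing from /etc/hosts"`. After the fix, the 127.0.0.1 line would read, for example, `127.0.0.1 localhost ubuntu`.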