Installation of Hadoop 1.2.1 in Pseudo-Distributed Mode on CentOS 7


1 Hadoop Versions

   On the official Apache website there are various Hadoop releases, ranging from 0.10.1 to 2.7.2 (the most recent at the time of writing). Compared with the early releases, the 2.x Hadoop line introduced the global ResourceManager and the ApplicationMaster, which are the core components of the so-called YARN framework. Besides MapReduce, a large number of other parallel computing models, such as in-memory computing, stream computing, iterative computing, and graph computing, can run on the new Hadoop system. Meanwhile, Apache offers packages (.rpm/.deb) or compressed archives (-bin.tar.gz/.tar.gz) for different platforms, so choosing a suitable package for your OS can be a bit tricky.

   At first, I chose an .rpm Hadoop package for my CentOS 7. A problem occurred when I tried to install it with the rpm package management tool:

       rpm -ivh  ./hadoop***.rpm

       The error message shows that the default installation directory hadoop/bin conflicts with the system directory /bin, so you have to use extra arguments (--relocate or --prefix) to specify a different installation directory:

      rpm -ivh --relocate /=/opt/temp xxx.rpm    or    rpm -ivh --prefix=/opt/temp xxx.rpm

  By contrast, the .tar.gz version of Hadoop is more convenient to handle: just unpack it with the tar tool and copy it to any reasonable location.
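  For example (a minimal sketch, assuming the downloaded archive is hadoop-1.2.1.tar.gz in the current directory and /opt is the chosen location):

          $tar -xzf hadoop-1.2.1.tar.gz            ------ unpack the binary archive

          $sudo mv hadoop-1.2.1 /opt/              ------ move it to the chosen location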

2 Prerequisites for Installation

  It is suggested that you create a new Linux user for Hadoop and grant it elevated permissions by modifying the sudoers file in the /etc directory. Remember to restore the file's read-only attribute after you finish:

     1)Create a new user for hadoop

           groupadd hadoop-user             ----- create a user group

           useradd -g hadoop-user hadoop    ----- create the new user hadoop in that group

           passwd hadoop                    ----- set a password for the new user

     2)Modify permission for the new user

           Switch to root, then add write permission to /etc/sudoers (the file is read-only by default):

                 #chmod u+w  /etc/sudoers

          Edit the sudoers file and add a new line for the hadoop user:

                 hadoop    ALL=(ALL)  NOPASSWD: ALL     or     hadoop    ALL=(ALL)   ALL

          Finally, restore the sudoers file to read-only mode:

                 #chmod u-w  /etc/sudoers

  Since Hadoop runs on Java, you need Java 1.6 or higher on your machine. Fortunately, recent CentOS releases ship with OpenJDK 1.8. You can also choose an official Java release from Oracle. If you install Java from an .rpm package, there is no need to set up the environment variable; otherwise, add JAVA_HOME to your profile configuration.
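  For reference, a minimal sketch of the manual setup, assuming the JDK is installed under /usr/lib/jvm/java-1.8.0-openjdk (replace the path with the actual location of your Java installation):

          $echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> ~/.bashrc

          $echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc

          $source ~/.bashrc                ------ reload the profile

          $java -version                   ------ verify that Java is found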

  Communication between the nodes in the cluster happens via SSH. In a multi-node cluster, SSH connects the individual nodes, while in a single-node (pseudo-distributed) cluster, localhost acts as the server. The concrete configuration is as follows:

          $ssh-keygen -t rsa          ------ generate an RSA key pair (leave the passphrase empty)

          $cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys      ------- append the public key to the authorized keys

          $ssh localhost                 ------ test the password-less connection

  If the connection fails, these general tips might help:

          Enable debugging with ssh -vvv localhost and investigate the error in detail.

          Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication and AllowUsers. If you made any changes to the SSH server configuration file, force a configuration reload; on CentOS 7 this is done with  sudo systemctl reload sshd.
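          Another common cause is wrong permissions on the key files; sshd normally refuses password-less login unless the .ssh directory and the authorized_keys file are private to the user:

                 $chmod 700 ~/.ssh

                 $chmod 600 ~/.ssh/authorized_keys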

3 Formal deployment

  It is not recommended to export HADOOP_HOME as an environment variable, because Hadoop 1.x reports it as deprecated. Several steps are needed to finish the deployment:

  Step1: Configuring the Hadoop environment

          Just append the relevant contents to the following four files in hadoop-**/conf (minimal sketches of the three XML files follow this list):

          1)hadoop-env.sh

             export JAVA_HOME=<path to your Java installation>

          2)core-site.xml

          3)hdfs-site.xml

          4)mapred-site.xml
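          For a pseudo-distributed setup, minimal sketches of the three XML files are given below. The host/port values are the conventional Hadoop 1.x single-node settings; the hadoop.tmp.dir path is only an assumed example and should point to a directory owned by the hadoop user.

          core-site.xml:

             <configuration>
               <property>
                 <name>fs.default.name</name>
                 <value>hdfs://localhost:9000</value>
               </property>
               <property>
                 <!-- assumed example path; adjust it to a directory the hadoop user owns -->
                 <name>hadoop.tmp.dir</name>
                 <value>/home/hadoop/tmp</value>
               </property>
             </configuration>

          hdfs-site.xml:

             <configuration>
               <property>
                 <!-- one replica is enough on a single node -->
                 <name>dfs.replication</name>
                 <value>1</value>
               </property>
             </configuration>

          mapred-site.xml:

             <configuration>
               <property>
                 <name>mapred.job.tracker</name>
                 <value>localhost:9001</value>
               </property>
             </configuration>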

  Step2: Running Hadoop

       1)Formatting the NameNode

           $bin/hadoop namenode -format

       2)Starting Hadoop

           You can do a two-stage start-up to verify the cluster configuration more easily, or simply start everything at once:

           $bin/start-dfs.sh

           $bin/start-mapred.sh

           or:

           $bin/start-all.sh

        3)Checking the started Hadoop processes

           Normally, if everything is configured correctly, the 'jps' command will list five Hadoop processes (NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker) besides the jps process itself.

          Several tips if some of the processes fail to start:

          a. Check the four configuration files in the Hadoop installation directory hadoop-***/conf, and make sure the directories you specified for 'tmp', 'namenode', and 'datanode' exist (see the sketch after this list).

          b. Verify that you have permission to manage the directories specified above.

          c. Use the Hadoop web UI (the NameNode at http://localhost:50070 and the JobTracker at http://localhost:50030) to find the relevant error logs.
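          As an illustration of tips a and b (a minimal sketch, assuming the example hadoop.tmp.dir path from the configuration sketches above):

                 $mkdir -p /home/hadoop/tmp          ------ make sure the configured directory exists

                 $ls -ld /home/hadoop/tmp            ------ its owner should be the hadoop user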

   Step3: Running a test example

         Here we use an example bundled with Hadoop that estimates PI: the first parameter specifies the number of map tasks, and the second one the number of samples each map task computes. The command is listed below; Hadoop prints the job progress and the estimated value of PI when the job finishes:

         $bin/hadoop jar $HADOOP_HOME/hadoop-examples-1.2.1.jar \

          >pi 2 5


        
