Installation of Hadoop 1.2.1 in Pseudo-Distributed Mode on CentOS 7


1 Hadoop Versions

   On the official Apache website there are various Hadoop releases, ranging from 0.10.1 to 2.7.2 (the most recent at the time of writing). Compared with the early releases, the 2.x Hadoop line introduced the global ResourceManager and the ApplicationMaster, which are the core components of the so-called YARN framework. Besides MapReduce, a large number of other parallel computing models, such as in-memory computing, stream computing, iterative computing, and graph computing, can run on the new Hadoop system. Meanwhile, Apache offers packages (.rpm/.deb) or compressed archives (-bin.tar.gz/.tar.gz) for different platforms, so choosing a suitable package for your OS can be a bit tricky.

   At first, I chose an .rpm Hadoop package for my CentOS 7. A problem occurred when I tried to install it with the rpm package management tool:

       rpm -ivh  ./hadoop***.rpm

       The error message shows that the default installation directory hadoop/bin conflicts with the system directory /bin, so you have to use extra arguments (--relocate or --prefix) to specify a different installation directory:

      rpm -ivh --relocate /=/opt/temp xxx.rpm    or    rpm -ivh --prefix=/opt/temp xxx.rpm

  By contrast, the .tar.gz version of Hadoop is more convenient to handle: just unpack it with the tar tool and copy it to any reasonable location.
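  For example (a minimal sketch, assuming the downloaded archive is hadoop-1.2.1.tar.gz in the current directory and /opt is the chosen location):

          $tar -xzf hadoop-1.2.1.tar.gz            ------ unpack the binary archive

          $sudo mv hadoop-1.2.1 /opt/              ------ move it to the chosen location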

2 Prerequisites for Installation

  It is suggested that you create a new Linux user for Hadoop and grant it elevated permissions by modifying the sudoers file in the /etc directory. Remember to restore the file's read-only attribute after you finish:

     1)Create a new user for hadoop

           groupadd hadoop-user             ----- create a user group

           useradd -g hadoop-user hadoop    ----- create the new user hadoop in that group

           passwd hadoop                    ----- set a password for the new user

     2)Modify permission for the new user

           Switch to root, then add write permission to /etc/sudoers (the file is read-only by default):

                 #chmod u+w  /etc/sudoers

          Edit the sudoers file and add a new line for the hadoop user:

                 hadoop    ALL=(ALL)  NOPASSWD: ALL     or     hadoop    ALL=(ALL)   ALL

          Finally, restore the sudoers file to read-only mode:

                 #chmod u-w  /etc/sudoers

  Since Hadoop runs on Java, you need Java 1.6 or higher on your machine. Fortunately, recent CentOS releases ship with OpenJDK 1.8. You can also choose an official Java release from Oracle. If you install Java from an .rpm package, there is no need to set up the environment variable; otherwise, add JAVA_HOME to your profile configuration.
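  For reference, a minimal sketch of the manual setup, assuming the JDK is installed under /usr/lib/jvm/java-1.8.0-openjdk (replace the path with the actual location of your Java installation):

          $echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk' >> ~/.bashrc

          $echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc

          $source ~/.bashrc                ------ reload the profile

          $java -version                   ------ verify that Java is found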

  Communication between the nodes in the cluster happens via SSH. In a multi-node cluster, SSH connects the individual nodes, while in a single-node (pseudo-distributed) cluster, localhost acts as the server. The concrete configuration is as follows:

          $ssh-keygen -t rsa          ------ generate an RSA key pair (leave the passphrase empty)

          $cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys      ------- append the public key to the authorized keys

          $ssh localhost                 ------ test the password-less connection

  If the connection fails, these general tips might help:

          Enable debugging with ssh -vvv localhost and investigate the error in detail.

          Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication and AllowUsers. If you made any changes to the SSH server configuration file, force a configuration reload; on CentOS 7 this is done with  sudo systemctl reload sshd.
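          Another common cause is wrong permissions on the key files; sshd normally refuses password-less login unless the .ssh directory and the authorized_keys file are private to the user:

                 $chmod 700 ~/.ssh

                 $chmod 600 ~/.ssh/authorized_keys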

3 Formal deployment

  It is not recommended to export HADOOP_HOME as an environment variable, because Hadoop 1.x reports it as deprecated. Several steps are needed to finish the deployment:

  Step1: Configuring the Hadoop environment

          Just append the relevant contents to the following four files in hadoop-**/conf (minimal sketches of the three XML files follow this list):

          1)hadoop-env.sh

             export JAVA_HOME=<path to your Java installation>

          2)core-site.xml

          3)hdfs-site.xml

          4)mapred-site.xml
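          For a pseudo-distributed setup, minimal sketches of the three XML files are given below. The host/port values are the conventional Hadoop 1.x single-node settings; the hadoop.tmp.dir path is only an assumed example and should point to a directory owned by the hadoop user.

          core-site.xml:

             <configuration>
               <property>
                 <name>fs.default.name</name>
                 <value>hdfs://localhost:9000</value>
               </property>
               <property>
                 <!-- assumed example path; adjust it to a directory the hadoop user owns -->
                 <name>hadoop.tmp.dir</name>
                 <value>/home/hadoop/tmp</value>
               </property>
             </configuration>

          hdfs-site.xml:

             <configuration>
               <property>
                 <!-- one replica is enough on a single node -->
                 <name>dfs.replication</name>
                 <value>1</value>
               </property>
             </configuration>

          mapred-site.xml:

             <configuration>
               <property>
                 <name>mapred.job.tracker</name>
                 <value>localhost:9001</value>
               </property>
             </configuration>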

  Step2: Running Hadoop

       1)Formatting the NameNode

           $bin/hadoop namenode -format

       2)Starting Hadoop

           You can do a two-stage start-up to verify the cluster configuration more easily, or simply start everything at once:

           $bin/start-dfs.sh

           $bin/start-mapred.sh

           or:

           $bin/start-all.sh

        3)Checking the started Hadoop processes

           Normally, if everything is configured correctly, the 'jps' command will list five Hadoop processes (NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker) besides the jps process itself.

          Several tips if some of the processes fail to start:

          a. Check the four configuration files in the Hadoop installation directory hadoop-***/conf, and make sure the directories you specified for 'tmp', 'namenode', and 'datanode' exist (see the sketch after this list).

          b. Verify that you have permission to manage the directories specified above.

          c. Use the Hadoop web UI (the NameNode at http://localhost:50070 and the JobTracker at http://localhost:50030) to find the relevant error logs.
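          As an illustration of tips a and b (a minimal sketch, assuming the example hadoop.tmp.dir path from the configuration sketches above):

                 $mkdir -p /home/hadoop/tmp          ------ make sure the configured directory exists

                 $ls -ld /home/hadoop/tmp            ------ its owner should be the hadoop user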

   Step3: Running a test example

         Here we use an example bundled with Hadoop that estimates PI: the first parameter specifies the number of map tasks, and the second one the number of samples each map task computes. The command is listed below; Hadoop prints the job progress and the estimated value of PI when the job finishes:

         $bin/hadoop jar $HADOOP_HOME/hadoop-examples-1.2.1.jar \

          >pi 2 5


        
