Compiling and installing Hadoop 2.4 on 64-bit Oracle Linux 6


If you are planning to run Hadoop on a 64-bit OS you might want to build it from source. The native Hadoop library (libhadoop.so.1.0.0) found in the Hadoop 2.4 distribution is actually compiled on a 32-bit platform. This results in a myriad of annoying errors like the one below, which you can eliminate if you recompile libhadoop.so.1.0.0 on your 64-bit platform.

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
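
You can confirm the problem for yourself with the file utility, which should report the bundled library as a 32-bit ELF object (the path below assumes you have unpacked the stock 2.4.x binary tarball):

$ file hadoop-2.4.1/lib/native/libhadoop.so.1.0.0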

In this tutorial we will see how to prepare a clean Oracle Linux 6 system for accommodating Hadoop and how to actually build Hadoop from its source files.

Prerequisites
The only prerequisite for following this tutorial is a default installation of 64-bit Oracle Linux Server 6.5. The host I am using is named hadoop.

If this is a dev/test system you might as well disable SELinux and iptables to make your life easier.

[root@hadoop ~]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Unloading modules:                               [  OK  ]
[root@hadoop ~]# chkconfig iptables off
[root@hadoop ~]#

For disabling SELinux, set the SELINUX parameter in /etc/selinux/config from enforcing to disabled.

[root@hadoop ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
[root@hadoop ~]#

You must reboot the system for the changes to take effect.
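
After the reboot you can confirm the new SELinux mode with getenforce:

[root@hadoop ~]# getenforce
Disabled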

JDK Installation
You can download the latest JDK from Oracle Technology Network or use wget to download it directly if you know the exact URL for the version you’re downloading.

[root@hadoop ~]# wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u5-b13/jdk-8u5-linux-x64.tar.gz
--2014-07-12 12:40:23--  http://download.oracle.com/otn-pub/java/jdk/8u5-b13/jdk-8u5-linux-x64.tar.gz
Resolving download.oracle.com... 176.255.203.9, 176.255.203.10
...
100%[==========================================>] 159,008,252 1.80M/s   in 93s

2014-07-12 12:41:57 (1.62 MB/s) - “jdk-8u5-linux-x64.tar.gz” saved [159008252/159008252]

[root@hadoop ~]#

Now extract the archive in /opt.

[root@hadoop ~]# tar -xzf jdk-8u5-linux-x64.tar.gz -C /opt/
[root@hadoop ~]#

Use alternatives to set the Java symbolic links to your newly installed JDK.

[root@hadoop ~]# alternatives --install /usr/bin/java java /opt/jdk1.8.0_05/bin/java 2
[root@hadoop ~]# alternatives --config java

There are 3 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
   2           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
   3           /opt/jdk1.8.0_05/bin/java

Enter to keep the current selection[+], or type selection number: 3
[root@hadoop ~]#

Let’s confirm that java points to the correct JDK version.

[root@hadoop ~]# java -version
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
[root@hadoop ~]#

Create a dedicated Hadoop user account
Our next step is to create a dedicated user account that owns and runs the Hadoop software. I am going to name my user haduser and add it to a group called hadgroup.

[root@hadoop ~]# groupadd hadgroup
[root@hadoop ~]# useradd haduser -G hadgroup
[root@hadoop ~]# passwd haduser
Changing password for user haduser.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@hadoop ~]#

Let’s switch to the newly created user.

[root@hadoop ~]# su - haduser
[haduser@hadoop ~]$

Setup key based authentication for haduser
Hadoop requires secure shell connections to the localhost without a passphrase, so let’s configure key-based SSH access.

[haduser@hadoop ~]$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/haduser/.ssh/id_rsa):
Created directory '/home/haduser/.ssh'.
Your identification has been saved in /home/haduser/.ssh/id_rsa.
Your public key has been saved in /home/haduser/.ssh/id_rsa.pub.
The key fingerprint is:
8c:79:27:a5:81:00:3c:00:21:b6:3c:e7:72:bc:2c:65 haduser@hadoop
The key's randomart image is:
+--[ RSA 2048]----+
|*=...            |
|+ +  . .         |
| + o  . . .      |
|  =    + +       |
| . E  o S .      |
|  * .  . o       |
| . o             |
|  .              |
|                 |
+-----------------+
[haduser@hadoop ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[haduser@hadoop ~]$ chmod 0600 ~/.ssh/authorized_keys

We can do a quick test by invoking date via ssh and adding localhost to the list of known hosts if necessary.

[haduser@hadoop ~]$ ssh localhost date
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 8a:c5:52:a2:cf:c9:55:c1:57:15:5c:37:25:16:41:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Sat Jul 12 16:07:28 BST 2014
[haduser@hadoop ~]$

Install build tools
There is a set of tools and libraries we have to install that are required for compiling Hadoop from source. Let’s start by adding the default development toolset (gcc, autoconf, etc.). We’ll have to do this as root.

[root@hadoop protobuf-2.5.0]# yum groupinstall "Development Tools" "Development Libraries"
Loaded plugins: refresh-packagekit, security
Setting up Group Process
Package gcc-4.4.7-4.el6.x86_64 already installed and latest version
Package 1:make-3.81-20.el6.x86_64 already installed and latest version
Package patch-2.6-6.el6.x86_64 already installed and latest version
Package 1:pkgconfig-0.23-9.1.el6.x86_64 already installed and latest version
Package gettext-0.17-16.el6.x86_64 already installed and latest version
Package binutils-2.20.51.0.2-5.36.el6.x86_64 already installed and latest version
Package elfutils-0.152-1.el6.x86_64 already installed and latest version
Package cvs-1.11.23-16.el6.x86_64 already installed and latest version
Warning: Group Development Libraries does not exist.
Resolving Dependencies
--> Running transaction check
---> Package autoconf.noarch 0:2.63-5.1.el6 will be installed
---> Package automake.noarch 0:1.11.1-4.el6 will be installed
---> Package bison.x86_64 0:2.4.1-5.el6 will be installed
…
Transaction Summary
================================================================================
Install      32 Package(s)

Total download size: 57 M
Installed size: 186 M
Is this ok [y/N]: y
Downloading Packages:
(1/32): autoconf-2.63-5.1.el6.noarch.rpm                 | 781 kB     00:00
(2/32): automake-1.11.1-4.el6.noarch.rpm                 | 550 kB     00:00
(3/32): bison-2.4.1-5.el6.x86_64.rpm                     | 636 kB     00:00
(4/32): byacc-1.9.20070509-7.el6.x86_64.rpm              |  47 kB     00:00
…
Dependency Installed:
  gettext-devel.x86_64 0:0.17-16.el6          gettext-libs.x86_64 0:0.17-16.el6
  kernel-devel.x86_64 0:2.6.32-431.20.3.el6   libgcj.x86_64 0:4.4.7-4.el6
  libgfortran.x86_64 0:4.4.7-4.el6            libstdc++-devel.x86_64 0:4.4.7-4.el6
  perl-Error.noarch 1:0.17015-4.el6           perl-Git.noarch 0:1.7.1-3.el6_4.1
  systemtap-client.x86_64 0:2.3-4.0.1.el6_5   systemtap-devel.x86_64 0:2.3-4.0.1.el6_5

Complete!
[root@hadoop protobuf-2.5.0]#

Another two packages required for successfully compiling Hadoop are openssl-devel and cmake.

[root@hadoop ~]# yum install openssl-devel cmake
Loaded plugins: refresh-packagekit, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package cmake.x86_64 0:2.6.4-5.el6 will be installed
---> Package openssl-devel.x86_64 0:1.0.1e-16.el6_5.14 will be installed
…
Transaction Summary
================================================================================
Install       8 Package(s)

Total download size: 7.1 M
Installed size: 22 M
Is this ok [y/N]: y
…
Dependency Installed:
  keyutils-libs-devel.x86_64 0:1.4-4.el6       krb5-devel.x86_64 0:1.10.3-15.el6_5.1
  libcom_err-devel.x86_64 0:1.42.8-1.0.1.el6   libselinux-devel.x86_64 0:2.0.94-5.3.el6_4.1
  libsepol-devel.x86_64 0:2.0.41-4.el6         zlib-devel.x86_64 0:1.2.3-29.el6

Complete!
[root@hadoop ~]#

We will also need Apache Maven (build automation tool) and Protocol Buffers (serialization library developed by Google). Let’s start by getting and uncompressing the latest version of Maven (3.2.2 at the time of writing).

[root@hadoop ~]# wget http://mirrors.gigenet.com/apache/maven/maven-3/3.2.2/binaries/apache-maven-3.2.2-bin.tar.gz
--2014-07-12 16:41:02--  http://mirrors.gigenet.com/apache/maven/maven-3/3.2.2/binaries/apache-maven-3.2.2-bin.tar.gz
Resolving mirrors.gigenet.com... 69.65.15.34
Connecting to mirrors.gigenet.com|69.65.15.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6940967 (6.6M) [application/x-gzip]
Saving to: “apache-maven-3.2.2-bin.tar.gz”

100%[==========================================>] 6,940,967    677K/s   in 12s

2014-07-12 16:41:15 (569 KB/s) - “apache-maven-3.2.2-bin.tar.gz” saved [6940967/6940967]

[root@hadoop ~]# tar -zxf apache-maven-3.2.2-bin.tar.gz -C /opt/
[root@hadoop ~]#

Our next step is to create a dedicated initialization script that will set the following environment variables for Maven.

JAVA_HOME=/opt/jdk1.8.0_05
M3_HOME=/opt/apache-maven-3.2.2
PATH=/opt/apache-maven-3.2.2/bin:$PATH

We will create a new file (maven.sh) in /etc/profile.d and put the content above inside:

[root@hadoop ~]# cat << EOF > /etc/profile.d/maven.sh
> export JAVA_HOME=/opt/jdk1.8.0_05
> export M3_HOME=/opt/apache-maven-3.2.2
> export PATH=/opt/apache-maven-3.2.2/bin:\$PATH
> EOF
[root@hadoop ~]#

Log out and back in, then verify that M3_HOME is correctly set.

[root@hadoop ~]# echo $M3_HOME
/opt/apache-maven-3.2.2
[root@hadoop ~]#

Confirm that you can successfully start Maven and that it is using the correct Java version.

[root@hadoop ~]# mvn -version
Apache Maven 3.2.2 (45f7c06d68e745d05611f7fd14efb6594181933e; 2014-06-17T14:51:42+01:00)
Maven home: /opt/apache-maven-3.2.2
Java version: 1.8.0_05, vendor: Oracle Corporation
Java home: /opt/jdk1.8.0_05/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.8.13-35.1.2.el6uek.x86_64", arch: "amd64", family: "unix"
[root@hadoop ~]#

Time to deal with Protocol Buffers. First, let’s download the protobuf source. Note that I do this using the haduser account.

[haduser@hadoop ~]$ wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2
--2014-07-13 13:45:59--  https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2
Resolving protobuf.googlecode.com... 173.194.78.82, 2a00:1450:400c:c00::52
Connecting to protobuf.googlecode.com|173.194.78.82|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1866763 (1.8M) [application/x-bzip2]
Saving to: “protobuf-2.5.0.tar.bz2”

100%[==========================================>] 1,866,763   1.37M/s   in 1.3s

2014-07-13 13:46:00 (1.37 MB/s) - “protobuf-2.5.0.tar.bz2” saved [1866763/1866763]

[haduser@hadoop ~]$

Untar the file and run the configure script to prepare the source code for compilation.

[haduser@hadoop ~]$ tar jxf protobuf-2.5.0.tar.bz2
[haduser@hadoop ~]$ cd protobuf-2.5.0
[haduser@hadoop protobuf-2.5.0]$ ./configure --prefix=/home/haduser/protobuf-2.5.0/inst/bin
checking whether to enable maintainer-specific portions of Makefiles... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
...
config.status: creating Makefile
config.status: creating scripts/gtest-config
config.status: creating build-aux/config.h
config.status: build-aux/config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands
[haduser@hadoop protobuf-2.5.0]$

Let’s build the Protocol Buffer objects by running make.

[haduser@hadoop protobuf-2.5.0]$ make
make  all-recursive
make[1]: Entering directory `/home/haduser/protobuf-2.5.0'
Making all in .
make[2]: Entering directory `/home/haduser/protobuf-2.5.0'
make[2]: Leaving directory `/home/haduser/protobuf-2.5.0'
Making all in src
…
make[3]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
make[2]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
make[1]: Leaving directory `/home/haduser/protobuf-2.5.0'
[haduser@hadoop protobuf-2.5.0]$

Invoke make install to put the objects we’ve just built into their proper locations.

[haduser@hadoop protobuf-2.5.0]$ make install
Making install in .
make[1]: Entering directory `/home/haduser/protobuf-2.5.0'
make[2]: Entering directory `/home/haduser/protobuf-2.5.0'
...
----------------------------------------------------------------------
Libraries have been installed in:
   /home/haduser/protobuf-2.5.0/inst/bin/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
...
make[3]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
make[2]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
make[1]: Leaving directory `/home/haduser/protobuf-2.5.0/src'
[haduser@hadoop protobuf-2.5.0]$

This concludes all preparations and we are now ready to crack on with compiling Hadoop.

Compiling Hadoop
We will perform the compilation as the Hadoop owner (haduser). We also need Protocol Buffers added to the current PATH.

[haduser@hadoop protobuf-2.5.0]$ export PATH=/home/haduser/protobuf-2.5.0/inst/bin/bin:$PATH
[haduser@hadoop protobuf-2.5.0]$
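
Hadoop 2.4 is picky about the Protocol Buffers version (it wants 2.5.0), so it is worth double-checking that the protoc binary the shell now finds is the one we just built:

[haduser@hadoop protobuf-2.5.0]$ protoc --version
libprotoc 2.5.0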

Get Hadoop’s source code from Apache.

[haduser@hadoop ~]$ wget http://apache.fastbull.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
--2014-07-12 16:36:47--  http://apache.fastbull.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
Resolving apache.fastbull.org... 194.116.84.14
...
2014-07-12 16:37:01 (1.13 MB/s) - “hadoop-2.4.1-src.tar.gz” saved [15417097/15417097]
[haduser@hadoop ~]$

Extract the archive in haduser's home directory.

[haduser@hadoop ~]$ tar xf hadoop-2.4.1-src.tar.gz
[haduser@hadoop ~]$ cd hadoop-2.4.1-src
[haduser@hadoop hadoop-2.4.1-src]$

Before we try to build Hadoop we’ll have to deal with doclint. This addition arrived in Javadoc with JDK 8, and its aim is to make sure that Javadoc’s output is W3C HTML 4.01 compliant. It makes the handling of Javadoc much stricter and will prevent the successful compilation of Hadoop due to some missing HTML tags. The easiest way to avoid this issue is to disable doclint completely.

Open the pom.xml file that sits in the hadoop-2.4.1-src folder. This XML file contains information about the project and configuration details used by Maven to build it. We will add one additional parameter to the global properties section that disables doclint.

After the change your properties section should look like this:

<properties>
    <distMgmtSnapshotsId>apache.snapshots.https</distMgmtSnapshotsId>
    <distMgmtSnapshotsName>Apache Development Snapshot Repository</distMgmtSnapshotsName>
    <distMgmtSnapshotsUrl>https://repository.apache.org/content/repositories/snapshots</distMgmtSnapshotsUrl>
    <distMgmtStagingId>apache.staging.https</distMgmtStagingId>
    <distMgmtStagingName>Apache Release Distribution Repository</distMgmtStagingName>
    <distMgmtStagingUrl>https://repository.apache.org/service/local/staging/deploy/maven2</distMgmtStagingUrl>
    <!-- platform encoding override -->
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <additionalparam>-Xdoclint:none</additionalparam>
</properties>
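
Alternatively, if you would rather leave pom.xml untouched, the additionalparam property can usually be supplied straight on the Maven command line when you invoke the build in the next step. I have not verified this on this exact Hadoop version, so fall back to the pom edit if the build still fails on Javadoc errors:

[haduser@hadoop hadoop-2.4.1-src]$ mvn package -Pdist,native -DskipTests -Dtar -Dadditionalparam=-Xdoclint:none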

We are now ready to build Hadoop. Invoke Maven with the appropriate build profile, sit back and wait for the build process to complete. Depending on your system this might take a while.

[haduser@hadoop hadoop-2.4.1-src]$ mvn package -Pdist,native -DskipTests -Dtar
[INFO] Scanning for projects...
Downloading: http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom
Downloaded: http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom (4 KB at 11.8 KB/sec)
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 09:38 min
[INFO] Finished at: 2014-07-13T14:11:43+01:00
[INFO] Final Memory: 194M/493M
[INFO] ------------------------------------------------------------------------
[haduser@hadoop hadoop-2.4.1-src]$

After the build process is complete, switch back to root and place the compiled code in its final location – I use /opt.

[root@hadoop ~]# mv /home/haduser/hadoop-2.4.1-src/hadoop-dist/target/hadoop-2.4.1 /opt/
[root@hadoop ~]#

Switch to haduser again and put the JAVA_HOME and HADOOP_INSTALL environment variables in the user’s profile.

[root@hadoop ~]# su - haduser
[haduser@hadoop ~]$ cat >> ~/.bash_profile << EOF
> export JAVA_HOME=/opt/jdk1.8.0_05
> export HADOOP_INSTALL=/opt/hadoop-2.4.1
> export PATH=\$PATH:/opt/hadoop-2.4.1/sbin:/opt/hadoop-2.4.1/bin
> EOF
[haduser@hadoop ~]$ source .bash_profile
[haduser@hadoop ~]$
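
To confirm that the shell now picks up the freshly built binaries, ask Hadoop for its version; the first line of the banner should read 2.4.1 (the remaining lines, elided here, carry build metadata):

[haduser@hadoop ~]$ hadoop version
Hadoop 2.4.1
...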

Testing Hadoop
Our final task is to quickly configure and test the code we’ve just built.
Edit the $HADOOP_INSTALL/etc/hadoop/core-site.xml file and put the following lines between the <configuration></configuration> tags.

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop-2.4.1/tmp</value>
</property>
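
The directory referenced by hadoop.tmp.dir does not exist yet. The NameNode format step below should create it on the fly, assuming the mv above preserved haduser's ownership of the tree, but it does no harm to create it up front:

[haduser@hadoop ~]$ mkdir -p /opt/hadoop-2.4.1/tmp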

Create a new mapred-site.xml file based on the standard template, by copying the mapred-site.xml.template file.

[haduser@hadoop ~]$ cp /opt/hadoop-2.4.1/etc/hadoop/mapred-site.xml.template /opt/hadoop-2.4.1/etc/hadoop/mapred-site.xml
[haduser@hadoop ~]$

Edit the newly created $HADOOP_INSTALL/etc/hadoop/mapred-site.xml file and put the following between the <configuration></configuration> tags.

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9002</value>
</property>

Format the NameNode.

[haduser@hadoop ~]$ hdfs namenode -format
14/07/13 14:22:13 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop/192.168.56.101
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.1
…
14/07/13 14:22:14 INFO namenode.FSImage: Allocated new BlockPoolId: BP-716394400-192.168.56.101-1405257734268
14/07/13 14:22:14 INFO common.Storage: Storage directory /opt/hadoop-2.4.1/tmp/dfs/name has been successfully formatted.
14/07/13 14:22:14 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/07/13 14:22:14 INFO util.ExitUtil: Exiting with status 0
14/07/13 14:22:14 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/192.168.56.101
************************************************************/
[haduser@hadoop ~]$

Now start the Hadoop DFS and YARN daemons.

[haduser@hadoop ~]$ start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop-2.4.1/logs/hadoop-haduser-namenode-hadoop.out
localhost: starting datanode, logging to /opt/hadoop-2.4.1/logs/hadoop-haduser-datanode-hadoop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.4.1/logs/hadoop-haduser-secondarynamenode-hadoop.out
[haduser@hadoop ~]$
[haduser@hadoop ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.4.1/logs/yarn-haduser-resourcemanager-hadoop.out
localhost: starting nodemanager, logging to /opt/hadoop-2.4.1/logs/yarn-haduser-nodemanager-hadoop.out
[haduser@hadoop ~]$
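
Before going any further you can list the running Java processes with the JDK's jps tool. You should see all five Hadoop daemons we just started (the process IDs shown here are only illustrative):

[haduser@hadoop ~]$ jps
2482 NameNode
2601 DataNode
2786 SecondaryNameNode
2937 ResourceManager
3052 NodeManager
3344 Jps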

Create a test directory and list the HDFS root directory to verify.

[haduser@hadoop ~]$ hadoop fs -mkdir hdfs://localhost:9000/test
[haduser@hadoop ~]$ hadoop fs -ls hdfs://localhost:9000/
Found 1 items
drwxr-xr-x   - haduser supergroup          0 2014-07-19 09:31 hdfs://localhost:9000/test
[haduser@hadoop ~]$
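
To exercise YARN end to end as well, you can run one of the example MapReduce jobs that ship with the distribution. The pi estimator makes a convenient smoke test (the jar path assumes the standard 2.4.1 layout, and the job should end by printing an estimated value of Pi):

[haduser@hadoop ~]$ hadoop jar /opt/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 5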

Finally, open a web browser and point it at your host's IP address on port 50070 to bring up the HDFS health console.

[Screenshot: HDFS health console]
