Installing Hadoop 2.7.3 on Ubuntu 16.04

Hadoop Installation

0. Deployment plan:

Edit /etc/hosts and add the following host entries:

# hadoop nodes

192.168.75.128  master

192.168.75.130  slave1

192.168.75.131  slave2

1. Create a hadoop user and add it to sudoers

sudo adduser hadoop

sudo vim /etc/sudoers    ### the file is read-only by default; use sudo visudo, or temporarily add write permission with chmod +w sudoers

Add the following line:

hadoop    ALL=(ALL:ALL) ALL
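
On Ubuntu, an alternative to editing /etc/sudoers is to add the user to the existing sudo group (either approach works; this one avoids touching sudoers directly):

sudo usermod -aG sudo hadoop    # members of the sudo group get full sudo rights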

2. Install Java and set environment variables:

Download the JDK, extract it to /usr/local, and add the following to /etc/profile:

# set jdk classpath

export JAVA_HOME=/usr/local/jdk1.8.0_111

export JRE_HOME=$JAVA_HOME/jre

export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Run source /etc/profile to make the environment variables take effect.
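
To verify that the JDK is picked up (the exact version string depends on your JDK build):

java -version          # should report java version "1.8.0_xxx"

echo $JAVA_HOME        # should print /usr/local/jdk1.8.0_111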

3. Install OpenSSH and generate keys

sudo apt-get install openssh-server

Switch to the hadoop user and run:

ssh-keygen -t rsa

cat .ssh/id_rsa.pub >> .ssh/authorized_keys

Copy the generated authorized_keys file to the .ssh directory on slave1 and slave2 (repeat similarly for any additional nodes):

scp .ssh/authorized_keys hadoop@slave1:~/.ssh

scp .ssh/authorized_keys hadoop@slave2:~/.ssh

PS: the shell prompts shown in this guide are configured to display the machine's IP address.

You can test passwordless login with ssh slave1.
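
If ssh still asks for a password, the usual cause is overly permissive .ssh permissions; OpenSSH expects roughly the following on each node:

chmod 700 ~/.ssh

chmod 600 ~/.ssh/authorized_keys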

4. Download and configure Hadoop (the following steps must be done on all three machines, or done once and then copied to the other machines with scp; a copy sketch appears at the end of this step):

Download with wget, for example:

wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Set the Hadoop environment variables (in /etc/profile, as with the JDK):

# set hadoop classpath

export HADOOP_HOME=/home/hadoop/hadoop-2.7.3/

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HADOOP_PREFIX=$HADOOP_HOME

export CLASSPATH=$CLASSPATH:.:$HADOOP_HOME/bin
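
The exports above do not add the Hadoop bin and sbin directories to PATH, which is why the later steps use full paths; if you prefer shorter commands, you can optionally append:

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin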

Configure ./etc/hadoop/core-site.xml (paths are relative to $HADOOP_HOME, i.e. /home/hadoop/hadoop-2.7.3):

<configuration>

    <property>

            <name>fs.defaultFS</name>

             <!-- master: the hostname defined in /etc/hosts -->

            <value>hdfs://master:9000/</value>

    </property>

</configuration>

Configure ./etc/hadoop/hdfs-site.xml:

 <configuration>

         <property>

             <name>dfs.namenode.name.dir</name>

             <value>/home/hadoop/hadoop-2.7.3/dfs/namenode</value>

         </property>

         <property>

                 <name>dfs.datanode.data.dir</name>

                 <value>/home/hadoop/hadoop-2.7.3/dfs/datanode</value>

         </property>

         <property>

                 <name>dfs.replication</name>

                 <value>1</value>

         </property>

         <property>

                 <name>dfs.namenode.secondary.http-address</name>

                 <value>master:9001</value>

         </property>

 </configuration>
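
The NameNode and DataNode directories configured above can be created up front on each node (optional; the format step below also creates the NameNode directory):

mkdir -p /home/hadoop/hadoop-2.7.3/dfs/namenode

mkdir -p /home/hadoop/hadoop-2.7.3/dfs/datanode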

Configure ./etc/hadoop/mapred-site.xml:
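
In a fresh Hadoop 2.7.3 distribution this file ships only as a template, so create it first:

cd /home/hadoop/hadoop-2.7.3/etc/hadoop

cp mapred-site.xml.template mapred-site.xml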

<configuration>

    <property>

      <name>mapreduce.framework.name</name>

      <value>yarn</value>

    </property>

    <property>

      <name>mapreduce.jobhistory.address</name>

      <value>master:10020</value>

    </property>

    <property>

      <name>mapreduce.jobhistory.webapp.address</name>

      <value>master:19888</value>

    </property>

</configuration>
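
Note that start-all.sh used in step 5 does not launch the JobHistory server that the two jobhistory properties point to; if you need it, start it separately on master with the standard daemon script:

/home/hadoop/hadoop-2.7.3/sbin/mr-jobhistory-daemon.sh start historyserver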

Configure ./etc/hadoop/yarn-site.xml:

<configuration>

    <property>

      <name>yarn.nodemanager.aux-services</name>

      <value>mapreduce_shuffle</value>

    </property>

    <property>

      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

      <value>org.apache.hadoop.mapred.ShuffleHandler</value>

    </property>

    <property>

      <name>yarn.resourcemanager.address</name>

        <value>master:8032</value>

    </property>

    <property>

        <name>yarn.resourcemanager.scheduler.address</name>

        <value>master:8030</value>

    </property>

    <property>

      <name>yarn.resourcemanager.resource-tracker.address</name>

        <value>master:8031</value>

    </property>

    <property>

      <name>yarn.resourcemanager.admin.address</name>

        <value>master:8033</value>

    </property>

    <property>

      <name>yarn.resourcemanager.webapp.address</name>

        <value>master:8088</value>

    </property>

</configuration>

Configure the ./etc/hadoop/slaves file (keeping localhost here means the master also runs a DataNode and NodeManager, as seen in the jps output below):

[hadoop@192.168.75.128 hadoop]$cat slaves

localhost

slave1

slave2

Configure the environment scripts hadoop-env.sh, mapred-env.sh, and yarn-env.sh, adding JAVA_HOME to each:

# The java implementation to use.

export JAVA_HOME=/usr/local/jdk1.8.0_111
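
If you configured only the master, the finished installation can now be copied to the slaves as mentioned at the start of this step; a sketch, assuming the same /home/hadoop layout on every node:

scp -r /home/hadoop/hadoop-2.7.3 hadoop@slave1:/home/hadoop/

scp -r /home/hadoop/hadoop-2.7.3 hadoop@slave2:/home/hadoop/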

5. Start Hadoop (run these steps on the master node).

A. Format the HDFS file system:

/home/hadoop/hadoop-2.7.3/bin/hdfs namenode -format

If the output contains:

Storage directory /home/hadoop/hadoop-2.7.3/dfs/namenode has been successfully formatted.

then the format succeeded.

B. Start the Hadoop cluster:

/home/hadoop/hadoop-2.7.3/sbin/start-all.sh
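
start-all.sh is marked deprecated in Hadoop 2.x; starting HDFS and YARN separately is equivalent:

/home/hadoop/hadoop-2.7.3/sbin/start-dfs.sh

/home/hadoop/hadoop-2.7.3/sbin/start-yarn.sh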

Check the running daemons with jps:

### master

[hadoop@192.168.75.128 hadoop]$jps

3584 NameNode

4147 NodeManager

4036 ResourceManager

3865 SecondaryNameNode

3721 DataNode

### on slave1 and slave2, jps shows only:

DataNode

NodeManager
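
To confirm that all DataNodes (the two slaves plus the one on master, due to localhost in the slaves file) have registered with the NameNode:

/home/hadoop/hadoop-2.7.3/bin/hdfs dfsadmin -report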

View HDFS in a browser:

http://192.168.75.128:50070

View the YARN / MapReduce web UI in a browser:

http://192.168.75.128:8088
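
As a quick end-to-end check, you can run the bundled example job (jar path as shipped with Hadoop 2.7.3):

/home/hadoop/hadoop-2.7.3/bin/hadoop jar /home/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10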

C. Stop Hadoop:

/home/hadoop/hadoop-2.7.3/sbin/stop-all.sh
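
As with start-all.sh, stop-all.sh is deprecated; stopping YARN and HDFS separately is equivalent:

/home/hadoop/hadoop-2.7.3/sbin/stop-yarn.sh

/home/hadoop/hadoop-2.7.3/sbin/stop-dfs.sh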