Hadoop 2.7.3 Cluster Mode (HA)

Environment preparation:

   1. 192.168.178.129  hostname: myserver
   2. 192.168.178.130  hostname: myslave1
   3. 192.168.178.131  hostname: myslave2
   4. 192.168.178.133  hostname: zookeeper1
   5. 192.168.178.134  hostname: zookeeper2
   6. 192.168.178.135  hostname: zookeeper3


Application deployment plan (the original screenshot is missing; the layout below is reconstructed from the configuration files that follow):

    myserver:    NameNode (namenode1), DFSZKFailoverController, ResourceManager (rm1)
    myslave1:    NameNode (namenode2), DFSZKFailoverController
    myslave2:    ResourceManager (rm2)
    zookeeper1:  ZooKeeper, JournalNode, DataNode, NodeManager
    zookeeper2:  ZooKeeper, JournalNode, DataNode, NodeManager
    zookeeper3:  ZooKeeper, JournalNode, DataNode, NodeManager

Deploying the Hadoop cluster:

    1. Install the JDK (jdk1.7.0_45) on every server.

    2. Edit /etc/hosts on every server:

        192.168.178.129 myserver
        192.168.178.130 myslave1
        192.168.178.131 myslave2
        192.168.178.133 zookeeper1
        192.168.178.134 zookeeper2
        192.168.178.135 zookeeper3
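
        A quick sanity check that every name resolves on each node (a minimal sketch using the standard getent tool):

            for h in myserver myslave1 myslave2 zookeeper1 zookeeper2 zookeeper3; do
                getent hosts "$h"    # prints the IP/hostname pair when resolution works
            done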

    3. Set up passwordless SSH between all servers.

       On myserver, set up passwordless SSH as follows:

       3.1 Run ssh-keygen -t rsa -P '' and press Enter through all prompts.

       3.2 Copy the public key to every host:

              ssh-copy-id myserver

              ssh-copy-id myslave1

              ssh-copy-id myslave2

              ssh-copy-id zookeeper1

              ssh-copy-id zookeeper2

              ssh-copy-id zookeeper3

        3.3 Repeat steps 3.1 and 3.2 on each of the other servers.

        3.4 Restart sshd on every server (service sshd restart).
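
        To confirm that passwordless login really works from myserver to every node, a minimal check (BatchMode makes ssh fail instead of prompting for a password):

            for h in myserver myslave1 myslave2 zookeeper1 zookeeper2 zookeeper3; do
                ssh -o BatchMode=yes "$h" hostname || echo "passwordless login to $h failed"
            done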

     4. Install Hadoop

        Install Hadoop on the myserver machine first.

        4.1 Download Hadoop

                wget https://archive.apache.org/dist/hadoop/core/stable2/hadoop-2.7.3.tar.gz

        4.2 Extract the archive

               tar -zxvf hadoop-2.7.3.tar.gz

        4.3 Edit Hadoop's configuration files

              4.3.1 hadoop-env.sh

                        Point JAVA_HOME at the JDK install directory:

                        export JAVA_HOME=/chenzhongwei/soft/jdk1.7.0_45/

              4.3.2 core-site.xml

                        <configuration>
                               <!-- base directory for Hadoop's temporary files -->
                               <property>
                                      <name>hadoop.tmp.dir</name>
                                      <value>/chenzhongwei/soft/hadoop-2.7.3/tmp</value>
                               </property>

                               <!-- name of the HDFS nameservice -->
                               <property>
                                      <name>fs.defaultFS</name>
                                      <value>hdfs://nameservice</value>
                               </property>

                               <!-- ZooKeeper quorum -->
                               <property>
                                      <name>ha.zookeeper.quorum</name>
                                      <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
                               </property>
                        </configuration>

               4.3.3 hdfs-site.xml

                         <configuration>
                                 <!-- dfs.nameservices: logical name(s) of the namespace; separate multiple names with commas -->
                                 <property>
                                        <name>dfs.nameservices</name>
                                        <value>nameservice</value>
                                 </property>

                                 <!-- the nameservice has two NameNodes: namenode1 and namenode2 -->
                                 <property>
                                        <name>dfs.ha.namenodes.nameservice</name>
                                        <value>namenode1,namenode2</value>
                                 </property>

                                 <!-- RPC address of namenode1 -->
                                 <property>
                                        <name>dfs.namenode.rpc-address.nameservice.namenode1</name>
                                        <value>myserver:9000</value>
                                 </property>

                                 <!-- HTTP address of namenode1 -->
                                 <property>
                                        <name>dfs.namenode.http-address.nameservice.namenode1</name>
                                        <value>myserver:50070</value>
                                 </property>

                                 <!-- RPC address of namenode2 -->
                                 <property>
                                        <name>dfs.namenode.rpc-address.nameservice.namenode2</name>
                                        <value>myslave1:9000</value>
                                 </property>

                                 <!-- HTTP address of namenode2 -->
                                 <property>
                                        <name>dfs.namenode.http-address.nameservice.namenode2</name>
                                        <value>myslave1:50070</value>
                                 </property>

                                 <!-- JournalNodes that hold the shared NameNode edit log; use an odd number of nodes, at least three -->
                                 <property>
                                        <name>dfs.namenode.shared.edits.dir</name>
                                        <value>qjournal://zookeeper1:8485;zookeeper2:8485;zookeeper3:8485/nameservice</value>
                                 </property>

                                 <!-- absolute local path where each JournalNode stores its state -->
                                 <property>
                                        <name>dfs.journalnode.edits.dir</name>
                                        <value>/chenzhongwei/soft/journal/</value>
                                 </property>

                                 <!-- enable automatic failover when the active NameNode fails -->
                                 <property>
                                        <name>dfs.ha.automatic-failover.enabled</name>
                                        <value>true</value>
                                 </property>

                                 <!-- proxy provider that clients use to locate the active NameNode -->
                                 <property>
                                        <name>dfs.client.failover.proxy.provider.nameservice</name>
                                        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
                                 </property>

                                 <!-- fencing methods, one per line; shell(/bin/true) lets failover proceed when the failed node cannot be reached over SSH -->
                                 <property>
                                        <name>dfs.ha.fencing.methods</name>
                                        <value>
                                               sshfence
                                               shell(/bin/true)
                                        </value>
                                 </property>

                                 <!-- sshfence needs passwordless SSH; path to the private key -->
                                 <property>
                                        <name>dfs.ha.fencing.ssh.private-key-files</name>
                                        <value>~/.ssh/id_rsa</value>
                                 </property>

                                 <!-- sshfence connect timeout: 30 seconds -->
                                 <property>
                                        <name>dfs.ha.fencing.ssh.connect-timeout</name>
                                        <value>30000</value>
                                 </property>
                         </configuration>

               4.3.4 mapred-site.xml

                         <configuration>
                                 <!-- tell the MapReduce framework to run on YARN -->
                                 <property>
                                       <name>mapreduce.framework.name</name>
                                       <value>yarn</value>
                                 </property>
                         </configuration>

               4.3.5 yarn-site.xml

                        <configuration>
                                <!-- enable ResourceManager HA -->
                                <property>
                                       <name>yarn.resourcemanager.ha.enabled</name>
                                       <value>true</value>
                                </property>

                                <!-- cluster id shared by the RM pair -->
                                <property>
                                       <name>yarn.resourcemanager.cluster-id</name>
                                       <value>yrc</value>
                                </property>

                                <!-- logical ids of the two ResourceManagers -->
                                <property>
                                       <name>yarn.resourcemanager.ha.rm-ids</name>
                                       <value>rm1,rm2</value>
                                </property>

                                <!-- hostname for each ResourceManager -->
                                <property>
                                       <name>yarn.resourcemanager.hostname.rm1</name>
                                       <value>myserver</value>
                                </property>

                                <property>
                                       <name>yarn.resourcemanager.hostname.rm2</name>
                                       <value>myslave2</value>
                                </property>

                                <!-- ZooKeeper quorum address -->
                                <property>
                                       <name>yarn.resourcemanager.zk-address</name>
                                       <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
                                </property>

                                <!-- shuffle service required by MapReduce -->
                                <property>
                                       <name>yarn.nodemanager.aux-services</name>
                                       <value>mapreduce_shuffle</value>
                                </property>
                        </configuration>

             4.3.6 slaves

                        zookeeper1
                        zookeeper2
                        zookeeper3

                        Note: on a NameNode host, the slaves file lists the DataNode hosts; on a ResourceManager host, it lists the NodeManager hosts. Here the same three machines serve both roles.

       4.4 Copy Hadoop to the other five servers

             Delete the doc directory under share first, otherwise the copy takes far too long:

              scp -r hadoop-2.7.3 myslave1:/chenzhongwei/soft/
              scp -r hadoop-2.7.3 myslave2:/chenzhongwei/soft/
              scp -r hadoop-2.7.3 zookeeper1:/chenzhongwei/soft/
              scp -r hadoop-2.7.3 zookeeper2:/chenzhongwei/soft/
              scp -r hadoop-2.7.3 zookeeper3:/chenzhongwei/soft/
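
              The same copy written as a loop, a minimal sketch assuming it is run from /chenzhongwei/soft on myserver:

                  rm -rf hadoop-2.7.3/share/doc                     # drop the bundled docs before copying
                  for h in myslave1 myslave2 zookeeper1 zookeeper2 zookeeper3; do
                      scp -r hadoop-2.7.3 "$h":/chenzhongwei/soft/
                  done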

      4.5 Update the environment variables (edit /etc/profile on every server)

              Note that HADOOP_HOME must be exported before PATH references it, so the order below matters:

              export JAVA_HOME=/chenzhongwei/soft/jdk1.7.0_45
              export HADOOP_HOME=/chenzhongwei/soft/hadoop-2.7.3
              export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
              export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

              Then apply the changes with source /etc/profile.
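
              A quick check that the variables took effect (both commands are standard):

                  java -version         # should report 1.7.0_45
                  hadoop version        # should report Hadoop 2.7.3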

      4.6 Start Hadoop

           4.6.1 Start ZooKeeper on zookeeper1, zookeeper2, and zookeeper3:

                     bin/zkServer.sh start

           4.6.2 Start a JournalNode on zookeeper1, zookeeper2, and zookeeper3:

                      sbin/hadoop-daemon.sh start journalnode

           4.6.3 Format HDFS on myserver:

                     hdfs namenode -format

           4.6.4 Copy the tmp directory under the Hadoop install on myserver to myslave1, so the standby NameNode starts from the same metadata:

                     scp -r tmp/ myslave1:/chenzhongwei/soft/hadoop-2.7.3
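
                      An alternative to copying tmp by hand is HDFS's built-in bootstrap command, run on myslave1 after the JournalNodes are up:

                          hdfs namenode -bootstrapStandby    # pulls the freshly formatted namespace from the active NameNode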

           4.6.5 Format the ZKFC state in ZooKeeper, on myserver:

                     hdfs zkfc -formatZK

           4.6.6 Start HDFS from myserver:

                     sbin/start-dfs.sh

           4.6.7 Start YARN on myserver, then on myslave2 (the run on myslave2 brings up the standby ResourceManager):

                     sbin/start-yarn.sh

      4.7 Verify the Hadoop deployment

            4.7.1 Check that the running processes match the deployment plan

                  Run jps on each server. (The original screenshots are missing; the expected output below follows from the plan and the configuration above.)

                  myserver:    NameNode, DFSZKFailoverController, ResourceManager
                  myslave1:    NameNode, DFSZKFailoverController
                  myslave2:    ResourceManager
                  zookeeper1:  QuorumPeerMain, JournalNode, DataNode, NodeManager
                  zookeeper2:  QuorumPeerMain, JournalNode, DataNode, NodeManager
                  zookeeper3:  QuorumPeerMain, JournalNode, DataNode, NodeManager

                  This matches our plan, which shows the configuration files have taken effect.


        4.7.2 Test whether the HA cluster actually works

                 HDFS cluster status, as shown by the NameNode web UIs (myserver:50070 and myslave1:50070):

                 One NameNode is active and the other is standby.
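
                 The same information is available from the command line; namenode1 and namenode2 are the ids configured in hdfs-site.xml:

                     hdfs haadmin -getServiceState namenode1    # expect: active
                     hdfs haadmin -getServiceState namenode2    # expect: standby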

                  If we kill the NameNode process on myserver, the NameNode on myslave1 automatically becomes active.
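
                  A minimal way to simulate the crash on myserver; the PID placeholder below stands for whatever jps reports:

                      jps | grep NameNode        # note the NameNode PID
                      kill -9 <namenode-pid>     # <namenode-pid> is a placeholder, substitute the real PID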


                    Now bring the NameNode on myserver back up:

                    sbin/hadoop-daemon.sh start namenode

                    The myserver NameNode rejoins as standby.


                      Now test a full machine failure: shut down the myslave1 VM and see whether failover still happens.


                      Check the state on myserver:

                      After about 30 seconds (the sshfence connect timeout configured above), the myserver NameNode becomes active.

                      Check the DataNodes: all three should still be reported as live. With that, the HDFS cluster deployment is complete.
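
                      The same check from the command line, using the standard HDFS admin report:

                          hdfs dfsadmin -report    # should list zookeeper1-3 as live DataNodes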

           4.7.3 Test whether the YARN HA setup works

                     Only the active ResourceManager serves the web UI; at the moment that is myslave2.
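
                     The RM states can be confirmed from the command line; rm1 and rm2 are the ids configured in yarn-site.xml:

                         yarn rmadmin -getServiceState rm1    # myserver
                         yarn rmadmin -getServiceState rm2    # myslave2, expect: active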

                   

                       Once we kill the ResourceManager process on myslave2, the UI becomes reachable through myserver instead.

                     

                       Accessing YARN on myserver now works:

                     

                         With that, the YARN cluster is working as well.


Both the HDFS and YARN clusters are now fully configured and have passed the failover tests.
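
As a final end-to-end smoke test, the example jar bundled with the 2.7.3 distribution can be submitted to the cluster:

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10    # estimates pi with 2 maps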
