Hadoop 2.7.4 + HBase 1.2.6 + Zookeeper 3.4.8: Integrated Installation, Configuration, Debugging, and Application (Part 1)

Source: Internet | Editor: 程序博客网 | Date: 2024/06/09

Chapter 1: Obtaining the Hadoop 2.7.4, Zookeeper 3.4.8, and HBase 1.2.6 installation packages

Hadoop-2.7.4 package, download from: http://hadoop.apache.org/releases.html

HBase-1.2.6 package, download from: http://mirror.bit.edu.cn/apache/hbase/

Zookeeper-3.4.8 package, download from: http://mirror.bit.edu.cn/apache/zookeeper/
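The downloads can be scripted. The archive.apache.org URLs below are assumptions (the Apache archive keeps retired releases that regular mirrors often drop); verify them against the download pages above before use:

```shell
# Versions used throughout this guide.
HADOOP_VER=2.7.4
HBASE_VER=1.2.6
ZK_VER=3.4.8

# Fetch from the Apache archive (uncomment to actually download;
# the exact URLs are assumptions and should be checked first):
# wget "http://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VER}/hadoop-${HADOOP_VER}.tar.gz"
# wget "http://archive.apache.org/dist/hbase/${HBASE_VER}/hbase-${HBASE_VER}-bin.tar.gz"
# wget "http://archive.apache.org/dist/zookeeper/zookeeper-${ZK_VER}/zookeeper-${ZK_VER}.tar.gz"

echo "hadoop-${HADOOP_VER} hbase-${HBASE_VER} zookeeper-${ZK_VER}"
```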



Chapter 2: Installation Environment (CentOS 6.7)

1. Eleven Linux virtual machines; host CPU: i5 dual-core or better; memory: 2 GB or more


2. Hostname, IP address, installed software, and running processes for each machine:

master1  192.168.1.20  hadoop, Zookeeper, hbase  (NN, DN, RM, DFSZKFC, JournalNode, HMaster, QuorumPeerMain)
master2  192.168.1.21  hadoop, Zookeeper, hbase  (NN, DN, RM, DFSZKFC, JournalNode, QuorumPeerMain)
slave1   192.168.1.22  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)
slave2   192.168.1.23  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)
slave3   192.168.1.24  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)
slave4   192.168.1.25  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)
slave5   192.168.1.26  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)
slave6   192.168.1.27  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)
slave7   192.168.1.28  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)
slave8   192.168.1.29  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)
slave9   192.168.1.30  hadoop, Zookeeper, hbase  (DN, NM, JournalNode, HRegionServer, QuorumPeerMain)



3. We configure only the first machine for now; the others will be cloned from it later.



4. Modify the /etc/hosts file. Example:


The finished configuration (the original article showed a screenshot of the real environment here):
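A minimal /etc/hosts consistent with the node table in step 2 (with the workers numbered slave1 through slave9) might look like:

```
192.168.1.20    master1
192.168.1.21    master2
192.168.1.22    slave1
192.168.1.23    slave2
192.168.1.24    slave3
192.168.1.25    slave4
192.168.1.26    slave5
192.168.1.27    slave6
192.168.1.28    slave7
192.168.1.29    slave8
192.168.1.30    slave9
```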



Modify /etc/sysconfig/network:
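On CentOS 6 the persistent hostname lives in /etc/sysconfig/network; for the first machine it would be:

```
NETWORKING=yes
HOSTNAME=master1
```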




5. Reboot the machine so the new hostname takes effect,

or change it temporarily with the `hostname` command (a temporary change is lost on reboot)

6. Install JDK 1.7.0 or later (skip this step if a suitable JDK is already installed)

6.1 Extract the JDK


6.2 Edit /etc/profile and add the JDK path

6.3 Save and exit, then reload the profile


6.4 Raise the priority of this JDK in CentOS's alternatives system

The JDK installation is now complete.
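Steps 6.1 through 6.4 can be sketched as follows. The JDK archive name, version, and /usr/java path are assumptions; substitute whatever JDK build you downloaded:

```shell
# 6.1 Extract the JDK (uncomment on a real machine; archive name is an assumption):
# tar -zxvf jdk-7u80-linux-x64.tar.gz -C /usr/java/

# 6.2 Append these lines to /etc/profile, then run `source /etc/profile` (6.3):
export JAVA_HOME=/usr/java/jdk1.7.0_80
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

# 6.4 Register this JDK so it wins over any bundled OpenJDK (uncomment on a real machine):
# alternatives --install /usr/bin/java java $JAVA_HOME/bin/java 300
# alternatives --config java

echo "$JAVA_HOME"
```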




7. Extract Hadoop and set the environment variables


7.1 Extract the Hadoop 2.7.4 package (all packages are kept in the /data directory under the filesystem root) and configure the environment variables
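A sketch of step 7.1, assuming the archive sits in /data and /soft is the install prefix (both paths follow this article's conventions but are assumptions; adjust to your layout):

```shell
# Extract the Hadoop tarball (uncomment on a real machine):
# tar -zxvf /data/hadoop-2.7.4.tar.gz -C /soft/

# Append to /etc/profile and re-source it:
export HADOOP_HOME=/soft/hadoop-2.7.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

echo "$HADOOP_HOME"
```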





8. Modify the configuration files

8.1 Modify the $HADOOP_HOME/etc/hadoop/slaves file,


adding the hostname of every slave node.
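Assuming the nine worker hostnames from the node table in step 2, the slaves file would contain the following. Note that the table also lists DataNodes on the two masters; if you want start-dfs.sh to launch DataNodes there as well, add master1 and master2 to this file too:

```
slave1
slave2
slave3
slave4
slave5
slave6
slave7
slave8
slave9
```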


8.2 Modify the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME path (the original article illustrated the before/after with screenshots).


8.3 Modify the $HADOOP_HOME/etc/hadoop/yarn-env.sh file and set the JAVA_HOME path in the same way (the original article illustrated the before/after with screenshots).
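In both hadoop-env.sh and yarn-env.sh the change is the same single line; the JDK path below is the assumed one from step 6:

```shell
# Replace the script's JAVA_HOME line with an absolute path so the daemons
# do not depend on the login environment:
export JAVA_HOME=/usr/java/jdk1.7.0_80
```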



8.4 Locate core-site.xml and hdfs-site.xml in the configuration directory, then modify the $HADOOP_HOME/etc/hadoop/core-site.xml file:


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<!-- Put site-specific property overrides in this file. -->


<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://cluster</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/soft/hadoop-2.7.4/tmp/</value>
        </property>
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>131072</value>
        </property>
        <property>
                <name>dfs.ha.fencing.methods</name>
                <value>sshfence</value>
        </property>
        <property>
                <name>dfs.ha.fencing.ssh.private-key-files</name>
                <value>/root/.ssh/id_dsa</value>
        </property>
</configuration>



Modify the $HADOOP_HOME/etc/hadoop/hdfs-site.xml file:


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<!-- Put site-specific property overrides in this file. -->


<configuration>
<property>
                <name>dfs.nameservices</name>
                <value>cluster</value>
        </property>
        <property>
                <name>dfs.ha.namenodes.cluster</name>
                <value>master1,master2</value>
        </property>
        <property>
                <name>dfs.namenode.rpc-address.cluster.master1</name>
                <value>master1:8020</value>
        </property>
        <property>
                <name>dfs.namenode.rpc-address.cluster.master2</name>
                <value>master2:8020</value>
        </property>
        <property>
                <name>dfs.namenode.http-address.cluster.master1</name>
                <value>master1:50070</value>
        </property>
        <property>
                <name>dfs.namenode.http-address.cluster.master2</name>
                <value>master2:50070</value>
        </property>
<property>
                 <name>dfs.namenode.servicerpc-address.cluster.master1</name>
                 <value>master1:53333</value>
         </property>
         <property>
                 <name>dfs.namenode.servicerpc-address.cluster.master2</name>
                 <value>master2:53333</value>
         </property>
        <property>
                <name>dfs.namenode.shared.edits.dir</name>
                <value>qjournal://master1:8485;master2:8485;slave1:8485;slave2:8485;slave3:8485/cluster</value>
        </property>
        <property>
                <name>dfs.client.failover.proxy.provider.cluster</name>
                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
                <name>dfs.journalnode.edits.dir</name>
                <value>/soft/hadoop-2.7.4/mydata/journal</value>
        </property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>  
       <name>dfs.namenode.name.dir</name>  
       <value>file:/soft/hadoop-2.7.4/mydata/name</value>
</property>  
<property>  
    <name>dfs.datanode.data.dir</name>  
    <value>file:/soft/hadoop-2.7.4/mydata/data</value>
</property>  
<property>
       <name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>  
<name>dfs.webhdfs.enabled</name>  
<value>true</value>
</property>
<property>  
<name>dfs.journalnode.http-address</name>  
<value>0.0.0.0:8480</value>  
</property>  
<property>  
<name>dfs.journalnode.rpc-address</name>  
<value>0.0.0.0:8485</value>  
</property>
<property>    
<name>dfs.permissions</name>    
<value>false</value>    
</property>
</configuration>


8.5 Locate mapred-site.xml and yarn-site.xml in the configuration directory, then modify the $HADOOP_HOME/etc/hadoop/mapred-site.xml file:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<!-- Put site-specific property overrides in this file. -->


<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master1:19888</value>
</property>
</configuration>


Modify the $HADOOP_HOME/etc/hadoop/yarn-site.xml file. The value of yarn.resourcemanager.ha.id below is for master1; on the master2 machine it must be changed to rm2:


<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master2</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
<property>
                 <name>yarn.resourcemanager.address.rm1</name>
                 <value>${yarn.resourcemanager.hostname.rm1}:8032</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.scheduler.address.rm1</name>
                 <value>${yarn.resourcemanager.hostname.rm1}:8030</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.webapp.https.address.rm1</name>
                 <value>${yarn.resourcemanager.hostname.rm1}:8089</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.webapp.address.rm1</name>
                 <value>${yarn.resourcemanager.hostname.rm1}:8088</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
                 <value>${yarn.resourcemanager.hostname.rm1}:8025</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.admin.address.rm1</name>
                 <value>${yarn.resourcemanager.hostname.rm1}:8041</value>
         </property>


         <property>
                 <name>yarn.resourcemanager.address.rm2</name>
                 <value>${yarn.resourcemanager.hostname.rm2}:8032</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.scheduler.address.rm2</name>
                 <value>${yarn.resourcemanager.hostname.rm2}:8030</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.webapp.https.address.rm2</name>
                 <value>${yarn.resourcemanager.hostname.rm2}:8089</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.webapp.address.rm2</name>
                 <value>${yarn.resourcemanager.hostname.rm2}:8088</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
                 <value>${yarn.resourcemanager.hostname.rm2}:8025</value>
         </property>
         <property>
                 <name>yarn.resourcemanager.admin.address.rm2</name>
                 <value>${yarn.resourcemanager.hostname.rm2}:8041</value>
         </property>


<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property> 
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
    <value>/soft/hadoop-2.7.4/mydata/yarn/local</value>
</property>
<property>
                <name>yarn.nodemanager.log-dirs</name>
                <value>/soft/hadoop-2.7.4/mydata/yarn/log</value>
        </property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property> 
<property> 
<name>yarn.resourcemanager.zk-state-store.address</name> 
  <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
</property>
<property> 
  <name>yarn.resourcemanager.store.class</name> 
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> 
</property> 
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster</value>
</property> 
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/soft/hadoop-2.7.4/etc/hadoop/fairscheduler.xml</value>
</property>


</configuration>


8.6 Add the $HADOOP_HOME/etc/hadoop/fairscheduler.xml file:


<?xml version="1.0"?>
<allocations>
         <queue name="news">
                 <minResources>1024 mb, 1 vcores </minResources>
                 <maxResources>1536 mb, 1 vcores </maxResources>
                 <maxRunningApps>5</maxRunningApps>
                 <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
                 <weight>1.0</weight>
                 <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
         </queue>
         <queue name="crawler">
                 <minResources>1024 mb, 1 vcores</minResources>
                 <maxResources>1536 mb, 1 vcores</maxResources>
         </queue>
         <queue name="map">
                 <minResources>1024 mb, 1 vcores</minResources>
                 <maxResources>1536 mb, 1 vcores</maxResources>
         </queue>
</allocations>



8.7 Create the directories referenced by the XML configuration files. Example:


The result after creation (the original article showed a screenshot here):
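The directories referenced by the XML files above can be created in one command. The /soft/hadoop-2.7.4 prefix is the assumed install path; match whatever prefix your configuration files actually use:

```shell
BASE=/soft/hadoop-2.7.4
# tmp dir, NameNode/DataNode storage, JournalNode edits, and YARN local/log dirs:
mkdir -p "$BASE/tmp" \
         "$BASE/mydata/name" \
         "$BASE/mydata/data" \
         "$BASE/mydata/journal" \
         "$BASE/mydata/yarn/local" \
         "$BASE/mydata/yarn/log"
ls "$BASE/mydata"
```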


At this point the Hadoop HA configuration files are complete; all that remains is passwordless SSH login and formatting the Hadoop filesystem.
We will do both after installing the remaining software (Zookeeper + HBase) and cloning the machines. After cloning, remember to change the hostname in /etc/sysconfig/network on every node, and on master2 change the yarn.resourcemanager.ha.id property in $HADOOP_HOME/etc/hadoop/yarn-site.xml to rm2 (no other nodes need this change).


Note: this is the opening article of a big-data knowledge series; tutorials covering the cluster topics used in real projects will follow. Stay tuned!

