Hadoop 2 + HA + YARN Environment Setup

I. Preparation: four machines, with /etc/hosts configured as follows:

[root@datanode1 usr]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
192.168.182.129         hadoop
192.168.182.128         datanode1
192.168.182.130         datanode2
192.168.182.131         namenode2
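
The daemon scripts used in part IV (hadoop-daemons.sh, stop-dfs.sh) rely on passwordless SSH from the namenode hosts to the other machines. A minimal sketch, run on hadoop and namenode2 using the hostnames above:

ssh-keygen -t rsa
ssh-copy-id hadoop
ssh-copy-id namenode2
ssh-copy-id datanode1
ssh-copy-id datanode2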

II. Install the JDK

1. Upload the JDK installation media to the target directory and extract it to: /usr/program/java/jdk1.6.0_24

2. Configure environment variables (append the following lines to /etc/profile):

export JAVA_HOME=/usr/program/java/jdk1.6.0_24

export HADOOP_HOME=/usr/alan/hadoop_cdh5

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH

export PATH=.:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
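
After editing /etc/profile, reload it and check that the JDK is picked up (the hadoop command will also resolve once part III is done):

source /etc/profile
java -version
hadoop version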

III. Upload the Hadoop 2 installation media (the version used here is CDH5) and extract it to: /usr/alan/hadoop_cdh5
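
For reference, a sketch of the extract step, assuming a hypothetical tarball name hadoop-2.3.0-cdh5.tar.gz (substitute the actual file name of the CDH5 package you downloaded):

mkdir -p /usr/alan
tar -zxvf hadoop-2.3.0-cdh5.tar.gz -C /usr/alan/
mv /usr/alan/hadoop-2.3.0-cdh5 /usr/alan/hadoop_cdh5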

Go to the directory /usr/alan/hadoop_cdh5/etc/hadoop and edit the following files:

1. In hadoop-env.sh, add:

export JAVA_HOME=/usr/program/java/jdk1.6.0_24

2. In mapred-site.xml, add:

<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs.
  Can be one of local, classic or yarn.
  </description>
</property>
<!-- jobhistory properties -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>namenode2:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>namenode2:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>

</configuration>
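
Because mapreduce.jobhistory.address points at namenode2, the history server has to be started on that host once the cluster is running; the standard Hadoop 2 daemon script does it:

sbin/mr-jobhistory-daemon.sh start historyserver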

3. In core-site.xml, add:

<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop:8020</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

</configuration>
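
Note that fs.defaultFS points straight at the first namenode, so clients will not follow a failover to nn2. If transparent client-side failover is wanted, fs.defaultFS would normally reference the nameservice defined in hdfs-site.xml instead, together with a failover proxy provider; a sketch, using the hadoop-test nameservice configured below:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-test</value>
</property>
<!-- and in hdfs-site.xml: -->
<property>
  <name>dfs.client.failover.proxy.provider.hadoop-test</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>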

4. In hdfs-site.xml, add:

<configuration>
<property>
  <name>dfs.nameservices</name>
  <value>hadoop-test</value>
  <description>
    Comma-separated list of nameservices.
  </description>
</property>
<property>
  <name>dfs.ha.namenodes.hadoop-test</name>
  <value>nn1,nn2</value>
  <description>
    The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
  </description>
</property>
<property>
  <name>dfs.namenode.rpc-address.hadoop-test.nn1</name>
  <value>hadoop:8020</value>
  <description>
    RPC address for namenode1 of hadoop-test
  </description>
</property>
<property>
  <name>dfs.namenode.rpc-address.hadoop-test.nn2</name>
  <value>namenode2:8020</value>
  <description>
    RPC address for namenode2 of hadoop-test
  </description>
</property>
<property>
  <name>dfs.namenode.http-address.hadoop-test.nn1</name>
  <value>hadoop:50070</value>
  <description>
    The address and the base port where the dfs namenode1 web ui will listen on.
  </description>
</property>
<property>
  <name>dfs.namenode.http-address.hadoop-test.nn2</name>
  <value>namenode2:50070</value>
  <description>
    The address and the base port where the dfs namenode2 web ui will listen on.
  </description>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/alan/hadoop/hdfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
      should store the name table(fsimage).  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://namenode2:8485;datanode1:8485;datanode2:8485/hadoop-test</value>
  <description>A directory on shared storage between the multiple namenodes
  in an HA cluster. This directory will be written by the active and read
  by the standby in order to keep the namespaces synchronized. This directory
  does not need to be listed in dfs.namenode.edits.dir above. It should be
  left empty in a non-HA cluster.
  </description>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///home/alan/hadoop/hdfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>

</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
  <description>
    Whether automatic failover is enabled. See the HDFS High
    Availability documentation for details on automatic HA
    configuration.
  </description>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/alan/hadoop/hdfs/journal/</value>
</property>

</configuration>

Notes:

dfs.nameservices
List of nameservices in the cluster (user-defined)
dfs.ha.namenodes.${ns}
Logical names of the namenodes within the nameservice (user-defined)
dfs.namenode.rpc-address.${ns}.${nn}
RPC address for the given namenode logical name within the nameservice
dfs.namenode.http-address.${ns}.${nn}
HTTP address for the given namenode logical name within the nameservice

dfs.namenode.name.dir
Directory where the NameNode stores the fsimage
dfs.namenode.shared.edits.dir
Shared storage through which the active and standby NameNodes synchronize metadata
dfs.journalnode.edits.dir
Directory where JournalNode data is stored
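
The local directories referenced above have to exist (or at least have writable parents) on the relevant hosts before the cluster is formatted; a sketch, using the paths from this file (name dir on the namenode hosts, data dir on the datanodes, journal dir on the JournalNode hosts):

mkdir -p /home/alan/hadoop/hdfs/name
mkdir -p /home/alan/hadoop/hdfs/data
mkdir -p /home/alan/hadoop/hdfs/journal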

5. In yarn-site.xml, add:

<configuration>


<property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop</value>
  </property>    
  
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>


  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>


  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>


  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>


  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>


  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>


  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>


  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
  </property>


  <property>
    <description>List of directories to store localized files in. An 
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
   </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/alan/hadoop/yarn/local</value>
  </property>


  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>


  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>


  <property>
    <description>Amount of physical memory, in MB, that can be allocated 
    for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>30720</value>
  </property>


  <property>
    <description>Number of CPU cores that can be allocated 
    for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>12</value>
  </property>


  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>



</configuration>
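
yarn.nodemanager.local-dirs should be a writable path on every NodeManager host, and the aggregated logs land in HDFS under /tmp/logs; a sketch of preparing both (the HDFS command is run once HDFS is up):

mkdir -p /home/alan/hadoop/yarn/local
bin/hadoop fs -mkdir -p /tmp/logs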

6. In fairscheduler.xml, add:

<?xml version="1.0"?>
<allocations>

  <queue name="infrastructure">
    <minResources>102400 mb, 50 vcores </minResources>
    <maxResources>153600 mb, 100 vcores </maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
  </queue>


   <queue name="tool">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>


   <queue name="sentiment">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
   </queue>


</allocations>
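
Once YARN is running, a job can be pointed at one of these queues by name; a sketch using the examples jar bundled with the distribution (its exact path varies by release), with mapreduce.job.queuename selecting the tool queue:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar pi -Dmapreduce.job.queuename=tool 10 100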

7. In slaves, add:

namenode2
datanode1
datanode2

IV. Starting the Hadoop cluster

Note: all of the following commands are run from the Hadoop installation directory.
Starting the Hadoop cluster:
Step 1:
On each JournalNode host, start the journalnode service:
sbin/hadoop-daemon.sh start journalnode


Step 2:
On [nn1], format the namenode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode


Step 3:
On [nn2], pull the metadata over from nn1:
bin/hdfs namenode -bootstrapStandby


Step 4:
Start [nn2]:
sbin/hadoop-daemon.sh start namenode


After these four steps, nn1 and nn2 are both in standby state.
Step 5:
Switch [nn1] to active:
bin/hdfs haadmin -transitionToActive nn1
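
The state of either namenode can be checked at any point with haadmin:

bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2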


Step 6:
On [nn1], start all the datanodes:
sbin/hadoop-daemons.sh start datanode
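
Once the datanodes are up, their registration with the active namenode can be verified with:

bin/hdfs dfsadmin -report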


Stopping the Hadoop cluster:
On [nn1], run:
sbin/stop-dfs.sh
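
The commands above only cover HDFS; the YARN daemons configured in yarn-site.xml are started separately. A sketch, assuming the ResourceManager runs on hadoop as configured above:

On hadoop, start the ResourceManager locally and the NodeManagers on the hosts listed in slaves:
sbin/start-yarn.sh

To stop them again:
sbin/stop-yarn.sh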
