Big Data Learning [02]: Hadoop Installation and Configuration

Abstract:
This post covers downloading and installing Hadoop 2.7.3 on a three-machine cluster, the related configuration, problems encountered along the way, and a demo. The configuration covers both the Hadoop runtime and the YARN runtime, with the goal of running MapReduce on top of YARN; it also demonstrates how to set resource limits for constrained machines.

Prerequisites

  1. A LAN cluster, such as one built on virtual machines as in [1] Big Data Learning Prelude [01]: System, Network, SSH
  2. JDK installed and its environment variables configured, as in [2] Big Data Learning Prelude [02]: JDK Installation and Upgrade
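
Both prerequisites can be verified quickly from hadoop01 (a minimal sketch; the hostnames and JDK path follow the setup in [1] and [2]):

[hadoop@hadoop01 ~]$ java -version          # the JDK should be on the PATH
[hadoop@hadoop01 ~]$ echo $JAVA_HOME        # should print the JDK install directory
[hadoop@hadoop01 ~]$ ssh hadoop02 date      # passwordless SSH to each slave must work
[hadoop@hadoop01 ~]$ ssh hadoop03 date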

Download

wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

If the mirror no longer carries 2.7.3, the release is kept at http://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/.

Extract

[hadoop@hadoop01 ~]$ tar -zxvf hadoop-2.7.3.tar.gz
[hadoop@hadoop01 ~]$ mv hadoop-2.7.3 hadoop

Create data directories

These go under /home/hadoop/hadoop, matching the dfs.namenode.name.dir and dfs.datanode.data.dir paths in hdfs-site.xml below:

[hadoop@hadoop01 ~]$ cd hadoop
[hadoop@hadoop01 hadoop]$ mkdir dfs
[hadoop@hadoop01 hadoop]$ mkdir dfs/name
[hadoop@hadoop01 hadoop]$ mkdir dfs/data

Configuration

1. hadoop-env.sh

[hadoop@hadoop01 ~]$ vim hadoop/etc/hadoop/hadoop-env.sh

As in yarn-env.sh below, set JAVA_HOME explicitly:

export JAVA_HOME=/home/hadoop/jdk1.8.0_144

2. yarn-env.sh

[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/yarn-env.sh

Configure JAVA_HOME by uncommenting that line:
export JAVA_HOME=/home/hadoop/jdk1.8.0_144
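
A quick sanity check that this path really points at a JDK (a sketch; adjust the path if your JDK version differs):

[hadoop@hadoop01 ~]$ /home/hadoop/jdk1.8.0_144/bin/java -version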

3. slaves

Here hadoop02 and hadoop03 serve as slaves; hadoop01 could be added as well.

[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/slaves

hadoop02
hadoop03

4. core-site.xml

[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>
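
A hostname typo here (e.g. haoop01 instead of hadoop01) surfaces later as baffling connection errors, so it is worth confirming the value Hadoop actually parsed (hdfs getconf is part of the standard CLI):

[hadoop@hadoop01 hadoop]$ bin/hdfs getconf -confKey fs.defaultFS
hdfs://hadoop01:9000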

5. hdfs-site.xml

[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop01:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

6. mapred-site.xml

MapReduce needs its memory limits set here; otherwise MR jobs hang instead of running. The YARN configuration below sets the matching cluster-side memory and CPU resources.

[hadoop@hadoop01 hadoop]$  vim /home/hadoop/hadoop/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>
    <!-- Memory for MR map/reduce containers -->
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>512</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>512</value>
    </property>
</configuration>
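
The two jobhistory addresses only matter if the history server is actually running; start-all.sh does not start it. It can be launched separately with the standard script shipped in Hadoop 2.7.x:

[hadoop@hadoop01 hadoop]$ sbin/mr-jobhistory-daemon.sh start historyserver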

7. yarn-site.xml

[hadoop@hadoop01 hadoop]$ vim /home/hadoop/hadoop/etc/hadoop/yarn-site.xml

The memory and CPU resource settings below are required unless the machines have more resources than the defaults assume:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop01:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop01:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop01:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop01:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop01:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <!-- Minimum physical memory a single container can request; the default is 1024 MB -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>256</value>
    </property>
    <!-- Maximum physical memory a single container can request; the default is 8192 MB -->
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>512</value>
        <description>The amount of memory the MR AppMaster needs.</description>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>
</configuration>
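
With these numbers each NodeManager advertises 512 MB and 1 vcore to the scheduler, so a node can host at most one 512 MB container (or two 256 MB ones) at a time. Since the MR AppMaster alone needs 512 MB, it occupies a whole node and at least one more NodeManager must be free for the actual map and reduce tasks; this arithmetic is why under-configured clusters hang. Once YARN is up, the advertised capacity can be checked with the standard CLI:

[hadoop@hadoop01 hadoop]$ bin/yarn node -list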

8. Copy the configuration files

Once configuration is done, copy it to the other machines in the cluster:

[hadoop@hadoop01 ~]$ scp -r hadoop hadoop@hadoop02:~/
[hadoop@hadoop01 ~]$ scp -r hadoop hadoop@hadoop03:~/

9. Hadoop environment variables

[hadoop@hadoop01 ~]$ sudo vim /etc/profile
#hadoop
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin
export PATH=$PATH:$HADOOP_HOME/bin
[hadoop@hadoop01 ~]$ source /etc/profile
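
A quick check that the variables took effect (hadoop version is part of the standard CLI and should report 2.7.3):

[hadoop@hadoop01 ~]$ hadoop version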

Format the NameNode

[hadoop@hadoop01 hadoop]$ bin/hdfs namenode -format

Start

[hadoop@hadoop01 hadoop]$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-hadoop01.out
hadoop02: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-hadoop02.out
hadoop03: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-hadoop03.out
Starting secondary namenodes [hadoop01]
hadoop01: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop01.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-hadoop01.out
hadoop03: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-hadoop03.out
hadoop02: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-hadoop02.out

Verify the daemons on each node (hadoop01, hadoop02, hadoop03) with jps.
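Roughly what a healthy cluster shows (process IDs are illustrative):

[hadoop@hadoop01 ~]$ jps
2481 NameNode
2675 SecondaryNameNode
2827 ResourceManager
3120 Jps

[hadoop@hadoop02 ~]$ jps     # hadoop03 looks the same
2210 DataNode
2319 NodeManager
2505 Jps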

Run a demo

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /data/input /data/output/result
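
The job assumes /data/input already exists in HDFS and contains text files. A minimal sketch of preparing the input and reading the output afterwards (words.txt is a hypothetical local sample file):

[hadoop@hadoop01 hadoop]$ bin/hdfs dfs -mkdir -p /data/input       # create the input directory in HDFS
[hadoop@hadoop01 hadoop]$ bin/hdfs dfs -put words.txt /data/input  # upload a local sample file
# after the job finishes:
[hadoop@hadoop01 hadoop]$ bin/hdfs dfs -cat /data/output/result/part-r-00000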

View the job in the ResourceManager web UI (http://hadoop01:8088, as configured above).
Result: the word counts are written under /data/output/result.
Tip: if the resource limits above are not configured, the job hangs at this point and never runs; if they are configured too low, it fails with an error about a container running beyond its memory limits.
The cause is that the job used more memory than YARN allows it. One more parameter is needed in yarn-site.xml (already included above). It disables YARN's virtual-memory check, which otherwise kills any container whose virtual memory exceeds yarn.nodemanager.vmem-pmem-ratio (default 2.1) times its physical allocation:

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

Other possible problems

Problem 1: when accessing the Hadoop file system:

ls: Call From localhost/127.0.0.1 to hadoop01:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
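
A first diagnostic step is to check whether the NameNode is actually up and listening on port 9000 (a sketch; netstat may need installing on minimal systems):

[hadoop@hadoop01 ~]$ jps | grep NameNode
[hadoop@hadoop01 ~]$ sudo netstat -tnlp | grep 9000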

Problem 2: when uploading a file to HDFS (a put operation), a similar error appears.

These problems can have quite a few causes. Fix 1: turn off the firewall:

[hadoop@hadoop01 sbin]$ sudo service iptables stop
[hadoop@hadoop01 sbin]$ sudo chkconfig iptables off
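
On distributions that ship firewalld instead of the iptables service (CentOS 7 and later), the equivalent is:

[hadoop@hadoop01 sbin]$ sudo systemctl stop firewalld
[hadoop@hadoop01 sbin]$ sudo systemctl disable firewalld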

Fix 2: reformat the NameNode.
Delete hadoop/tmp entirely, then reformat: hadoop namenode -format
Start only HDFS: start-dfs.sh
Check the cluster report:

[hadoop@hadoop01 sbin]$ hadoop dfsadmin -report

(hadoop dfsadmin is deprecated in Hadoop 2.x; hdfs dfsadmin -report is the current form and prints the same report.)

Or check via the web UI:
http://192.168.137.101:50070/dfshealth.html#tab-overview

Postscript

With that, the Hadoop setup is complete. It comes down to two things: a distributed file system (HDFS) and distributed MapReduce programs. This setup puts YARN in charge of managing MapReduce, which addresses most of the problems classic MapReduce had.

References

[1] Big Data Learning Prelude [01]: System, Network, SSH
[2] Big Data Learning Prelude [02]: JDK Installation and Upgrade

[Author: happyprince]
