Building a Hadoop 2.7 Cluster on CentOS 7.0


Preparation

Prepare the environment

Download links:

VM12: https://pan.baidu.com/s/1hsvkHe8  password: ycan

CentOS 7.0: https://pan.baidu.com/s/1nvUmu05  password: ktqh

jdk1.8: https://pan.baidu.com/s/1bo69W67  password: 3vol

hadoop2.7.3: https://pan.baidu.com/s/1qYiJgT2  password: d96k

Prepare three machines: one master and two slaves.

    192.168.122.128 master
    192.168.122.129 slave1
    192.168.122.130 slave2

Add the IP addresses and hostnames to the hosts file on all three machines:

vi /etc/hosts
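
One way to apply this is to append the three entries listed above to /etc/hosts on every node and then verify name resolution with ping; a minimal sketch:

cat >> /etc/hosts <<'EOF'
192.168.122.128 master
192.168.122.129 slave1
192.168.122.130 slave2
EOF
ping -c 1 slave1
ping -c 1 slave2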

Passwordless SSH login (as root; you can also create a dedicated hadoop user)

  1. Check that the clocks are consistent (time synchronization)

    date

    If the clocks are not in sync, see "Using NTP for time synchronization on CentOS 7".

  2. Stop the firewall

    systemctl stop firewalld.service
  3. Disable the firewall on boot

    systemctl disable firewalld.service
  4. Generate an SSH key pair

    ssh-keygen  -t   rsa   -P  ''

    Check that the keys were generated (there should be two files, id_rsa and id_rsa.pub):

    ls    /root/.ssh/
  5. Create the authorized_keys file

    1. On master, create an authorized_keys file and check that it was created:

      touch /root/.ssh/authorized_keys
      ls /root/.ssh/
    2. Copy the contents of each machine's id_rsa.pub into authorized_keys.

      For example, the RSA public keys generated in this setup are:

      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuFQVKFAn/hodZUwj9xSybiKIP2XJHqxZfgNz7ADSFwEORMAkKpUyJ51fM/S+uW9oNSlSG/TyfvSSme1fE6xut0t4iPLV2dp9Ia+dPDs9ub3XEyvId1ADMNhO3SveuMVNPpJ50PiBnmqgHQ1OuPMopgfgRFWmbodLmz0gtmJZ6KubI3P90Do44X1+TJdX+eRECFomefayj23x+/xBVLxKXQH7+vNVn4vIM8JIWFFT8XEBN2+HKxCqEN4yilTIk+X5Ov10sfJcQhNlivThV+t9AeBH/T6J4bLOrdiQYNTnMuN+Ii5tNc7fKpzdaCmlJmzaxzESrXQRtu+7C3areZYe9 root@master
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDMQ8YA7APRnS5Rt3y5OIJexL2A6a0KR+MhLpMInnGMzEcpItryIU8FBPV3fmsKdhzr99pxryLSxQibvHxQo1Kx2FUN1HUTW4fftZsum+ddGY6w+/iQefjbddrmUzaZUxhsHuqCBb80UclfbR7BcRv1FQDelyig2FU9U28LjU9iTvwdEzttdBq433GL/2lDC1xw2tidWkc0CfjACprzjJ16vzb88awm8VOTp5ExylD7gT8sXmAsmAr3W8FsilKFKCrLwCEop3/r+6g8eIDM53XOt7UciK/FJAyCarKbUexeEfBqpzeilW1wcHd/5DiLJgCZ2fJhJnI+3xQKGv9xdYoR root@slave1
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKuppRgfn5Tx7ST3C17jfguukTaaJVJdWbFziVm/jbwU57o4CmN7CuTzI4VvEnVeVsKTN8S5+rxC3hBIMjuJbVopR8vjHLSd7ysByUiFUusg7RPmJRMlZ0LwWMJCUm9E/xIoq9zNGr38u0yKNjS27PYf8PLgYQx2qHUGbla3KlSX5i81hxyeF/sHqfn6F+RQ/BAxVziu7atDTZF+RojYfiw087Zp/57Th6ouSPIeObTeYkJjFFENavsCcDwbqUnMyndDoPbCqV/f0494HSFZWPX8KUVfWnnJ1HQWp37vgZV8uU59OMLibYCD6t/p4Qfvp0/CCgFW8a6XoYXwYcm/tl root@slave2
    3. Copy authorized_keys from master to slave1 and slave2:

      scp /root/.ssh/authorized_keys root@192.168.122.129:/root/.ssh
      scp /root/.ssh/authorized_keys root@192.168.122.130:/root/.ssh
    4. Check that the copy succeeded.

    5. Test that the machines can log into each other without a password (use exit to log out); see also the sketch after this list.

      ssh master
      ssh slave1
      ssh slave2
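
As an alternative to steps 1-3 above, the key exchange can also be done with ssh-copy-id, which appends a node's public key to the target machine's authorized_keys for you. A minimal sketch, assuming the default key location and that password login is still enabled at this point, run on every node (master, slave1 and slave2):

ssh-copy-id root@master
ssh-copy-id root@slave1
ssh-copy-id root@slave2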

Installing the JDK

1. Copy the JDK archive from your local machine to the cluster

Connect to the cluster with SecureCRT, type rz in the terminal, and select the JDK archive to upload.

In this example, create a /opt/java directory and copy the archive into it.

2. Extract the JDK archive with tar

cd /opt/java
tar -zxvf jdk-8u60-linux-x64.tar.gz
ls

3. Edit the /etc/profile configuration file

vi /etc/profile

Add the Java environment settings:

export JAVA_HOME=/opt/java/jdk1.8.0_60
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin

Reload the file so the changes take effect:

source /etc/profile

Check the Java version:

java -version
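
If the reported version is still not 1.8.0_60, check which java binary is actually being picked up before moving on to the supplement below; a quick check:

echo $JAVA_HOME
which java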

4. Supplement (a possible problem: the environment settings do not take effect)

For licensing reasons, most Linux distributions ship with OpenJDK preinstalled and already have OpenJDK's java command on the PATH, so after installing the Sun JDK the system contains two JDKs: OpenJDK and SunJDK. How do we make the SunJDK replace the original OpenJDK?

  1. Finding the cause

    Find where the java command lives:

    whereis java

    Output:

    java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java /opt/java/jdk1.8.0_60/bin/java /usr/share/man/man1/java.1.gz

    Here, /opt/java/jdk1.8.0_60/bin/java is the SunJDK we installed, while /usr/bin/java is where the system's default java command lives. Let's keep digging:

    ls -la /usr/bin/java

    Output:

    lrwxrwxrwx. 1 root root 22 Nov 23 23:44 /usr/bin/java -> /etc/alternatives/java

    Go into the /etc/alternatives directory:

    cd /etc/alternatives
    ls -la

    Output (one of the entries):

    lrwxrwxrwx.   1 root root   70 Nov 23 23:44 java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java

    Here is the cause: the system's default java points to OpenJDK's java command, which is why the environment variables we configured in /etc/profile do not take effect.
    The next step is to point this java symlink at our SunJDK binary, /opt/java/jdk1.8.0_60/bin/java.

  2. Fix the problem by changing the java symlink

    Check the current default java configuration:

    update-alternatives --display java

    Output (excerpt):

    java - status is auto.
    link currently points to /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java
    /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64/jre/bin/java - priority 1700091

    This shows that the java the system uses by default is OpenJDK's (note the priorities).

  3. Register the SunJDK we installed

    update-alternatives --install /usr/bin/java java /opt/java/jdk1.8.0_60/bin/java 170130
    update-alternatives --config java

    Output:

    There are 3 programs which provide 'java'.

      Selection    Command
    -----------------------------------------------
       1           /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64/jre/bin/java
    *+ 2           /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java
       3           /opt/java/jdk1.8.0_60/bin/java

    Enter to keep the current selection[+], or type selection number:

    Type the number of the JDK we installed, 3 in this example.

    Because the priority we registered (170130) is lower than OpenJDK's, the selection has to be made manually. If you register it with a higher priority than OpenJDK's, no manual selection is needed: in automatic mode the system picks the alternative with the highest priority as the default.

    Check the Java version again:

    java -version

Installing Hadoop

1. Copy the Hadoop archive from your local machine to the cluster (same method as for the JDK)

In this example, create a /opt/hadoop directory.

2. Extract it with tar
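
A minimal sketch of the extraction, assuming the archive is named hadoop-2.7.3.tar.gz and was uploaded to /opt/hadoop:

cd /opt/hadoop
tar -zxvf hadoop-2.7.3.tar.gz
ls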

3. Configure the environment

Add the Hadoop environment settings to /etc/profile:

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

4. Create the Hadoop working directories

mkdir /root/hadoop
mkdir /root/hadoop/tmp
mkdir /root/hadoop/var
mkdir /root/hadoop/dfs
mkdir /root/hadoop/dfs/name
mkdir /root/hadoop/dfs/data

5. Edit the set of configuration files under etc/hadoop in the Hadoop directory

In this example the directory is /opt/hadoop/hadoop-2.7.3/etc/hadoop.
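
To see the files you are about to edit, change into that directory first:

cd /opt/hadoop/hadoop-2.7.3/etc/hadoop
ls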

  1. Edit core-site.xml

    Add inside the configuration tags:

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
    </configuration>

  2. Edit hadoop-env.sh

    export   JAVA_HOME=${JAVA_HOME}

    Change it to:

    export JAVA_HOME=/opt/java/jdk1.8.0_60

    The path is wherever you installed the JDK.

  3. Edit hdfs-site.xml

    Add inside the configuration tags:

    <property>
      <name>dfs.name.dir</name>
      <value>/root/hadoop/dfs/name</value>
      <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/root/hadoop/dfs/data</value>
      <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.permissions</name>
      <value>true</value>
      <description>need not permissions</description>
    </property>

    Note: if dfs.permissions is set to false, files can be created on DFS without any permission check. That is convenient, but you then have to guard against accidental deletion, so set it to true or simply delete this property node, since the default is already true.

  4. Create and edit mapred-site.xml

    Make a copy of the mapred-site.xml.template template:

    cp mapred-site.xml.template mapred-site.xml
    ls

    Edit it:

    vi mapred-site.xml

    Add inside the configuration tags:

    <property>
      <name>mapred.job.tracker</name>
      <value>master:49001</value>
    </property>
    <property>
      <name>mapred.local.dir</name>
      <value>/root/hadoop/var</value>
    </property>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

  5. Edit the slaves file

    Replace the localhost entry with:

    slave1
    slave2

  6. Edit yarn-site.xml

    Add inside the configuration tags:

    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>master</value>
    </property>
    <property>
      <description>The address of the applications manager interface in the RM.</description>
      <name>yarn.resourcemanager.address</name>
      <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
      <description>The address of the scheduler interface.</description>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
      <description>The http address of the RM web application.</description>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
      <description>The https address of the RM web application.</description>
      <name>yarn.resourcemanager.webapp.https.address</name>
      <value>${yarn.resourcemanager.hostname}:8090</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
      <description>The address of the RM admin interface.</description>
      <name>yarn.resourcemanager.admin.address</name>
      <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>2048</value>
      <description>Memory available per node, in MB; the default is 8192 MB.</description>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>

    Note: yarn.nodemanager.vmem-check-enabled disables the virtual-memory check. If you are installing on virtual machines this setting is very useful and makes later operations much less likely to run into problems. On a physical machine with plenty of memory you can remove it.

Starting Hadoop

1. Run the initialization on the NameNode

The master node is the NameNode, and slave1 and slave2 are DataNodes, so the initialization, that is, formatting HDFS, only needs to be run on the NameNode:

cd /opt/hadoop/hadoop-2.7.3/bin
./hadoop namenode -format

Since the Hadoop environment variables were configured earlier, you can also simply run:

hadoop namenode -format

After a stream of output scrolls by, you will see some configuration information. Check whether a current folder with several files in it has appeared under /root/hadoop/dfs/name:

cd /root/hadoop/dfs/name
ls
cd current
ls

2. Run the start script on the NameNode

Run the start script:

cd /opt/hadoop/hadoop-2.7.3/sbin
./start-all.sh
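
To confirm that the daemons actually started, a quick check is to run the JDK's jps tool on each node (using the JDK path configured above); with this setup you would typically see NameNode, SecondaryNameNode and ResourceManager on master, and DataNode and NodeManager on the slaves:

jps
ssh slave1 /opt/java/jdk1.8.0_60/bin/jps
ssh slave2 /opt/java/jdk1.8.0_60/bin/jps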

Testing Hadoop

Open the following URLs:

http://192.168.122.128:50070
http://192.168.122.128:8088

Both should bring up the Hadoop web pages.
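
If you are working without a browser, a rough equivalent (assuming curl is installed) is to request the pages from the shell and check the HTTP status codes; a 200 or a redirect code indicates the corresponding web UI is up:

curl -sS -o /dev/null -w "%{http_code}\n" http://192.168.122.128:50070
curl -sS -o /dev/null -w "%{http_code}\n" http://192.168.122.128:8088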

At this point, the whole Hadoop environment has been set up successfully.

WordCount test

Write two files locally:

cd /opt
mkdir file
cd file
echo "hello world" >> file1.txt
echo "hello hadoop" >> file2.txt
ls
more file1.txt
more file2.txt

Create the input directory on HDFS (the output directory will be created by the job itself):

hadoop fs -mkdir -p /test/hadoop/input

Upload the files to the cluster:

hadoop fs -put /opt/file/file*.txt /test/hadoop/input
hadoop fs -ls /test/hadoop/input

Run the MapReduce test:

hadoop jar /opt/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /test/hadoop/input /test/hadoop/output

Note: this runs the wordcount example from the MapReduce examples jar under the share directory; the last two arguments are the input and output paths.

Result:

17/11/29 14:33:01 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.122.128:8032
17/11/29 14:33:03 INFO input.FileInputFormat: Total input paths to process : 2
17/11/29 14:33:03 INFO mapreduce.JobSubmitter: number of splits:2
17/11/29 14:33:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1511935947803_0003
17/11/29 14:33:04 INFO impl.YarnClientImpl: Submitted application application_1511935947803_0003
17/11/29 14:33:04 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1511935947803_0003/
17/11/29 14:33:04 INFO mapreduce.Job: Running job: job_1511935947803_0003
17/11/29 14:33:14 INFO mapreduce.Job: Job job_1511935947803_0003 running in uber mode : false
17/11/29 14:33:14 INFO mapreduce.Job:  map 0% reduce 0%
17/11/29 14:33:29 INFO mapreduce.Job:  map 50% reduce 0%
17/11/29 14:33:30 INFO mapreduce.Job:  map 100% reduce 0%
17/11/29 14:33:38 INFO mapreduce.Job:  map 100% reduce 100%
17/11/29 14:33:38 INFO mapreduce.Job: Job job_1511935947803_0003 completed successfully
17/11/29 14:33:39 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=55
                FILE: Number of bytes written=355783
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=247
                HDFS: Number of bytes written=25
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=27082
                Total time spent by all reduces in occupied slots (ms)=6022
                Total time spent by all map tasks (ms)=27082
                Total time spent by all reduce tasks (ms)=6022
                Total vcore-milliseconds taken by all map tasks=27082
                Total vcore-milliseconds taken by all reduce tasks=6022
                Total megabyte-milliseconds taken by all map tasks=27731968
                Total megabyte-milliseconds taken by all reduce tasks=6166528
        Map-Reduce Framework
                Map input records=2
                Map output records=4
                Map output bytes=41
                Map output materialized bytes=61
                Input split bytes=222
                Combine input records=4
                Combine output records=4
                Reduce input groups=3
                Reduce shuffle bytes=61
                Reduce input records=4
                Reduce output records=3
                Spilled Records=8
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=1132
                CPU time spent (ms)=4410
                Physical memory (bytes) snapshot=500752384
                Virtual memory (bytes) snapshot=6232047616
                Total committed heap usage (bytes)=307437568
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=25
        File Output Format Counters
                Bytes Written=25

Note the job ID in the "Running job" line: job_1511935947803_0003.

Check the results:

hadoop fs -ls /test/hadoop/output
Found 2 items
-rw-r--r--   2 root supergroup          0 2017-11-29 14:33 /test/hadoop/output/_SUCCESS
-rw-r--r--   2 root supergroup         25 2017-11-29 14:33 /test/hadoop/output/part-r-00000
hadoop fs -cat /test/hadoop/output/part-r-00000
hadoop  1
hello   2
world   1

In addition, the execution result of the submitted job can be seen at http://192.168.122.128:8088/cluster/app.

At this point, the wordcount test has succeeded.

Problems encountered

The test web pages would not open

After finishing the setup, the test web pages would not open. I assumed the configuration was wrong and spent a long time troubleshooting before finally asking a friend, who said: "Did you turn off your proxy?" Once the proxy was disabled, the test succeeded, which was a relief mixed with mild despair. Alternatively, add the cluster's IP pattern to the proxy's bypass (no-proxy) rules.
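
For command-line tools that go through a proxy configured via environment variables, the same idea can be expressed by excluding the cluster addresses, for example:

export no_proxy="localhost,127.0.0.1,192.168.122.128,192.168.122.129,192.168.122.130"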

hadoop: command not found…

Add the Hadoop environment variables to /etc/profile:

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
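
Then reload the profile and check that the hadoop command is found:

source /etc/profile
hadoop version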

Reference: Installing a Hadoop cluster on Linux (CentOS 7 + hadoop-2.8.0)

How to replace the default OpenJDK with a manually installed SunJDK on Linux
