Hadoop Study Notes: Fully Distributed Mode Installation


I. What is Hadoop

Hadoop is a distributed-systems infrastructure developed by the Apache Foundation. It is a platform for developing and running software that processes data at scale: an open-source framework, implemented in Java, for distributed computation over massive data on clusters of many machines. Users can develop distributed programs without understanding the low-level details of distribution, and harness the full power of a cluster for high-speed computation and storage.

The two core designs of the Hadoop framework are HDFS and MapReduce.

  • HDFS provides storage for massive data
    • HDFS (Hadoop Distributed File System) is a highly fault-tolerant system designed for deployment on inexpensive machines. It provides high-throughput data access and suits applications with very large data sets.
  • MapReduce provides computation over the data
    • Put simply, MapReduce is a programming model that extracts analysis elements from massive source data and returns a result set: storing the files across the cluster is the first step, and pulling the content we need out of that mass of data is MapReduce's job.

Overall, Hadoop suits applications in big-data storage and big-data analytics, runs on clusters of a few thousand up to tens of thousands of servers, and supports PB-scale storage capacity.
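The map, shuffle, reduce flow behind the classic word count can be modeled with plain coreutils: splitting input into one word per line plays the map role, sort plays the shuffle (grouping identical keys together), and uniq -c plays the reduce (summing per key). This is only a sketch of the data flow, not Hadoop code:

```shell
# map: emit one word per line; shuffle: sort groups identical words; reduce: uniq -c sums each group
counts=$(printf 'hadoop hdfs hadoop\nmapreduce hdfs\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c)
echo "$counts"
```

The grouping-by-key step in the middle is exactly what the framework performs between the map and reduce phases, just distributed across machines.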

Typical Hadoop applications include: search, log processing, recommender systems, data analytics, video and image analysis, and data archiving.

II. Hadoop Modes

Standalone mode: trivial to install, almost no configuration needed, but only useful for debugging.

Pseudo-distributed mode: starts all five daemons (namenode, datanode, jobtracker, tasktracker, secondary namenode) on a single node, simulating the nodes of a distributed deployment.

Fully distributed mode: a normal Hadoop cluster, made up of multiple nodes each with its own role.

III. Experiment Environment

OS: Red Hat Enterprise Linux 6.5
Software versions:

  • hadoop-2.7.3
  • jdk-7u79-linux-x64

IP            Hostname  Roles
172.25.27.6   server6   NameNode, DFSZKFailoverController, ResourceManager
172.25.27.7   server7   NameNode, DFSZKFailoverController, ResourceManager
172.25.27.8   server8   JournalNode, QuorumPeerMain, DataNode, NodeManager
172.25.27.9   server9   JournalNode, QuorumPeerMain, DataNode, NodeManager
172.25.27.10  server10  JournalNode, QuorumPeerMain, DataNode, NodeManager

IV. Installation Steps

1. Download Hadoop and the JDK

Hadoop website: http://hadoop.apache.org/
Download mirror: http://mirror.bit.edu.cn/apache/hadoop/common/

2. Hadoop single-node (pseudo-distributed) setup

1. Hadoop installation and testing

[root@server6 ~]# useradd -u 1000 hadoop   ## any id works, but it must be identical on all nodes, so choose one that avoids conflicts
[root@server6 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server6 ~]# su - hadoop
## Note: place the downloaded tarballs in hadoop's home directory, and run all subsequent steps as the hadoop user
[hadoop@server6 ~]$ ls
hadoop-2.7.3.tar.gz  jdk-7u79-linux-x64.tar.gz
[hadoop@server6 ~]$ tar -zxf hadoop-2.7.3.tar.gz
[hadoop@server6 ~]$ tar -zxf jdk-7u79-linux-x64.tar.gz
[hadoop@server6 ~]$ ln -s hadoop-2.7.3 hadoop
[hadoop@server6 ~]$ ln -s jdk1.7.0_79/ jdk
[hadoop@server6 ~]$ vim ~/.bash_profile
PATH=$PATH:$HOME/bin:/home/hadoop/jdk/bin
export PATH
export JAVA_HOME=/home/hadoop/jdk
[hadoop@server6 ~]$ source ~/.bash_profile
[hadoop@server6 ~]$ echo $JAVA_HOME
/home/hadoop/jdk
[hadoop@server6 ~]$ cd hadoop
[hadoop@server6 hadoop]$ mkdir input
[hadoop@server6 hadoop]$ cp etc/hadoop/*.xml input/
[hadoop@server6 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
[hadoop@server6 hadoop]$ ls output/
part-r-00000  _SUCCESS
[hadoop@server6 hadoop]$ cat output/*
1   dfsadmin


[hadoop@server6 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output
[hadoop@server6 hadoop]$ ls output/
part-r-00000  _SUCCESS
[hadoop@server6 hadoop]$ cat output/*      ## here the effect is obvious
"*" 18
"AS 8
"License"); 8
"alice,bob  18
...

2. Pseudo-distributed operation (requires passwordless ssh)

[hadoop@server6 hadoop]$ vim etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.27.6:9000</value>
    </property>
</configuration>
[hadoop@server6 hadoop]$ vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
[hadoop@server6 hadoop]$ sed -i.bak 's/localhost/172.25.27.6/g' etc/hadoop/slaves
[hadoop@server6 hadoop]$ cat etc/hadoop/slaves
172.25.27.6
  • Passwordless ssh
[hadoop@server6 hadoop]$ exit
logout
[root@server6 ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: it is based on a dictionary word
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@server6 ~]# su - hadoop
[hadoop@server6 ~]$ ssh-keygen
[hadoop@server6 ~]$ ssh-copy-id 172.25.27.6
[hadoop@server6 ~]$ ssh 172.25.27.6    ## test: logging in without being asked for a password means it works
[hadoop@server6 hadoop]$ bin/hdfs namenode -format     ## format the namenode
[hadoop@server6 hadoop]$ sbin/start-dfs.sh     ## start HDFS
[hadoop@server6 hadoop]$ jps       ## use jps to verify the daemons started; seeing these four processes means success
2391 Jps
2117 DataNode
1994 NameNode
2276 SecondaryNameNode

Open http://172.25.27.6:50070 in a browser
to view the web interface; it is available by default.


3. Operating the pseudo-distributed cluster

Utilities -> Browse the file system


By default it is empty; there is nothing there.


Let's create a directory:

[hadoop@server6 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server6 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server6 hadoop]$ bin/hdfs dfs -put input test  ## upload the local input directory, renaming it to test

Refresh the page and take a look.

[hadoop@server6 hadoop]$ rm -rf input/ output/
[hadoop@server6 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount test output
[hadoop@server6 hadoop]$ ls        ## output is not local; it lives in HDFS
bin  include  libexec      logs        README.txt  share
etc  lib      LICENSE.txt  NOTICE.txt  sbin

Refresh again.


So how do we view the output? Use the commands below:

[hadoop@server6 hadoop]$ bin/hdfs dfs -cat output/*
...
within  1
without 1
work    1
writing,    8
you 9
[hadoop@server6 hadoop]$ bin/hdfs dfs -get output .        ## download output to the local filesystem
[hadoop@server6 hadoop]$ ls
bin  include  libexec      logs        output      sbin
etc  lib      LICENSE.txt  NOTICE.txt  README.txt  share
[hadoop@server6 hadoop]$ bin/hdfs dfs -rm -r output        ## delete it from HDFS
17/10/24 21:11:24 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted output


3. Hadoop fully distributed setup

1. Preparation

Using an NFS network file system means we don't have to repeat the installation on every node; rpcbind and nfs must be running.

[hadoop@server6 hadoop]$ sbin/stop-dfs.sh
[hadoop@server6 hadoop]$ logout
[root@server6 ~]# yum install -y rpcbind
[root@server6 ~]# /etc/init.d/rpcbind status
rpcbind is stopped
[root@server6 ~]# /etc/init.d/rpcbind start
Starting rpcbind:                                          [  OK  ]
[root@server6 ~]# /etc/init.d/rpcbind status
rpcbind (pid  2874) is running...
[root@server6 ~]# yum install -y nfs-utils
[root@server6 ~]# vim /etc/exports
/home/hadoop *(rw,anonuid=1000,anongid=1000)
[root@server6 ~]# /etc/init.d/nfs status
rpc.svcgssd is stopped
rpc.mountd is stopped
nfsd is stopped
[root@server6 ~]# /etc/init.d/nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
[root@server6 ~]# showmount -e
Export list for server6:
/home/hadoop *
[root@server6 ~]# exportfs -v
/home/hadoop    <world>(rw,wdelay,root_squash,no_subtree_check,anonuid=1000,anongid=1000)

2. Hadoop configuration

[hadoop@server6 hadoop]$ vim etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://masters</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>172.25.27.8:2181,172.25.27.9:2181,172.25.27.10:2181</value>
    </property>
</configuration>
[hadoop@server6 hadoop]$ vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>masters</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.masters</name>
        <value>h1,h2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.masters.h1</name>
        <value>172.25.27.6:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.masters.h1</name>
        <value>172.25.27.6:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.masters.h2</name>
        <value>172.25.27.7:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.masters.h2</name>
        <value>172.25.27.7:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://172.25.27.8:8485;172.25.27.9:8485;172.25.27.10:8485/masters</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/tmp/journaldata</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.masters</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence
shell(/bin/true)</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
[hadoop@server6 hadoop]$ vim etc/hadoop/slaves
172.25.27.8
172.25.27.9
172.25.27.10
[root@server6 ~]# mv zookeeper-3.4.9.tar.gz /home/hadoop/
[root@server6 ~]# su - hadoop
[hadoop@server6 ~]$ tar -zxf zookeeper-3.4.9.tar.gz
[hadoop@server6 ~]$ cp zookeeper-3.4.9/conf/zoo_sample.cfg zookeeper-3.4.9/conf/zoo.cfg
[hadoop@server6 ~]$ vim zookeeper-3.4.9/conf/zoo.cfg
server.1=172.25.27.8:2888:3888
server.2=172.25.27.9:2888:3888
server.3=172.25.27.10:2888:3888

3. server7/8/9/10

[root@server7 ~]# useradd -u 1000 hadoop
[root@server7 ~]# id hadoop
uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop)
[root@server7 ~]# yum install -y nfs-utils rpcbind
[root@server7 ~]# /etc/init.d/rpcbind start
Starting rpcbind:                                          [  OK  ]
[root@server7 ~]# mount 172.25.27.6:/home/hadoop/ /home/hadoop/
[root@server8 ~]# vim hadoop.sh
#!/bin/bash
useradd -u 1000 hadoop
yum install -y nfs-utils rpcbind
/etc/init.d/rpcbind start
mount 172.25.27.6:/home/hadoop/ /home/hadoop/
[root@server8 ~]# chmod +x hadoop.sh
[root@server8 ~]# ./hadoop.sh
[root@server8 ~]# scp hadoop.sh server9:
[root@server8 ~]# scp hadoop.sh server10:
[root@server9 ~]# ./hadoop.sh
[root@server10 ~]# ./hadoop.sh
[root@server8 ~]# su - hadoop
[hadoop@server8 ~]$ mkdir /tmp/zookeeper
[hadoop@server8 ~]$ echo 1 > /tmp/zookeeper/myid
[hadoop@server8 ~]$ cat /tmp/zookeeper/myid
1
[root@server9 ~]# su - hadoop
[hadoop@server9 ~]$ mkdir /tmp/zookeeper
[hadoop@server9 ~]$ echo 2 > /tmp/zookeeper/myid
[hadoop@server9 ~]$ cat /tmp/zookeeper/myid
2
[root@server10 ~]# su - hadoop
[hadoop@server10 ~]$ mkdir /tmp/zookeeper
[hadoop@server10 ~]$ echo 3 > /tmp/zookeeper/myid
[hadoop@server10 ~]$ cat /tmp/zookeeper/myid
3
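Each server.N line in zoo.cfg pairs an ID with a host, and that node's /tmp/zookeeper/myid must contain exactly that N (1 for server8, 2 for server9, 3 for server10 here). A throwaway sanity check of that mapping, using the three server lines from this article's zoo.cfg:

```shell
# a scratch copy of the server lines from zoo.cfg (content taken from this article)
cat > /tmp/zoo_check.cfg <<'EOF'
server.1=172.25.27.8:2888:3888
server.2=172.25.27.9:2888:3888
server.3=172.25.27.10:2888:3888
EOF
# print, for each entry, which myid value the host must hold
expects=""
while IFS='=' read -r key value; do
    id=${key#server.}      # the N of server.N
    host=${value%%:*}      # the host part before the first colon
    expects="$expects$host=$id "
done < /tmp/zoo_check.cfg
echo "$expects"
```

A mismatch between myid and the server.N entries is one of the most common reasons a ZooKeeper quorum fails to form.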

4. Start the services on each node

[hadoop@server8 ~]$ cd zookeeper-3.4.9
[hadoop@server8 zookeeper-3.4.9]$ bin/zkServer.sh start
[hadoop@server9 ~]$ cd zookeeper-3.4.9
[hadoop@server9 zookeeper-3.4.9]$ bin/zkServer.sh start
[hadoop@server10 ~]$ cd zookeeper-3.4.9
[hadoop@server10 zookeeper-3.4.9]$ bin/zkServer.sh start
[hadoop@server8 zookeeper-3.4.9]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
[hadoop@server9 zookeeper-3.4.9]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: leader
[hadoop@server10 zookeeper-3.4.9]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower

5. Start the hdfs cluster (start components in order)

On the three DNs, start the zookeeper quorum in turn (it was already started above; here we just check its status, and start it if it is not running):

[hadoop@server8 zookeeper-3.4.9]$ jps
2012 QuorumPeerMain
2736 Jps

Start journalnode on each of the three DNs (on the very first start of hdfs, the journalnodes must be started first):

[hadoop@server8 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server8.out
[hadoop@server8 hadoop]$ jps
2818 Jps
2769 JournalNode
2012 QuorumPeerMain
[hadoop@server9 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server9.out
[hadoop@server9 hadoop]$ jps
2991 Jps
2205 QuorumPeerMain
2942 JournalNode
[hadoop@server10 hadoop]$ sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server10.out
[hadoop@server10 hadoop]$ jps
2328 JournalNode
1621 QuorumPeerMain
2377 Jps

Format the HDFS cluster.
The NameNode stores its data under /tmp by default, so the formatted data must be copied to h2:

[hadoop@server6 hadoop]$ bin/hdfs namenode -format
[hadoop@server6 hadoop]$ scp -r /tmp/hadoop-hadoop 172.25.27.7:/tmp

Format zookeeper (run on h1 only):

[hadoop@server6 hadoop]$ bin/hdfs zkfc -formatZK

Start the hdfs cluster (run on h1 only):

[hadoop@server6 hadoop]$ sbin/stop-all.sh
[hadoop@server6 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [server6 server7]
server6: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-server6.out
server7: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-server7.out
172.25.27.9: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server9.out
172.25.27.10: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server10.out
172.25.27.8: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-server8.out
Starting journal nodes [172.25.27.8 172.25.27.9 172.25.27.10]
172.25.27.10: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server10.out
172.25.27.8: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server8.out
172.25.27.9: starting journalnode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-server9.out
Starting ZK Failover Controllers on NN hosts [server6 server7]
server6: starting zkfc, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-server6.out
server7: starting zkfc, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-zkfc-server7.out

Check the status of each node:

[hadoop@server6 hadoop]$ jps
3783 NameNode
4162 Jps
4092 DFSZKFailoverController
[hadoop@server7 hadoop]$ jps
2970 Jps
2817 NameNode
2921 DFSZKFailoverController
[hadoop@server8 tmp]$ jps
2269 DataNode
1161 QuorumPeerMain
2366 JournalNode
2426 Jps
[hadoop@server9 hadoop]$ jps
1565 QuorumPeerMain
2625 DataNode
2723 JournalNode
2788 Jps
[hadoop@server10 hadoop]$ jps
2133 DataNode
1175 QuorumPeerMain
2294 Jps
2231 JournalNode

Note: if the DataNode on some node fails to start, try stopping the hdfs cluster first, then deleting that node's /tmp/hadoop-hadoop directory, and then restarting the hdfs cluster.
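A sketch of that recovery sequence on this cluster (an ops recipe tied to this article's paths, shown for orientation rather than as something runnable elsewhere):

```shell
# on server6, as the hadoop user: stop the whole hdfs cluster first
sbin/stop-dfs.sh
# on the affected node: remove the stale HDFS data
# (it lives under /tmp/hadoop-hadoop here because hadoop.tmp.dir was left at its default)
rm -rf /tmp/hadoop-hadoop
# back on server6: bring the cluster up again
sbin/start-dfs.sh
```

Deleting that directory throws away the node's block data, which is acceptable here only because the cluster replicates blocks to the other DataNodes.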


Test automatic failover

[hadoop@server6 hadoop]$ jps
3783 NameNode
4162 Jps
4092 DFSZKFailoverController
[hadoop@server6 hadoop]$ kill -9 3783
[hadoop@server6 hadoop]$ jps
4092 DFSZKFailoverController
4200 Jps
[hadoop@server7 hadoop]$ jps
2817 NameNode
2921 DFSZKFailoverController
3030 Jps

After the namenode process on h1 is killed, HDFS is still accessible: h2 switches to the active state and takes over as namenode.

[hadoop@server6 hadoop]$ sbin/hadoop-daemon.sh start namenode

Start the namenode on h1 again; it now comes back in the standby state.
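Besides the web UI, the state of each NameNode can be queried from the command line with the hdfs haadmin tool, using the namenode IDs h1 and h2 defined in hdfs-site.xml; each command prints either active or standby:

```shell
# query the HA state of each namenode ID (h1/h2 come from dfs.ha.namenodes.masters)
bin/hdfs haadmin -getServiceState h1
bin/hdfs haadmin -getServiceState h2
```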


This completes high availability for hdfs.
