Building a Hadoop Cluster on Raspberry Pi
Software versions:
hadoop-2.6.4; hbase-0.98.20-hadoop2; zookeeper-3.4.6
APT sources used:
deb http://mirrors.ustc.edu.cn/raspbian/raspbian/ jessie main contrib non-free rpi
deb-src http://mirrors.ustc.edu.cn/raspbian/raspbian/ jessie main contrib non-free rpi
Cluster layout:
Hostname  IP             Installed software              Running processes
nna       192.168.11.81  jdk, hadoop                     NameNode, DFSZKFailoverController (zkfc)
nns       192.168.11.82  jdk, hadoop                     NameNode, DFSZKFailoverController (zkfc)
rma       192.168.11.83  jdk, hadoop                     ResourceManager
rms       192.168.11.84  jdk, hadoop                     ResourceManager
hba       192.168.11.85  jdk, hadoop, hbase              HMaster
hbs       192.168.11.86  jdk, hadoop, hbase              HMaster
dn1       192.168.11.91  jdk, hadoop, zookeeper, hbase   DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
dn2       192.168.11.92  jdk, hadoop, zookeeper, hbase   DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
dn3       192.168.11.93  jdk, hadoop, zookeeper, hbase   DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
1. Create the hadoop user (run as root)
adduser hadoop
chmod +w /etc/sudoers
# add the following line to /etc/sudoers, then remove write permission again:
hadoop ALL=(root) NOPASSWD:ALL
chmod -w /etc/sudoers
2. Synchronize the time
sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
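The cp command above only sets the time zone; to keep the clocks of all nodes actually in sync, an NTP client can be installed as well. A minimal sketch, assuming the ntp package is available from the mirror configured above:
sudo apt-get install ntp      # install the NTP daemon
sudo service ntp restart      # (re)start it so the clock converges
date                          # verify the local time looks correct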
3. Auto-mount the USB drive at boot
The USB drive is formatted as FAT32 (vfat).
uid is the user ID and gid is the group ID; check them with the id command.
Edit /etc/fstab and append:
/dev/sda1 /hadoop vfat suid,exec,dev,noatime,user,utf8,rw,auto,async,uid=1001,gid=1001 0 0
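Before editing fstab it is worth confirming the device name and the hadoop user's uid/gid, and then test-mounting the new entry (the device /dev/sda1 above is an assumption; yours may differ):
sudo blkid              # list block devices and their filesystem types
id hadoop               # shows uid=1001(hadoop) gid=1001(hadoop) ...
sudo mkdir -p /hadoop   # create the mount point
sudo mount -a           # mount everything in /etc/fstab to verify the entry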
4. Configure hosts
Edit /etc/hosts:
192.168.11.81 nna
192.168.11.82 nns
192.168.11.83 rma
192.168.11.84 rms
192.168.11.85 hba
192.168.11.86 hbs
192.168.11.91 dn1
192.168.11.92 dn2
192.168.11.93 dn3
Edit /etc/hostname and set it to the current node's name, e.g. on the first NameNode:
nna
5. Install the JDK
Install either OpenJDK or the Oracle JDK:
sudo apt-cache search jdk
sudo apt-get install openjdk-8-jdk
sudo apt-get install oracle-java8-jdk
6. Configure environment variables
Edit /etc/profile and append:
# set java environment
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/
export JRE_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
# set hadoop environment
export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
export PATH=$PATH:$HADOOP_HOME/bin
# set zookeeper environment
export ZK_HOME=/home/hadoop/zookeeper-3.4.6
export PATH=$PATH:$ZK_HOME/bin
# set hbase environment
export HBASE_HOME=/home/hadoop/hbase-0.98.20-hadoop2
export PATH=$PATH:$HBASE_HOME/bin
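After editing, the new variables can be loaded into the current shell and sanity-checked (a quick check, assuming the Oracle JDK path used above):
source /etc/profile
echo $JAVA_HOME $HADOOP_HOME $ZK_HOME $HBASE_HOME   # confirm the variables are set
java -version                                       # confirm the JDK is on the PATH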
7. Create directories
mkdir -p /hadoop/tmp
mkdir -p /hadoop/data/tmp/journal
mkdir -p /hadoop/data/dfs/name
mkdir -p /hadoop/data/dfs/data
mkdir -p /hadoop/data/yarn/local
mkdir -p /hadoop/data/zookeeper
mkdir -p /hadoop/log/yarn
8. Install ZooKeeper
Edit ~/zookeeper-3.4.6/conf/zoo.cfg:
# The number of milliseconds of each tick
# (the basic time unit, in ms, used between servers and clients)
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
# (how long a follower may take to connect to and sync with the leader)
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
# (maximum delay between a request and its acknowledgement)
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# (where ZooKeeper keeps its data and logs)
dataDir=/hadoop/data/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=dn1:2888:3888
server.2=dn2:2888:3888
server.3=dn3:2888:3888
# server.A=B:C:D
# A is the server's number, B is its hostname or IP address,
# C is the port used to exchange information with the cluster leader,
# and D is the port used for leader election when the leader fails.
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
Next, in the dataDir directory on each dn node, create a file named myid containing a single number between 1 and 255.
The number must match that node's server.N entry in zoo.cfg;
for example, for server.1=dn1:2888:3888 the myid file on dn1 should contain 1.
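For instance, matching the server.N numbers configured above (run each line on the corresponding node):
echo 1 > /hadoop/data/zookeeper/myid   # on dn1
echo 2 > /hadoop/data/zookeeper/myid   # on dn2
echo 3 > /hadoop/data/zookeeper/myid   # on dn3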
9. Install Hadoop
Edit $HADOOP_HOME/etc/hadoop/slaves:
dn1
dn2
dn3
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/
Edit $HADOOP_HOME/etc/hadoop/yarn-env.sh:
# some Java parameters
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/
Edit $HADOOP_HOME/etc/hadoop/core-site.xml:
<configuration>
  <!-- the logical HDFS nameservice, named "cluster" -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65535</value>
  </property>
  <!-- Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
  <!-- ZooKeeper quorum address -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>dn1:2181,dn2:2181,dn3:2181</value>
  </property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<configuration>
  <!-- the HDFS nameservice "cluster"; must match core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>cluster</value>
  </property>
  <!-- the two NameNodes under "cluster": nna and nns -->
  <property>
    <name>dfs.ha.namenodes.cluster</name>
    <value>nna,nns</value>
  </property>
  <!-- RPC address of nna -->
  <property>
    <name>dfs.namenode.rpc-address.cluster.nna</name>
    <value>nna:9000</value>
  </property>
  <!-- RPC address of nns -->
  <property>
    <name>dfs.namenode.rpc-address.cluster.nns</name>
    <value>nns:9000</value>
  </property>
  <!-- HTTP address of nna -->
  <property>
    <name>dfs.namenode.http-address.cluster.nna</name>
    <value>nna:50070</value>
  </property>
  <!-- HTTP address of nns -->
  <property>
    <name>dfs.namenode.http-address.cluster.nns</name>
    <value>nns:50070</value>
  </property>
  <!-- where the NameNode edit log is stored on the JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://dn1:8485;dn2:8485;dn3:8485/cluster</value>
  </property>
  <!-- where the JournalNodes store their data on local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoop/data/tmp/journal</value>
  </property>
  <!-- enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- failover proxy implementation -->
  <property>
    <name>dfs.client.failover.proxy.provider.cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- fencing methods; list one method per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- sshfence requires passwordless SSH -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connect timeout -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/data/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/data/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.journalnode.http-address</name>
    <value>0.0.0.0:8480</value>
  </property>
  <property>
    <name>dfs.journalnode.rpc-address</name>
    <value>0.0.0.0:8485</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>dn1:2181,dn2:2181,dn3:2181</value>
  </property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml:
<configuration>
  <!-- run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>nna:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>nna:19888</value>
  </property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <!-- enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- logical names of the two RMs -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>dn1:2181,dn2:2181,dn3:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- host of rm1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>nna</value>
  </property>
  <!-- host of rm2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>nns</value>
  </property>
  <!-- set to rm1 on the first ResourceManager and rm2 on the second.
       Note: the rest of this file can be copied to the other machines as-is,
       but this value must be changed on the other ResourceManager. -->
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>
  <!-- enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper address for the RM state store -->
  <property>
    <name>yarn.resourcemanager.zk-state-store.address</name>
    <value>dn1:2181,dn2:2181,dn3:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <!-- ZooKeeper cluster address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>dn1:2181,dn2:2181,dn3:2181</value>
  </property>
  <!-- cluster id of the RM pair -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1-yarn</value>
  </property>
  <!-- how long the ApplicationMaster waits between attempts to reconnect to the scheduler -->
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>
  <!-- rm1 addresses -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>nna:8132</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>nna:8130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>nna:8188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>nna:8131</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>nna:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>nna:23142</value>
  </property>
  <!-- rm2 addresses -->
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>nns:8132</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>nns:8130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>nns:8188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>nns:8131</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>nns:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>nns:23142</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/data/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/hadoop/log/yarn</value>
  </property>
  <property>
    <name>mapreduce.shuffle.port</name>
    <value>23080</value>
  </property>
  <!-- failover proxy class -->
  <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
    <value>/yarn-leader-election</value>
  </property>
</configuration>
10. Install HBase
Replace the zookeeper*.jar files under HBase's lib directory:
rm -rf hbase-0.98.20-hadoop2/lib/zookeeper*.jar
find zookeeper-3.4.6/ -name "zookeeper*.jar" | xargs -i cp {} hbase-0.98.20-hadoop2/lib/
Replace the hadoop*.jar files under HBase's lib directory:
rm -rf hbase-0.98.20-hadoop2/lib/hadoop*.jar
find hadoop-2.6.4/share/hadoop -name "hadoop*.jar" | xargs -i cp {} hbase-0.98.20-hadoop2/lib/
Edit conf/hbase-env.sh:
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/
export HBASE_MANAGES_ZK=false   # whether HBase manages its own ZooKeeper instance
Edit conf/regionservers:
dn1
dn2
dn3
Edit conf/hbase-site.xml:
The host and port in hbase.rootdir in $HBASE_HOME/conf/hbase-site.xml must match the host and port of fs.defaultFS (fs.default.name) in Hadoop's core-site.xml.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://nna:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
    <name>hbase.master</name>
    <value>nna:60000</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
    <description>The port master should bind to.</description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>dn1:2181,dn2:2181,dn3:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/hadoop/data/zookeeper</value>
    <description>Property from ZooKeeper config zoo.cfg.
      The directory where the snapshot is stored.
    </description>
  </property>
</configuration>
11. Back up the SD card image and flash it to the remaining nodes
On each cloned node, update /etc/hostname and, on the dn nodes, the myid file under the dataDir directory.
12. Configure passwordless SSH login
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub nna
ssh-copy-id -i ~/.ssh/id_rsa.pub nns
ssh-copy-id -i ~/.ssh/id_rsa.pub dn1
ssh-copy-id -i ~/.ssh/id_rsa.pub dn2
ssh-copy-id -i ~/.ssh/id_rsa.pub dn3
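Key-based login can then be verified from the node that generated the key (hostnames as defined in /etc/hosts above):
ssh nna hostname   # should print "nna" without asking for a password
ssh dn1 hostname   # likewise for the other nodes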
13. Initialize and start each component
//------------------------------------------------------------------------
Option 1
Start ZooKeeper
Start it on dn1, dn2, dn3:
#./zookeeper-3.4.6/bin/zkServer.sh start
#./zookeeper-3.4.6/bin/zkServer.sh restart
Check the status on dn1, dn2, dn3: there should be one leader and two followers.
#./zookeeper-3.4.6/bin/zkServer.sh status
Start the JournalNodes on dn1, dn2, dn3:
#./hadoop-2.6.4/sbin/hadoop-daemon.sh start journalnode
Format HDFS on nna:
hadoop namenode -format
Formatting creates the NameNode metadata under the directories configured in core-site.xml (hadoop.tmp.dir) and hdfs-site.xml (dfs.namenode.name.dir).
Copy it to nns, dn1, dn2, dn3:
scp -r /hadoop/data/dfs/name/current hadoop@nns:/hadoop/data/dfs/name/current
scp -r /hadoop/data/dfs/name/current hadoop@dn1:/hadoop/data/dfs/name/current
scp -r /hadoop/data/dfs/name/current hadoop@dn2:/hadoop/data/dfs/name/current
scp -r /hadoop/data/dfs/name/current hadoop@dn3:/hadoop/data/dfs/name/current
Format ZK on nna and nns:
#hdfs zkfc -formatZK
Start HDFS on nna:
#./hadoop-2.6.4/sbin/start-dfs.sh
Start YARN on rma:
#./hadoop-2.6.4/sbin/start-yarn.sh
Start YARN on rms:
#./hadoop-2.6.4/sbin/yarn-daemon.sh start resourcemanager
Start HBase
Start HBase on hba:
start-hbase.sh
Start HBase on hbs:
hbase-daemon.sh start master
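At this point the daemons running on each node should roughly match the cluster layout table at the top; a quick way to confirm is to run jps on every node (the expected process lists below follow that table, not a literal transcript):
jps
# expected:
#   nna/nns: NameNode, DFSZKFailoverController
#   rma/rms: ResourceManager
#   hba/hbs: HMaster
#   dn1-dn3: DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer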
//------------------------------------------------------------------------
Option 2
Start ZooKeeper
Start it on dn1, dn2, dn3:
#./zookeeper-3.4.6/bin/zkServer.sh start
#./zookeeper-3.4.6/bin/zkServer.sh restart
Check the status on dn1, dn2, dn3: there should be one leader and two followers.
#./zookeeper-3.4.6/bin/zkServer.sh status
Start the JournalNodes on dn1, dn2, dn3:
#./hadoop-2.6.4/sbin/hadoop-daemon.sh start journalnode
Format the NameNode on nna:
hdfs namenode -format
Start the NameNode on nna:
#./hadoop-2.6.4/sbin/hadoop-daemon.sh start namenode
Bootstrap the standby NameNode on nns:
hdfs namenode -bootstrapStandby
Start the NameNode on nns:
#./hadoop-2.6.4/sbin/hadoop-daemon.sh start namenode
Make nna the active NameNode:
hdfs haadmin -transitionToActive nna
Start the DataNodes from nna:
#./hadoop-2.6.4/sbin/hadoop-daemons.sh start datanode
Swap the roles of nna and nns (manual failover):
hdfs haadmin -failover --forceactive nna nns
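After the failover, the NameNode roles can be confirmed from the command line (the service IDs nna and nns come from dfs.ha.namenodes.cluster above):
hdfs haadmin -getServiceState nna   # expected: standby after the failover above
hdfs haadmin -getServiceState nns   # expected: active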
Start YARN on rma:
#./hadoop-2.6.4/sbin/start-yarn.sh
Start YARN on rms:
#./hadoop-2.6.4/sbin/yarn-daemon.sh start resourcemanager
Start HBase
Start HBase on hba:
start-hbase.sh
Start HBase on hbs:
hbase-daemon.sh start master
//------------------------------------------------------------------------
14. Shut down the cluster
Stop HBase on hbs:
hbase-daemon.sh stop master
Stop HBase on hba:
stop-hbase.sh
Stop YARN on rms:
#./hadoop-2.6.4/sbin/yarn-daemon.sh stop resourcemanager
Stop YARN on rma:
#./hadoop-2.6.4/sbin/stop-yarn.sh
Stop HDFS on nna:
#./hadoop-2.6.4/sbin/stop-dfs.sh
Stop ZooKeeper on dn1, dn2, dn3:
#./zookeeper-3.4.6/bin/zkServer.sh stop
15. Starting the cluster again
Start ZooKeeper on dn1, dn2, dn3:
#./zookeeper-3.4.6/bin/zkServer.sh start
Check the status on dn1, dn2, dn3: there should be one leader and two followers.
#./zookeeper-3.4.6/bin/zkServer.sh status
Start the JournalNodes on dn1, dn2, dn3:
#./hadoop-2.6.4/sbin/hadoop-daemon.sh start journalnode
Start HDFS on nna:
#./hadoop-2.6.4/sbin/start-dfs.sh
Start YARN on rma:
#./hadoop-2.6.4/sbin/start-yarn.sh
Start YARN on rms:
#./hadoop-2.6.4/sbin/yarn-daemon.sh start resourcemanager
Start HBase on hba:
start-hbase.sh
Start HBase on hbs:
hbase-daemon.sh start master
16. Verification
http://nna:50070 (NameNode web UI on nna)
http://nns:50070 (NameNode web UI on nns)
http://192.168.11.81:8188 (ResourceManager rm1 web UI)
http://192.168.11.82:8188 (ResourceManager rm2 web UI)
http://hba:60010 (HMaster web UI on hba)
http://hbs:60010 (HMaster web UI on hbs)
http://nna:19888 (MapReduce JobHistory web UI)
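The HA state can also be checked from the command line (the service IDs nna/nns and rm1/rm2 come from the hdfs-site.xml and yarn-site.xml configs above):
hdfs haadmin -getServiceState nna
hdfs haadmin -getServiceState nns
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
hdfs dfsadmin -report          # lists the live DataNodes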
17. Adding a node
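A rough, hedged outline for adding one more worker node (the hostname dn4 and IP 192.168.11.94 below are purely illustrative assumptions, not part of the layout above):
# on every existing node: add the new host to /etc/hosts, e.g.
#   192.168.11.94 dn4
# on the NameNodes: append dn4 to $HADOOP_HOME/etc/hadoop/slaves and to HBase's conf/regionservers
# on dn4 (cloned from an existing dn image, with /etc/hostname and myid adjusted): start the daemons
./hadoop-2.6.4/sbin/hadoop-daemon.sh start datanode
./hadoop-2.6.4/sbin/yarn-daemon.sh start nodemanager
./hbase-0.98.20-hadoop2/bin/hbase-daemon.sh start regionserver
# optionally rebalance HDFS afterwards
./hadoop-2.6.4/sbin/start-balancer.sh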