Dynamically Scaling Hadoop and HBase

Environment:
CentOS 7.2, 64-bit
hadoop-2.6.0-cdh5.5.2
hbase-1.0.0-cdh5.5.2

jdk1.8.0_91


master:192.168.205.153
slave1:192.168.205.154
slave2:192.168.205.155
New node slave3: 192.168.205.156


I. Adding a Hadoop Node
● There are two ways to add a node. The static way: shut down the Hadoop cluster, update the configuration, and restart the cluster (not covered again here).
● The dynamic way: add the node without restarting the cluster.


1. Preparation:
(1). Set slave3's hostname
[root@localhost ~]# vi /etc/hosts

192.168.205.153 h153
192.168.205.154 h154
192.168.205.155 h155
192.168.205.156 h156
[root@localhost ~]# vi /etc/hostname
h156

[root@localhost ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=h156

[root@localhost ~]# reboot

(2). On slave3, create the hadoop user and install jdk1.8.0_91 (for the detailed steps see my other article: http://blog.csdn.net/m0_37739193/article/details/71222673), then copy hadoop-2.6.0-cdh5.5.2 from slave2 into the corresponding directory on slave3
[hadoop@h155 ~]$ scp -r hadoop-2.6.0-cdh5.5.2/ h156:/home/hadoop/
Note: at first I followed the step above and copied hadoop-2.6.0-cdh5.5.2 from slave2 to slave3, but when I later checked the DataNode information page in the browser, slave2 and slave3 could not coexist: only two of the three nodes showed up. h154 was always there, while h155 and h156 kept displacing each other, one appearing only when the other vanished. Very strange. (A likely explanation: copying from a node that had already run a DataNode also copies its data directory, so both nodes report the same storage ID and the NameNode treats them as one.)
Fix: copy the hadoop-2.6.0-cdh5.5.2 directory from the master node to slave3 instead.


(3). Set up passwordless SSH from the namenode and resourcemanager nodes to slave3
[hadoop@h153 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h156


(4). Add the following line to /etc/hosts on master, slave1, and slave2 so that all four machines have identical /etc/hosts files (since this is a dynamic addition, no restart is needed):
192.168.205.156 h156
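To confirm the four /etc/hosts files really are consistent, a small helper can be used. This is a hypothetical sketch (the function name is mine, not a standard tool), demonstrated against a temp file rather than the real /etc/hosts:

```shell
# check_hosts: verify a hosts file contains every "ip name" pair given.
# Prints what is missing and returns non-zero if anything is absent.
check_hosts() {
  local file="$1"; shift
  local missing=0
  while [ "$#" -ge 2 ]; do
    grep -Eq "^[[:space:]]*$1[[:space:]]+$2([[:space:]]|$)" "$file" \
      || { echo "missing: $1 $2"; missing=1; }
    shift 2
  done
  return "$missing"
}

# Demo: build a sample hosts file like the one above and check it
tmp=$(mktemp)
printf '%s\n' \
  '192.168.205.153 h153' \
  '192.168.205.154 h154' \
  '192.168.205.155 h155' \
  '192.168.205.156 h156' > "$tmp"
check_hosts "$tmp" 192.168.205.153 h153 192.168.205.156 h156 && echo "hosts OK"
```

Running the same check on each of the four machines (against /etc/hosts) would catch a node whose file was never updated.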


2. Dynamically adding the node
(1). Edit etc/hadoop/slaves on master, slave1, slave2, and slave3, adding the new node slave3:
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/slaves
h154
h155
h156
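Editing the slaves file on several machines by hand is error-prone. A minimal sketch of an idempotent update (demonstrated on a temp file standing in for etc/hadoop/slaves; the helper is hypothetical):

```shell
# Stand-in for hadoop-2.6.0-cdh5.5.2/etc/hadoop/slaves
SLAVES=$(mktemp)
printf 'h154\nh155\n' > "$SLAVES"

# Append the node only if it is not already listed, so re-runs are harmless
add_slave() {
  grep -qx "$1" "$SLAVES" || echo "$1" >> "$SLAVES"
}

add_slave h156
add_slave h156    # second call is a no-op
cat "$SLAVES"
```

The same one-liner could be pushed to every node over ssh instead of editing each file interactively.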


(2). On the new slave3 node, start the datanode with ./sbin/hadoop-daemon.sh start datanode:
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/hadoop-hadoop-datanode-h156.out
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
2898 Jps
2854 DataNode


(3). On the new slave3 node, start the nodemanager with ./sbin/yarn-daemon.sh start nodemanager:
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/yarn-hadoop-nodemanager-h156.out
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
2854 DataNode
3015 Jps
2952 NodeManager
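At this point both daemons should show up in jps. As a small sketch (the helper function is mine, not part of Hadoop), the check can be scripted against jps output:

```shell
# expect_daemons: given jps output and a list of process names,
# confirm each name is present; report the first one that is not.
expect_daemons() {
  local out="$1"; shift
  for p in "$@"; do
    printf '%s\n' "$out" | grep -qw "$p" \
      || { echo "not running: $p"; return 1; }
  done
  echo "all daemons up"
}

# Demo against the sample jps output shown above
sample='2854 DataNode
3015 Jps
2952 NodeManager'
expect_daemons "$sample" DataNode NodeManager
```

On the real node this would be `expect_daemons "$(jps)" DataNode NodeManager`.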


(4). The new node now runs DataNode and NodeManager, so it has been added to the cluster dynamically. Open h153:50070 in a browser to check the DataNode information


Open h153:8088 to check Nodes of the cluster



II. Dynamically Removing a Node
1. On the master, enable decommissioning: create an excludes file under etc/hadoop/ listing the nodes to be retired:
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes
h156


2. On the master, add the following to etc/hadoop/hdfs-site.xml:
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/hdfs-site.xml

<property>
        <name>dfs.hosts.exclude</name>
        <value>/home/hadoop/hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes</value>
</property>
3. On the master, add the following to mapred-site.xml:
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/mapred-site.xml

<property>
        <name>mapred.hosts.exclude</name>
        <value>/home/hadoop/hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes</value>
        <final>true</final>
</property>
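Both properties point at the same excludes file, so a quick sanity check before refreshing is that the path named in dfs.hosts.exclude actually exists. A minimal, self-contained sketch (run against throwaway temp files, not the live config):

```shell
# Build a miniature hdfs-site.xml fragment plus an excludes file
hdfs_site=$(mktemp)
excludes=$(mktemp)
cat > "$hdfs_site" <<EOF
<property>
    <name>dfs.hosts.exclude</name>
    <value>$excludes</value>
</property>
EOF
echo h156 > "$excludes"

# Pull the <value> following the dfs.hosts.exclude <name> and verify the file
exclude_path=$(grep -A1 'dfs.hosts.exclude' "$hdfs_site" \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')
[ -f "$exclude_path" ] && echo "excludes file present: $exclude_path"
```

If the file named in the config is missing, `-refreshNodes` has nothing to act on, so this catches a common typo early.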
4. After updating these config files on the master, run ./bin/hadoop dfsadmin -refreshNodes:
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -refreshNodes

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/09/16 02:26:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Refresh nodes successful
5. Use ./bin/hadoop dfsadmin -report or the web UI to watch slave3's status change from Normal -> Decommission in progress -> Decommissioned.
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/09/16 02:29:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37492891648 (34.92 GB)
Present Capacity: 31898951680 (29.71 GB)
DFS Remaining: 31898517504 (29.71 GB)
DFS Used: 434176 (424 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.205.154:50010 (h154)
Hostname: h154
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 2797916160 (2.61 GB)
DFS Remaining: 15948312576 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:29:14 CST 2017

Name: 192.168.205.155:50010 (h155)
Hostname: h155
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 2796023808 (2.60 GB)
DFS Remaining: 15950204928 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.08%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:29:13 CST 2017

Name: 192.168.205.156:50010 (h156)
Hostname: h156
Decommission Status : Decommissioned
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2058260480 (1.92 GB)
DFS Remaining: 16688173056 (15.54 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.02%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Sep 16 02:29:14 CST 2017
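The report above is verbose; picking out each node's decommission status is a one-liner in awk. A small sketch, fed with sample lines matching the report format:

```shell
# Sample lines in the dfsadmin -report format shown above
report='Name: 192.168.205.154:50010 (h154)
Decommission Status : Normal
Name: 192.168.205.155:50010 (h155)
Decommission Status : Normal
Name: 192.168.205.156:50010 (h156)
Decommission Status : Decommissioned'

# On "Name:" lines remember the hostname in parentheses;
# on "Decommission Status" lines print it with the last field.
status=$(printf '%s\n' "$report" | awk '
  /^Name:/               { node = $3 }
  /^Decommission Status/ { print node, $NF }
')
printf '%s\n' "$status"
```

Against a live cluster the same awk program would be piped from `./bin/hdfs dfsadmin -report`.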

6. On slave3, stop the datanode and nodemanager processes with ./sbin/hadoop-daemon.sh stop datanode and ./sbin/yarn-daemon.sh stop nodemanager:
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/hadoop-daemon.sh stop datanode
stopping datanode
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
10148 Jps


7. Check the node status again via ./bin/hadoop dfsadmin -report or the web UI (the result below did not appear immediately after step 6; it took quite a while):
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -report

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/09/16 02:52:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37492883456 (34.92 GB)
Present Capacity: 31898931200 (29.71 GB)
DFS Remaining: 31898456064 (29.71 GB)
DFS Used: 475136 (464 KB)
DFS Used%: 0.00%
Under replicated blocks: 17
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.205.154:50010 (h154)
Hostname: h154
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 237568 (232 KB)
Non DFS Used: 2797936640 (2.61 GB)
DFS Remaining: 15948267520 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:52:32 CST 2017

Name: 192.168.205.155:50010 (h155)
Hostname: h155
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 237568 (232 KB)
Non DFS Used: 2796015616 (2.60 GB)
DFS Remaining: 15950188544 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.08%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:52:34 CST 2017

Dead datanodes (1):

Name: 192.168.205.156:50010 (h156)
Hostname: h156
Decommission Status : Decommissioned
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Sat Sep 16 02:30:56 CST 2017

8. The balancer's default threshold is 10%, meaning each node's storage utilization may differ from the cluster-wide average by at most 10%. We can tighten it to 5% and start the balancer with sbin/start-balancer.sh -threshold 5, then wait for the cluster to rebalance itself.
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ sbin/start-balancer.sh -threshold 5
starting balancer, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/hadoop-hadoop-balancer-h153.out
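To make the threshold concrete: after balancing, each DataNode's usage percentage should sit within the threshold of the cluster-wide average. A toy awk sketch with made-up usage numbers (not taken from this cluster) shows which nodes a 5% threshold would still flag:

```shell
# Input: "node usage%" pairs (hypothetical values for illustration).
# For each node, compute |usage - cluster average| and compare to t=5.
out=$(printf '%s\n' 'h154 85.0' 'h155 10.0' 'h156 40.0' | awk -v t=5 '
  { name[NR] = $1; used[NR] = $2; sum += $2 }
  END {
    avg = sum / NR                       # cluster-wide average usage
    for (i = 1; i <= NR; i++) {
      d = used[i] - avg; if (d < 0) d = -d
      printf "%s dev=%.1f %s\n", name[i], d, (d > t ? "REBALANCE" : "ok")
    }
  }
')
printf '%s\n' "$out"
```

Here the average is 45%, so h154 (deviation 40) and h155 (deviation 35) exceed the threshold while h156 (deviation exactly 5) does not; the balancer moves blocks until every deviation falls at or below the threshold.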


III. Adding an HBase Node (192.168.205.156)
1. Preparation (essentially the same as for adding a Hadoop node)

2. Copy hbase-1.0.0-cdh5.5.2 from the master node to h156
[hadoop@h153 ~]$ scp -r hbase-1.0.0-cdh5.5.2/ h156:/home/hadoop/


3. On every node, edit conf/regionservers under the HBase install directory, adding h156
[hadoop@h153 ~]$ vi hbase-1.0.0-cdh5.5.2/conf/regionservers
h154
h155
h156

4. Start the regionserver on h156
[hadoop@h156 hbase-1.0.0-cdh5.5.2]$ bin/hbase-daemon.sh start regionserver
[hadoop@h156 hbase-1.0.0-cdh5.5.2]$ jps
2804 HRegionServer
2939 Jps


5. Open the hbase shell on the newly started node h156 and run:
balance_switch true


IV. Removing an HBase HRegionServer (192.168.205.155)
1. Method 1
(1). Disable the load balancer before shutting the node down; open the hbase shell on the node to be stopped

hbase(main):003:0> balance_switch status
2 servers, 0 dead, 1.0000 average load
true
0 row(s) in 0.0660 seconds
(right after restarting the HBase cluster this showed true, but when I checked again a moment later it was already back to false)

hbase(main):004:0> balance_switch false    (set it to false)
true    (the previous value was true)
0 row(s) in 0.0700 seconds
(2). On the RegionServer to be removed, run ./bin/hbase-daemon.sh stop regionserver
[hadoop@h155 ~]$ jps
2707 NodeManager
2915 QuorumPeerMain
11012 Jps
2601 DataNode
3082 HRegionServer
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ ./bin/hbase-daemon.sh stop regionserver
stopping regionserver....
● The regionserver first closes all of its regions, then stops itself. (That is, each region hosted on this server is flushed and closed locally before the process exits.)
● Once the ZooKeeper session times out, the server's ephemeral node expires.
● The master then moves the regions that were on this machine to other machines.
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ jps
11074 Jps
2707 NodeManager
2915 QuorumPeerMain
2601 DataNode
Note: the node will disappear from ZooKeeper. The Master notices that this RegionServer is down and reassigns its regions. When stopping a node, be sure to disable the Load Balancer first, i.e. step 1 (if it is already false, nothing needs to be done), because the Load Balancer may otherwise compete with the Master's recovery mechanism over the stopped RegionServer's regions.


2. Method 2
(1). With method 1, the regions go offline for some amount of time, determined by the ZooKeeper timeout, which means a window of service unavailability. That approach is not ideal, so it is better to use graceful_stop
[hadoop@h155 ~]$ jps
2707 NodeManager
2915 QuorumPeerMain
11012 Jps
2601 DataNode
3082 HRegionServer
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ ./bin/graceful_stop.sh h155

2017-09-16T03:53:08 Disabling load balancer
2017-09-16 03:53:13,443 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2017-09-16 03:53:15,642 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-16T03:53:18 Previous balancer state was false
2017-09-16T03:53:18 Unloading h155 region(s)
2017-09-16 03:53:23,336 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-16 03:53:25,345 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3679d92e connecting to ZooKeeper ensemble=h154:2181,h153:2181,h155:2181
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.5.2--1, built on 01/25/2016 17:46 GMT
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=h154
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_91
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk1.8.0_91/jre
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.class.path=(too many jars to list, omitted...)
2017-09-16 03:53:25,402 INFO  [main] zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hbase-1.0.0-cdh5.5.2/bin/../lib/native/Linux-amd64-64
2017-09-16 03:53:25,402 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=3.10.0-327.el7.x86_64
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=hadoop
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/hbase-1.0.0-cdh5.5.2
2017-09-16 03:53:25,409 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=h154:2181,h153:2181,h155:2181 sessionTimeout=90000 watcher=hconnection-0x3679d92e0x0, quorum=h154:2181,h153:2181,h155:2181, baseZNode=/hbase
2017-09-16 03:53:25,494 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Opening socket connection to server h153/192.168.205.153:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-16 03:53:25,503 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.205.154:40286, server: h153/192.168.205.153:2181
2017-09-16 03:53:25,521 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Session establishment complete on server h153/192.168.205.153:2181, sessionid = 0x15e867a6d3e0003, negotiated timeout = 40000
RuntimeError: Server h155:60020 not online
    stripServer at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:192
  unloadRegions at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:301
         (root) at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:484
2017-09-16T03:53:26 Unloaded h155 region(s)
2017-09-16T03:53:26 Stopping regionserver
stopping regionserver..
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ jps
11074 Jps
2707 NodeManager
2915 QuorumPeerMain
2601 DataNode

(2). Since graceful_stop disables the HBase balancer, open the hbase shell on another node, check the cluster status, and re-enable it:
hbase(main):001:0> balance_switch true
false 

Question: after I set this to true, it flipped back to false a moment later. If it reverts to false on its own anyway, is there any point in setting it at all?


References:
http://blog.csdn.net/Mark_LQ/article/details/53393081
https://www.cnblogs.com/zlingh/p/3983984.html
