Dynamically Scaling Hadoop and HBase

Environment:
CentOS 7.2, 64-bit
hadoop-2.6.0-cdh5.5.2
hbase-1.0.0-cdh5.5.2

jdk1.8.0_91


master:192.168.205.153
slave1:192.168.205.154
slave2:192.168.205.155
New node slave3: 192.168.205.156


I. Adding a Hadoop Node
● There are two ways to add a node. The static way: shut down the Hadoop cluster, update the configuration, and restart the cluster (not covered again here).
● The dynamic way: add the node without restarting the cluster.


1. Preparation:
(1). Set slave3's hostname
[root@localhost ~]# vi /etc/hosts

192.168.205.153 h153
192.168.205.154 h154
192.168.205.155 h155
192.168.205.156 h156
[root@localhost ~]# vi /etc/hostname
h156

[root@localhost ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=h156

[root@localhost ~]# reboot

(2). On slave3, create the hadoop user and install jdk1.8.0_91 (for the detailed steps see my other article: http://blog.csdn.net/m0_37739193/article/details/71222673), then copy hadoop-2.6.0-cdh5.5.2 from slave2 into the corresponding directory on slave3
[hadoop@h155 ~]$ scp -r hadoop-2.6.0-cdh5.5.2/ h156:/home/hadoop/
Note: at first I followed the step above and copied hadoop-2.6.0-cdh5.5.2 from slave2 to slave3, but when I later checked the DataNode information page in the browser, slave2 and slave3 could not coexist: only two of the three nodes showed up. h154 was always there, while h155 and h156 kept displacing each other, one appearing only when the other vanished. Very strange. (A likely explanation: copying from a node that had already run a DataNode also copies its data directory, so both nodes report the same storage ID and the NameNode treats them as one.)
Fix: copy the hadoop-2.6.0-cdh5.5.2 directory from the master node to slave3 instead.


(3). Set up passwordless SSH from the namenode and resourcemanager nodes to slave3
[hadoop@h153 ~]$ ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub h156


(4). Add the following line to /etc/hosts on master, slave1, and slave2 so that all four machines have identical /etc/hosts files (since this is a dynamic addition, no restart is needed):
192.168.205.156 h156
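To confirm the four /etc/hosts files really are consistent, a small helper can be used. This is a hypothetical sketch (the function name is mine, not a standard tool), demonstrated against a temp file rather than the real /etc/hosts:

```shell
# check_hosts: verify a hosts file contains every "ip name" pair given.
# Prints what is missing and returns non-zero if anything is absent.
check_hosts() {
  local file="$1"; shift
  local missing=0
  while [ "$#" -ge 2 ]; do
    grep -Eq "^[[:space:]]*$1[[:space:]]+$2([[:space:]]|$)" "$file" \
      || { echo "missing: $1 $2"; missing=1; }
    shift 2
  done
  return "$missing"
}

# Demo: build a sample hosts file like the one above and check it
tmp=$(mktemp)
printf '%s\n' \
  '192.168.205.153 h153' \
  '192.168.205.154 h154' \
  '192.168.205.155 h155' \
  '192.168.205.156 h156' > "$tmp"
check_hosts "$tmp" 192.168.205.153 h153 192.168.205.156 h156 && echo "hosts OK"
```

Running the same check on each of the four machines (against /etc/hosts) would catch a node whose file was never updated.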


2. Dynamically adding the node
(1). Edit etc/hadoop/slaves on master, slave1, slave2, and slave3, adding the new node slave3:
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/slaves
h154
h155
h156
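Editing the slaves file on several machines by hand is error-prone. A minimal sketch of an idempotent update (demonstrated on a temp file standing in for etc/hadoop/slaves; the helper is hypothetical):

```shell
# Stand-in for hadoop-2.6.0-cdh5.5.2/etc/hadoop/slaves
SLAVES=$(mktemp)
printf 'h154\nh155\n' > "$SLAVES"

# Append the node only if it is not already listed, so re-runs are harmless
add_slave() {
  grep -qx "$1" "$SLAVES" || echo "$1" >> "$SLAVES"
}

add_slave h156
add_slave h156    # second call is a no-op
cat "$SLAVES"
```

The same one-liner could be pushed to every node over ssh instead of editing each file interactively.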


(2). On the new slave3 node, start the datanode with ./sbin/hadoop-daemon.sh start datanode:
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/hadoop-hadoop-datanode-h156.out
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
2898 Jps
2854 DataNode


(3). On the new slave3 node, start the nodemanager with ./sbin/yarn-daemon.sh start nodemanager:
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/yarn-hadoop-nodemanager-h156.out
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
2854 DataNode
3015 Jps
2952 NodeManager
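At this point both daemons should show up in jps. As a small sketch (the helper function is mine, not part of Hadoop), the check can be scripted against jps output:

```shell
# expect_daemons: given jps output and a list of process names,
# confirm each name is present; report the first one that is not.
expect_daemons() {
  local out="$1"; shift
  for p in "$@"; do
    printf '%s\n' "$out" | grep -qw "$p" \
      || { echo "not running: $p"; return 1; }
  done
  echo "all daemons up"
}

# Demo against the sample jps output shown above
sample='2854 DataNode
3015 Jps
2952 NodeManager'
expect_daemons "$sample" DataNode NodeManager
```

On the real node this would be `expect_daemons "$(jps)" DataNode NodeManager`.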


(4). The new node now runs DataNode and NodeManager, so it has been added to the cluster dynamically. Open h153:50070 in a browser to check the DataNode information


Open h153:8088 to check Nodes of the cluster



II. Dynamically Removing a Node
1. On the master, enable decommissioning: create an excludes file under etc/hadoop/ listing the nodes to be retired:
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes
h156


2. On the master, add the following to etc/hadoop/hdfs-site.xml:
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/hdfs-site.xml

<property>
        <name>dfs.hosts.exclude</name>
        <value>/home/hadoop/hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes</value>
</property>
3. On the master, add the following to mapred-site.xml:
[hadoop@h153 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/mapred-site.xml

<property>
        <name>mapred.hosts.exclude</name>
        <value>/home/hadoop/hadoop-2.6.0-cdh5.5.2/etc/hadoop/excludes</value>
        <final>true</final>
</property>
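Both properties point at the same excludes file, so a quick sanity check before refreshing is that the path named in dfs.hosts.exclude actually exists. A minimal, self-contained sketch (run against throwaway temp files, not the live config):

```shell
# Build a miniature hdfs-site.xml fragment plus an excludes file
hdfs_site=$(mktemp)
excludes=$(mktemp)
cat > "$hdfs_site" <<EOF
<property>
    <name>dfs.hosts.exclude</name>
    <value>$excludes</value>
</property>
EOF
echo h156 > "$excludes"

# Pull the <value> following the dfs.hosts.exclude <name> and verify the file
exclude_path=$(grep -A1 'dfs.hosts.exclude' "$hdfs_site" \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')
[ -f "$exclude_path" ] && echo "excludes file present: $exclude_path"
```

If the file named in the config is missing, `-refreshNodes` has nothing to act on, so this catches a common typo early.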
4. After updating these config files on the master, run ./bin/hadoop dfsadmin -refreshNodes:
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -refreshNodes

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/09/16 02:26:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Refresh nodes successful
5. Use ./bin/hadoop dfsadmin -report or the web UI to watch slave3's status change from Normal -> Decommission in progress -> Decommissioned.
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/09/16 02:29:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37492891648 (34.92 GB)
Present Capacity: 31898951680 (29.71 GB)
DFS Remaining: 31898517504 (29.71 GB)
DFS Used: 434176 (424 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.205.154:50010 (h154)
Hostname: h154
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 2797916160 (2.61 GB)
DFS Remaining: 15948312576 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:29:14 CST 2017

Name: 192.168.205.155:50010 (h155)
Hostname: h155
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 212992 (208 KB)
Non DFS Used: 2796023808 (2.60 GB)
DFS Remaining: 15950204928 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.08%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:29:13 CST 2017

Name: 192.168.205.156:50010 (h156)
Hostname: h156
Decommission Status : Decommissioned
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 2058260480 (1.92 GB)
DFS Remaining: 16688173056 (15.54 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.02%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Sep 16 02:29:14 CST 2017
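The report above is verbose; picking out each node's decommission status is a one-liner in awk. A small sketch, fed with sample lines matching the report format:

```shell
# Sample lines in the dfsadmin -report format shown above
report='Name: 192.168.205.154:50010 (h154)
Decommission Status : Normal
Name: 192.168.205.155:50010 (h155)
Decommission Status : Normal
Name: 192.168.205.156:50010 (h156)
Decommission Status : Decommissioned'

# On "Name:" lines remember the hostname in parentheses;
# on "Decommission Status" lines print it with the last field.
status=$(printf '%s\n' "$report" | awk '
  /^Name:/               { node = $3 }
  /^Decommission Status/ { print node, $NF }
')
printf '%s\n' "$status"
```

Against a live cluster the same awk program would be piped from `./bin/hdfs dfsadmin -report`.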

6. On slave3, stop the datanode and nodemanager processes with ./sbin/hadoop-daemon.sh stop datanode and ./sbin/yarn-daemon.sh stop nodemanager:
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/hadoop-daemon.sh stop datanode
stopping datanode
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ ./sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager
[hadoop@h156 hadoop-2.6.0-cdh5.5.2]$ jps
10148 Jps


7. Check the node status again via ./bin/hadoop dfsadmin -report or the web UI (the result below did not appear immediately after step 6; it took quite a while):
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ ./bin/hadoop dfsadmin -report

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/09/16 02:52:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 37492883456 (34.92 GB)
Present Capacity: 31898931200 (29.71 GB)
DFS Remaining: 31898456064 (29.71 GB)
DFS Used: 475136 (464 KB)
DFS Used%: 0.00%
Under replicated blocks: 17
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.205.154:50010 (h154)
Hostname: h154
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 237568 (232 KB)
Non DFS Used: 2797936640 (2.61 GB)
DFS Remaining: 15948267520 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:52:32 CST 2017

Name: 192.168.205.155:50010 (h155)
Hostname: h155
Decommission Status : Normal
Configured Capacity: 18746441728 (17.46 GB)
DFS Used: 237568 (232 KB)
Non DFS Used: 2796015616 (2.60 GB)
DFS Remaining: 15950188544 (14.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.08%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 7
Last contact: Sat Sep 16 02:52:34 CST 2017

Dead datanodes (1):

Name: 192.168.205.156:50010 (h156)
Hostname: h156
Decommission Status : Decommissioned
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Sat Sep 16 02:30:56 CST 2017

8. The balancer's default threshold is 10%, meaning each node's storage utilization may differ from the cluster-wide average by at most 10%. We can tighten it to 5% and start the balancer with sbin/start-balancer.sh -threshold 5, then wait for the cluster to rebalance itself.
[hadoop@h153 hadoop-2.6.0-cdh5.5.2]$ sbin/start-balancer.sh -threshold 5
starting balancer, logging to /home/hadoop/hadoop-2.6.0-cdh5.5.2/logs/hadoop-hadoop-balancer-h153.out
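To make the threshold concrete: after balancing, each DataNode's usage percentage should sit within the threshold of the cluster-wide average. A toy awk sketch with made-up usage numbers (not taken from this cluster) shows which nodes a 5% threshold would still flag:

```shell
# Input: "node usage%" pairs (hypothetical values for illustration).
# For each node, compute |usage - cluster average| and compare to t=5.
out=$(printf '%s\n' 'h154 85.0' 'h155 10.0' 'h156 40.0' | awk -v t=5 '
  { name[NR] = $1; used[NR] = $2; sum += $2 }
  END {
    avg = sum / NR                       # cluster-wide average usage
    for (i = 1; i <= NR; i++) {
      d = used[i] - avg; if (d < 0) d = -d
      printf "%s dev=%.1f %s\n", name[i], d, (d > t ? "REBALANCE" : "ok")
    }
  }
')
printf '%s\n' "$out"
```

Here the average is 45%, so h154 (deviation 40) and h155 (deviation 35) exceed the threshold while h156 (deviation exactly 5) does not; the balancer moves blocks until every deviation falls at or below the threshold.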


III. Adding an HBase Node (192.168.205.156)
1. Preparation (essentially the same as for adding a Hadoop node)

2. Copy hbase-1.0.0-cdh5.5.2 from the master node to h156
[hadoop@h153 ~]$ scp -r hbase-1.0.0-cdh5.5.2/ h156:/home/hadoop/


3. On every node, edit conf/regionservers under the HBase install directory, adding h156
[hadoop@h153 ~]$ vi hbase-1.0.0-cdh5.5.2/conf/regionservers
h154
h155
h156

4. Start the regionserver on h156
[hadoop@h156 hbase-1.0.0-cdh5.5.2]$ bin/hbase-daemon.sh start regionserver
[hadoop@h156 hbase-1.0.0-cdh5.5.2]$ jps
2804 HRegionServer
2939 Jps


5. Open the hbase shell on the newly started node h156 and run:
balance_switch true


IV. Removing an HBase HRegionServer (192.168.205.155)
1. Method 1
(1). Disable the load balancer before shutting the node down; open the hbase shell on the node to be stopped

hbase(main):003:0> balance_switch status
2 servers, 0 dead, 1.0000 average load
true
0 row(s) in 0.0660 seconds
(right after restarting the HBase cluster this showed true, but when I checked again a moment later it was already back to false)

hbase(main):004:0> balance_switch false    (set it to false)
true    (the previous value was true)
0 row(s) in 0.0700 seconds
(2). On the RegionServer to be removed, run ./bin/hbase-daemon.sh stop regionserver
[hadoop@h155 ~]$ jps
2707 NodeManager
2915 QuorumPeerMain
11012 Jps
2601 DataNode
3082 HRegionServer
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ ./bin/hbase-daemon.sh stop regionserver
stopping regionserver....
● The regionserver first closes all of its regions, then stops itself. (That is, each region hosted on this server is flushed and closed locally before the process exits.)
● Once the ZooKeeper session times out, the server's ephemeral node expires.
● The master then moves the regions that were on this machine to other machines.
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ jps
11074 Jps
2707 NodeManager
2915 QuorumPeerMain
2601 DataNode
Note: the node will disappear from ZooKeeper. The Master notices that this RegionServer is down and reassigns its regions. When stopping a node, be sure to disable the Load Balancer first, i.e. step 1 (if it is already false, nothing needs to be done), because the Load Balancer may otherwise compete with the Master's recovery mechanism over the stopped RegionServer's regions.


2. Method 2
(1). With method 1, the regions go offline for some amount of time, determined by the ZooKeeper timeout, which means a window of service unavailability. That approach is not ideal, so it is better to use graceful_stop
[hadoop@h155 ~]$ jps
2707 NodeManager
2915 QuorumPeerMain
11012 Jps
2601 DataNode
3082 HRegionServer
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ ./bin/graceful_stop.sh h155

2017-09-16T03:53:08 Disabling load balancer
2017-09-16 03:53:13,443 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2017-09-16 03:53:15,642 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-16T03:53:18 Previous balancer state was false
2017-09-16T03:53:18 Unloading h155 region(s)
2017-09-16 03:53:23,336 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-09-16 03:53:25,345 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3679d92e connecting to ZooKeeper ensemble=h154:2181,h153:2181,h155:2181
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.5.2--1, built on 01/25/2016 17:46 GMT
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=h154
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_91
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk1.8.0_91/jre
2017-09-16 03:53:25,362 INFO  [main] zookeeper.ZooKeeper: Client environment:java.class.path=(too many jars to list, omitted...)
2017-09-16 03:53:25,402 INFO  [main] zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hbase-1.0.0-cdh5.5.2/bin/../lib/native/Linux-amd64-64
2017-09-16 03:53:25,402 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2017-09-16 03:53:25,403 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=3.10.0-327.el7.x86_64
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=hadoop
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
2017-09-16 03:53:25,408 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/hbase-1.0.0-cdh5.5.2
2017-09-16 03:53:25,409 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=h154:2181,h153:2181,h155:2181 sessionTimeout=90000 watcher=hconnection-0x3679d92e0x0, quorum=h154:2181,h153:2181,h155:2181, baseZNode=/hbase
2017-09-16 03:53:25,494 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Opening socket connection to server h153/192.168.205.153:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-16 03:53:25,503 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.205.154:40286, server: h153/192.168.205.153:2181
2017-09-16 03:53:25,521 INFO  [main-SendThread(h153:2181)] zookeeper.ClientCnxn: Session establishment complete on server h153/192.168.205.153:2181, sessionid = 0x15e867a6d3e0003, negotiated timeout = 40000
RuntimeError: Server h155:60020 not online
    stripServer at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:192
  unloadRegions at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:301
         (root) at /home/hadoop/hbase-1.0.0-cdh5.5.2/bin/region_mover.rb:484
2017-09-16T03:53:26 Unloaded h155 region(s)
2017-09-16T03:53:26 Stopping regionserver
stopping regionserver..
[hadoop@h155 hbase-1.0.0-cdh5.5.2]$ jps
11074 Jps
2707 NodeManager
2915 QuorumPeerMain
2601 DataNode

(2). Since graceful_stop disables the HBase balancer, open the hbase shell on another node, check the cluster status, and re-enable it:
hbase(main):001:0> balance_switch true
false 

Question: after I set this to true, it flipped back to false a moment later. If it reverts to false on its own anyway, is there any point in setting it at all?


References:
http://blog.csdn.net/Mark_LQ/article/details/53393081
https://www.cnblogs.com/zlingh/p/3983984.html
