Problems encountered with fully distributed HBase, part 1: forgetting to turn off the firewall
Source: Internet  Published by: 知世鼓励小狼  Editor: 程序博客网  Time: 2024/04/20 05:31
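Over the past two days I have been learning to set up HBase in fully distributed mode. Several oversights caused multiple problems to show up at once, and it took a good while to untangle them. To make things easier to follow, after everything was solved I reproduced one error at a time and recorded each separately. This post is about the firewall.
I remembered turning off the firewall on all nodes back when I was learning Hadoop, so at first I never even considered it as a cause. Only after a long search online turned up a post describing the same experience did I go back and check.
Symptom: after running start-hbase.sh the HMaster process can be seen starting, but the web UI at http://<masternode>:16010 is unreachable (my masternode is hadoop.lsd1.com; I will not repeat this below). After a while the HMaster and HRegionServer processes on all nodes die. The master node's log, hbase-root-master-hadoop.lsd1.com.log, contains the following errors: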
2017-03-13 12:02:29,850 INFO [main-SendThread(hadoop.lsd3.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server hadoop.lsd3.com/192.168.56.13:2181. Will not attempt to authenticate using SASL (unknown error)
2017-03-13 12:02:29,851 WARN [main-SendThread(hadoop.lsd3.com:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-03-13 12:02:30,934 INFO [main-SendThread(hadoop.lsd1.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server hadoop.lsd1.com/192.168.56.11:2181. Will not attempt to authenticate using SASL (unknown error)
2017-03-13 12:02:30,935 INFO [main-SendThread(hadoop.lsd1.com:2181)] zookeeper.ClientCnxn: Socket connection established to hadoop.lsd1.com/192.168.56.11:2181, initiating session
2017-03-13 12:02:30,938 INFO [main-SendThread(hadoop.lsd1.com:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2017-03-13 12:02:31,038 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper create failed after 4 attempts
2017-03-13 12:02:31,040 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2426)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:231)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:137)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2436)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: master:160000x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase Unexpected KeeperException creating base node
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:206)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:187)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:585)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:381)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2419)
... 5 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:565)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:544)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1204)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1182)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:194)
... 13 more
[root@hadoop hbase-1.2.4]#
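This looks like ZooKeeper running into trouble while communicating. Network problems of this kind can surely have causes other than a firewall left on. I cannot speak to those; I can only say that in my case the firewall was one cause. In later experiments I turned off the firewall on the other nodes and enabled it on just one regionserver node, and that also brought down every HMaster and HRegionServer (this is a bit puzzling: does HBase not tolerate a node failure?). If the firewall is off at startup and is turned on for some node (the master node included) only after the cluster has started successfully, the cluster does not exit. Only that node becomes unreachable, and access recovers once the firewall on that node is turned off again.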
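Since the log shows the ZooKeeper client failing to reach port 2181, one quick way to tell a blocked network path from an HBase misconfiguration is to probe that port directly from the master node. The following is a minimal sketch for illustration, not something taken from the original troubleshooting: `check_port` is a hypothetical helper name, and it assumes that bash and the coreutils `timeout` command are installed.

```shell
# check_port HOST PORT
# Tries to open a TCP connection and prints whether it succeeded.
# Assumption: bash (for its /dev/tcp feature) and coreutils `timeout` exist.
check_port() {
    host="$1"
    port="$2"
    # The inner bash opens the socket; timeout gives up after 3 seconds
    # so a silently dropped packet does not hang the check.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} reachable"
    else
        echo "${host}:${port} unreachable"
        return 1
    fi
}
```

For example, `for h in hadoop.lsd1.com hadoop.lsd2.com hadoop.lsd3.com hadoop.lsd4.com; do check_port "$h" 2181; done` probes the quorum listed in the log above. A node that is reported unreachable while its ZooKeeper process is running is a hint to look at that node's firewall, although the check cannot by itself distinguish a firewall from other network faults.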
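Solution:
Check on every node whether the firewall is off:
service firewalld status
If it is not off, stop it and disable it so that it does not come back after a reboot:
systemctl stop firewalld.service
systemctl disable firewalld.service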
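Then restart HBase. With that, this problem was solved and the web UI on port 16010 became reachable. The remaining problems are described in separate posts.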
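The status check in the steps above can also be condensed into a one-line verdict per node, which is easier to compare across a cluster. This is a sketch under assumptions: `report_firewalld` is a hypothetical name, and it relies on the `is-active` and `is-enabled` subcommands of `systemctl`, so it only applies to systemd-based systems such as CentOS 7.

```shell
# report_firewalld
# Prints whether firewalld is currently running and whether it starts at boot.
report_firewalld() {
    # Both subcommands exit non-zero for "inactive"/"disabled",
    # so `|| true` keeps the function usable under `set -e`.
    state="$(systemctl is-active firewalld 2>/dev/null || true)"
    boot="$(systemctl is-enabled firewalld 2>/dev/null || true)"
    echo "firewalld active=${state:-unknown} enabled=${boot:-unknown}"
}
```

On a node where the firewall has been stopped and disabled, the expected line is `firewalld active=inactive enabled=disabled`. To check a remote node without copying the function over, the same subcommand can be run through ssh, for instance `ssh root@hadoop.lsd2.com 'systemctl is-active firewalld'` (the root login is an assumption based on the prompt in the log excerpt).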
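Summary: if jps shows a healthy HMaster process after start-hbase.sh has run, but the master's web UI is unreachable and the log contains connect errors like the ones above, consider whether a firewall was left on. The HMaster and HRegionServer processes are also quite likely to die after a short while, so the log remains the main thing to go by.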
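Because the log is the deciding evidence, a small filter for the telltale exceptions can save some scrolling. The sketch below is illustrative only: `scan_hbase_log` is a hypothetical helper, and its patterns are simply the exception names that appear in the log excerpt in this post, not an exhaustive list.

```shell
# scan_hbase_log FILE
# Prints the number of lines in FILE that mention one of the
# connection-related exceptions seen in the master log above.
scan_hbase_log() {
    log_file="$1"
    # grep -c counts matching lines (not individual matches).
    grep -c -E 'NoRouteToHostException|ConnectionLoss|ZooKeeperConnectionException' "$log_file"
}
```

Running `scan_hbase_log hbase-root-master-hadoop.lsd1.com.log` from the HBase log directory prints the count. A non-zero count alongside `jps | grep -E 'HMaster|HRegionServer'` showing missing processes matches the symptom described here.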