Solving the "HMaster process fails to start after formatting HDFS" problem

After formatting with hadoop namenode -format, I started HMaster and HRegionServer with ./start-hbase.sh, but a few seconds later the HMaster process shut down on its own while the HRegionServer processes kept running. The log showed the following errors:

2015-04-08 10:49:12,164 INFO  [worker01:16020.activeMasterManager] master.MasterFileSystem: Log folder hdfs://ns1/hbase/WALs/worker05,16020,1428461295266 belongs to an existing region server
2015-04-08 10:49:12,177 INFO  [worker01:16020.activeMasterManager] master.MasterFileSystem: Log folder hdfs://ns1/hbase/WALs/worker06,16020,1428461284585 belongs to an existing region server
2015-04-08 10:49:12,180 INFO  [worker01:16020.activeMasterManager] master.MasterFileSystem: Log folder hdfs://ns1/hbase/WALs/worker07,16020,1428461270920 belongs to an existing region server
2015-04-08 10:49:12,300 INFO  [worker01:16020.activeMasterManager] zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=worker05,16020,1428456823337, exception=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on worker05,16020,1428461295266
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2740)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:859)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1137)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20862)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)

2015-04-08 10:49:12,305 INFO  [worker01:16020.activeMasterManager] master.MasterFileSystem: Log dir for server worker05,16020,1428456823337 does not exist
2015-04-08 10:49:12,305 INFO  [worker01:16020.activeMasterManager] master.SplitLogManager: dead splitlog workers [worker05,16020,1428456823337]
2015-04-08 10:49:12,306 INFO  [worker01:16020.activeMasterManager] master.SplitLogManager: started splitting 0 logs in [] for [worker05,16020,1428456823337]
2015-04-08 10:49:12,322 INFO  [worker01:16020.activeMasterManager] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [] in 16ms
2015-04-08 10:49:12,323 INFO  [worker01:16020.activeMasterManager] zookeeper.MetaTableLocator: Deleting hbase:meta region location in ZooKeeper
2015-04-08 10:49:12,361 INFO  [worker01:16020.activeMasterManager] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to worker07,16020,1428461270920
2015-04-08 10:49:12,361 INFO  [worker01:16020.activeMasterManager] master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1428461352349, server=null} to {1588230740 state=PENDING_OPEN, ts=1428461352361, server=worker07,16020,1428461270920}
2015-04-08 10:49:12,435 INFO  [worker01:16020.activeMasterManager] master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
2015-04-08 10:49:12,503 INFO  [AM.ZK.Worker-pool3-t1] master.RegionStates: Transition {1588230740 state=PENDING_OPEN, ts=1428461352361, server=worker07,16020,1428461270920} to {1588230740 state=OPENING, ts=1428461352503, server=worker07,16020,1428461270920}
2015-04-08 10:49:13,096 INFO  [AM.ZK.Worker-pool3-t2] master.RegionStates: Transition {1588230740 state=OPENING, ts=1428461352503, server=worker07,16020,1428461270920} to {1588230740 state=OPEN, ts=1428461353096, server=worker07,16020,1428461270920}
2015-04-08 10:49:13,101 INFO  [AM.ZK.Worker-pool3-t2] coordination.ZkOpenRegionCoordination: Handling OPENED of 1588230740 from worker01,16020,1428461337650; deleting unassigned node
2015-04-08 10:49:13,110 INFO  [AM.ZK.Worker-pool3-t3] master.RegionStates: Onlined 1588230740 on worker07,16020,1428461270920
2015-04-08 10:49:13,111 INFO  [worker01:16020.activeMasterManager] master.HMaster: hbase:meta assigned=1, rit=false, location=worker07,16020,1428461270920
2015-04-08 10:49:13,206 INFO  [worker01:16020.activeMasterManager] hbase.MetaMigrationConvertingToPB: hbase:meta doesn't have any entries to update.
2015-04-08 10:49:13,206 INFO  [worker01:16020.activeMasterManager] hbase.MetaMigrationConvertingToPB: META already up-to date with PB serialization
2015-04-08 10:49:13,224 INFO  [worker01:16020.activeMasterManager] master.AssignmentManager: Clean cluster startup. Assigning user regions
2015-04-08 10:49:13,230 INFO  [worker01:16020.activeMasterManager] master.AssignmentManager: Joined the cluster in 24ms, failover=false
2015-04-08 10:49:13,247 INFO  [worker01:16020.activeMasterManager] master.TableNamespaceManager: Namespace table not found. Creating...
2015-04-08 10:49:13,296 FATAL [worker01:16020.activeMasterManager] master.HMaster: Failed to become active master
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:151)
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:124)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:868)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:719)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:165)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1425)
        at java.lang.Thread.run(Thread.java:745)
2015-04-08 10:49:13,298 FATAL [worker01:16020.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: []
2015-04-08 10:49:13,298 FATAL [worker01:16020.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:151)
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:124)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:868)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:719)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:165)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1425)
        at java.lang.Thread.run(Thread.java:745)
2015-04-08 10:49:13,298 INFO  [worker01:16020.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
2015-04-08 10:49:13,298 INFO  [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: Stopping infoServer
2015-04-08 10:49:13,312 INFO  [master/worker01/192.168.217.11:16020] mortbay.log: StoppedSelectChannelConnector@0.0.0.0:16030
2015-04-08 10:49:13,415 INFO  [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: stopping server worker01,16020,1428461337650
2015-04-08 10:49:13,415 INFO  [master/worker01/192.168.217.11:16020] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x24c969390e4001e
2015-04-08 10:49:13,430 INFO  [master/worker01/192.168.217.11:16020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-04-08 10:49:13,430 INFO  [master/worker01/192.168.217.11:16020] zookeeper.ZooKeeper: Session: 0x24c969390e4001e closed
2015-04-08 10:49:13,431 INFO  [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: stopping server worker01,16020,1428461337650; all regions closed.
2015-04-08 10:49:13,431 INFO  [master/worker01/192.168.217.11:16020] master.HMaster: Stopping master jetty server
2015-04-08 10:49:13,431 INFO  [master/worker01/192.168.217.11:16020] mortbay.log: StoppedSelectChannelConnector@0.0.0.0:16010
2015-04-08 10:49:13,433 INFO  [worker01,16020,1428461337650-BalancerChore] balancer.BalancerChore: worker01,16020,1428461337650-BalancerChore exiting
2015-04-08 10:49:13,433 INFO  [worker01,16020,1428461337650-ClusterStatusChore] balancer.ClusterStatusChore: worker01,16020,1428461337650-ClusterStatusChore exiting
2015-04-08 10:49:13,433 INFO  [CatalogJanitor-worker01:16020] master.CatalogJanitor: CatalogJanitor-worker01:16020 exiting
2015-04-08 10:49:13,434 INFO  [worker01:16020.oldLogCleaner] cleaner.LogCleaner: worker01:16020.oldLogCleaner exiting
2015-04-08 10:49:13,434 INFO  [worker01:16020.oldLogCleaner] master.ReplicationLogCleaner: Stopping replicationLogCleaner-0x14c9693ba2f001f, quorum=worker06:2181,worker05:2181,worker07:2181, baseZNode=/hbase
2015-04-08 10:49:13,435 INFO  [worker01:16020.archivedHFileCleaner] cleaner.HFileCleaner: worker01:16020.archivedHFileCleaner exiting
2015-04-08 10:49:13,504 INFO  [worker01:16020.oldLogCleaner] zookeeper.ZooKeeper: Session: 0x14c9693ba2f001f closed
2015-04-08 10:49:13,505 INFO  [worker01:16020.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-04-08 10:49:13,608 INFO  [master/worker01/192.168.217.11:16020] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14c9693ba2f001e
2015-04-08 10:49:13,761 INFO  [worker01:16020.activeMasterManager-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-04-08 10:49:13,762 INFO  [master/worker01/192.168.217.11:16020] zookeeper.ZooKeeper: Session: 0x14c9693ba2f001e closed
2015-04-08 10:49:13,847 INFO  [worker01,16020,1428461337650.splitLogManagerTimeoutMonitor] master.SplitLogManager$TimeoutMonitor: worker01,16020,1428461337650.splitLogManagerTimeoutMonitor exiting
2015-04-08 10:49:13,848 INFO  [master/worker01/192.168.217.11:16020] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2015-04-08 10:49:13,848 INFO  [master/worker01/192.168.217.11:16020] ipc.RpcServer: Stopping server on 16020
2015-04-08 10:49:13,848 INFO  [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: stopping
2015-04-08 10:49:13,855 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2015-04-08 10:49:13,855 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2015-04-08 10:49:13,945 INFO  [master/worker01/192.168.217.11:16020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/worker01,16020,1428461337650 already deleted, retry=false
2015-04-08 10:49:13,949 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-04-08 10:49:13,950 INFO  [master/worker01/192.168.217.11:16020] zookeeper.ZooKeeper: Session: 0x14c9693ba2f001d closed
2015-04-08 10:49:13,950 INFO  [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: stopping server worker01,16020,1428461337650; zookeeper connection closed.
2015-04-08 10:49:13,950 INFO  [master/worker01/192.168.217.11:16020] regionserver.HRegionServer: master/worker01/192.168.217.11:16020 exiting
2015-04-08 10:49:13,952 INFO  [Shutdown] mortbay.log: Shutdown hook executing
2015-04-08 10:49:13,952 INFO  [Shutdown] mortbay.log: Shutdown hook complete

I searched a lot of material online but found no answer. Finally it occurred to me that the problem only appeared after I formatted HDFS; everything had been fine before. Could it be caused by stale files left behind in ZooKeeper? So I deleted the version-2 directory under dataDir=/opt/zookeeper-3.4.5/data and rebooted the servers. After the restart, the HMaster process ran normally. Opening the dataDir=/opt/zookeeper-3.4.5/data directory again, I found that a new version-2 directory had been created, along with a zookeeper_server.pid file. I am not sure whether the missing zookeeper_server.pid file was also part of the cause at the time.
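For reference, a minimal sketch of that ZooKeeper cleanup on one node, assuming ZooKeeper is installed under /opt/zookeeper-3.4.5 with dataDir=/opt/zookeeper-3.4.5/data as described above; repeat on each ZooKeeper node and adjust the paths to your own installation:

   # stop ZooKeeper before touching its data directory
   /opt/zookeeper-3.4.5/bin/zkServer.sh stop
   # delete the stale snapshot/transaction-log directory left over from the old cluster
   rm -rf /opt/zookeeper-3.4.5/data/version-2
   # restart ZooKeeper; it recreates version-2 (and writes zookeeper_server.pid) on startup
   /opt/zookeeper-3.4.5/bin/zkServer.sh start
   /opt/zookeeper-3.4.5/bin/zkServer.sh status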

So formatting HDFS really can cause many unexpected problems. What today's problems taught me is that before reformatting, it is best to stop the related processes first, delete the tmp directory under the Hadoop installation (on both the master and the slaves), and also clean up the files under ZooKeeper's dataDir as needed (see the sketch just below), then format strictly according to the following steps:
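A minimal pre-format cleanup sketch, assuming the /itcast/hadoop-2.4.1 layout used in the steps below (hadoop.tmp.dir = /itcast/hadoop-2.4.1/tmp) and that the slave hostnames are reachable over ssh; the hostnames here are examples, so adjust them and the paths to your own cluster:

   # in HBase's bin directory: stop HBase first
   ./stop-hbase.sh
   # in /itcast/hadoop-2.4.1 (on the nodes where start-yarn.sh / start-dfs.sh were run): stop YARN and HDFS
   sbin/stop-yarn.sh
   sbin/stop-dfs.sh
   # remove the old hadoop.tmp.dir on the master ...
   rm -rf /itcast/hadoop-2.4.1/tmp
   # ... and on every slave (example hostnames)
   for host in itcast02 itcast05 itcast06 itcast07; do
       ssh $host "rm -rf /itcast/hadoop-2.4.1/tmp"
   done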

 ### Note: follow the steps below strictly
  2.5 Start the ZooKeeper cluster (start zk on itcast05, itcast06, and itcast07 respectively)
   cd /itcast/zookeeper-3.4.5/bin/
   ./zkServer.sh start
   # Check the status: there should be one leader and two followers
   ./zkServer.sh status
  2.6 Start the JournalNodes (run on itcast05, itcast06, and itcast07 respectively)
   cd /itcast/hadoop-2.4.1
   sbin/hadoop-daemon.sh start journalnode
   # Run jps to verify: itcast05, itcast06, and itcast07 should each show an extra JournalNode process
  2.7 Format HDFS
   # Run this command on itcast01:
   hdfs namenode -format
   # Formatting generates files under the hadoop.tmp.dir configured in core-site.xml; here it is set to /itcast/hadoop-2.4.1/tmp, so copy /itcast/hadoop-2.4.1/tmp to /itcast/hadoop-2.4.1/ on itcast02.
   scp -r tmp/ itcast02:/itcast/hadoop-2.4.1/
  2.8 Format ZK (run on itcast01 only)
   hdfs zkfc -formatZK
  2.9 Start HDFS (run on itcast01)
   sbin/start-dfs.sh

  2.10 Start YARN (##### Note #####: run start-yarn.sh on itcast03; the NameNode and ResourceManager are kept on separate machines for performance reasons, since both consume a lot of resources, so they have to be started separately on their own machines)
   sbin/start-yarn.sh
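Only after HDFS and YARN are healthy should HBase be started again. A quick sanity check (a sketch, run from HBase's bin directory on the master node, matching the original ./start-hbase.sh invocation):

   ./start-hbase.sh
   # give the master a few seconds, then confirm HMaster is still alive
   sleep 10
   jps | grep -E 'HMaster|HRegionServer'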
