NameNode Failover Root-Cause Analysis

   1. NameNode failover root-cause analysis
    The Hadoop cluster uses ZKFC for HA. Since the carrier (某信) upgraded its network, ZooKeeper session timeouts like the one below have occurred frequently in the early morning hours.
    The zkfc log follows. The preliminary conclusion is that the network is at fault; the session timeout in effect at the time was 5000 ms.
2015-06-23 11:34:53,393 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2015-06-23 11:34:53,403 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at M-172-16-189-5/172.16.189.5:8020 active...
2015-06-23 11:34:55,672 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at M-172-16-189-5/172.16.189.5:8020 to active state
2015-06-24 02:00:10,088 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x14e1e6b5b2d0001, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-24 02:00:10,202 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
2015-06-24 02:00:10,642 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 172.16.189.46/172.16.189.46:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-24 02:00:10,643 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 172.16.189.46/172.16.189.46:2181, initiating session
2015-06-24 02:00:10,647 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x14e1e6b5b2d0001 has expired, closing socket connection
2015-06-24 02:00:10,650 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session expired. Entering neutral mode and rejoining...
2015-06-24 02:00:10,650 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2015-06-24 02:00:10,656 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=172.16.189.120:2181,172.16.189.46:2181,172.16.189.134:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@2540d625
2015-06-24 02:00:10,694 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 172.16.189.134/172.16.189.134:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-24 02:00:10,696 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 172.16.189.134/172.16.189.134:2181, initiating session
2015-06-24 02:00:10,848 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 172.16.189.134/172.16.189.134:2181, sessionid = 0x34e1e6b62c90004, negotiated timeout = 5000
2015-06-24 02:00:10,856 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2015-06-24 02:00:10,857 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x14e1e6b5b2d0001
2015-06-24 02:00:10,857 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2015-06-24 02:00:10,864 INFO org.apache.hadoop.ha.ZKFailoverController: ZK Election indicated that NameNode at M-172-16-189-5/172.16.189.5:8020 should become standby
2015-06-24 02:00:10,874 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at M-172-16-189-5/172.16.189.5:8020 to standby state
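To check whether the expirations really cluster in the early morning, the zkfc log can be scanned for the "has expired" message shown above. The following is only a minimal sketch: the function name find_session_expiries is hypothetical, and it assumes one log entry per line in the "timestamp LEVEL logger: message" layout of this log.

```python
import re
from datetime import datetime

# Assumed layout: "yyyy-MM-dd HH:mm:ss,SSS LEVEL logger: message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) ([\w.$]+): (.*)$"
)
EXPIRED_RE = re.compile(r"session (0x[0-9a-f]+) has expired")


def find_session_expiries(lines):
    """Return a list of (timestamp, session_id) for each expired ZK session."""
    expiries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        ts, _level, _logger, msg = m.groups()
        hit = EXPIRED_RE.search(msg)
        if hit:
            when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
            expiries.append((when, hit.group(1)))
    return expiries
```

Grouping the returned timestamps by hour would show whether the failovers line up with a nightly window, which would support the network explanation.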

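   2. Solution
      The servers are rented, so the network itself is outside our control. Instead, we tried raising the zkfc session timeout.
      ha.zookeeper.session-timeout.ms was not set in core-site.xml, so the default of 5000 ms applied. We changed it as follows: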
   
        <property>
            <name>ha.zookeeper.session-timeout.ms</name>
            <value>10000</value>
            <description>ms</description>
        </property>
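    Then restart zkfc.
    When changing this value, check the ZooKeeper server configuration as well: the timeout set on the client must fall within the following range.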
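Before restarting, it can help to confirm which value core-site.xml actually carries. The sketch below is an assumption-laden illustration rather than a Hadoop tool: read_property is a hypothetical helper, and the 5000 ms fallback is the default reported in this post, which may differ in other Hadoop versions.

```python
import xml.etree.ElementTree as ET

# Default reported in this post; treat as an assumption for other versions.
DEFAULT_TIMEOUT_MS = "5000"


def read_property(xml_text, name, default=None):
    """Return the <value> of the named <property>, or default if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return default


core_site = """
<configuration>
  <property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>10000</value>
  </property>
</configuration>
"""
timeout_ms = read_property(
    core_site, "ha.zookeeper.session-timeout.ms", DEFAULT_TIMEOUT_MS
)
```

In practice the XML text would be read from the cluster's own core-site.xml instead of the inline string used here.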
    # The minimum session timeout in milliseconds that the server will allow the client to negotiate
    minSessionTimeout=2000
    # The maximum session timeout in milliseconds that the server will allow the client to negotiate
    maxSessionTimeout=60000
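The reason the range matters: as far as I understand ZooKeeper's behavior, the server does not reject an out-of-range request but silently adjusts it to the nearest bound, and the result shows up as "negotiated timeout" in the client log. A minimal sketch of that rule, with a hypothetical function name and the bounds above as defaults:

```python
def negotiated_timeout(requested_ms, min_ms=2000, max_ms=60000):
    """Clamp the client's requested session timeout into [min_ms, max_ms]."""
    return max(min_ms, min(requested_ms, max_ms))


# The 10000 ms requested in core-site.xml lies inside the range, so it is kept.
effective = negotiated_timeout(10000)
```

If the "negotiated timeout" in the zkfc log differs from the value in core-site.xml after the restart, these two server-side settings are the first place to look.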
 

   
