Chapter 3: QuorumPeer Leader Election
Source: Internet | Published under: Big Data Lab Solutions | Editor: 程序博客网 | Time: 2024/05/16 15:20
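I. Creating the election module
QuorumPeer.start() calls startLeaderElection() to bring up the election machinery. It also sets the initial vote, in which this server proposes itself as leader.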
```java
synchronized public void startLeaderElection() {
    try {
        // Initial vote: this server proposes itself as leader.
        currentVote = new Vote(myid, getLastLoggedZxid(), getCurrentEpoch());
    } catch(IOException e) {
        RuntimeException re = new RuntimeException(e.getMessage());
        re.setStackTrace(e.getStackTrace());
        throw re;
    }
    // Look up this server's own quorum address in the configured view.
    for (QuorumServer p : getView().values()) {
        if (p.id == myid) {
            myQuorumAddr = p.addr;
            break;
        }
    }
    if (myQuorumAddr == null) {
        throw new RuntimeException("My id " + myid + " not in the peer list");
    }
    // electionType 0 (the legacy UDP-based LeaderElection) needs a UDP socket and a responder thread.
    if (electionType == 0) {
        try {
            udpSocket = new DatagramSocket(myQuorumAddr.getPort());
            responder = new ResponderThread();
            responder.start();
        } catch (SocketException e) {
            throw new RuntimeException(e);
        }
    }
    // Instantiate the configured election algorithm (FastLeaderElection by default).
    this.electionAlg = createElectionAlgorithm(electionType);
}
```
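startLeaderElection() ends by passing electionType, which holds the electionAlg startup setting, to createElectionAlgorithm(). The following sketch mirrors that switch under the assumption that the code base is ZooKeeper 3.4.x. ElectionAlgSketch and describe() are hypothetical names. The method only returns a description and does not construct the real Election objects.

```java
// Hypothetical helper mirroring the electionAlg switch in
// QuorumPeer.createElectionAlgorithm() (assumed: ZooKeeper 3.4.x).
// It returns a description instead of building real Election objects.
final class ElectionAlgSketch {
    static String describe(int electionAlg) {
        switch (electionAlg) {
            case 0:
                return "LeaderElection (legacy, UDP)";
            case 1:
                return "AuthFastLeaderElection";
            case 2:
                return "AuthFastLeaderElection (authenticated)";
            case 3:
                // The default: uses QuorumCnxManager for its TCP connections.
                return "FastLeaderElection (via QuorumCnxManager)";
            default:
                throw new IllegalArgumentException("unknown electionAlg: " + electionAlg);
        }
    }
}
```

Only value 3 builds a QuorumCnxManager and starts its listener, which is why the connection handling discussed at the end of this chapter applies to FastLeaderElection.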
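II. The election algorithm. The algorithm is chosen from the startup parameters covered in the previous chapter. By default it is FastLeaderElection, a Paxos-inspired algorithm. Its main loop, lookForLeader(), works as follows: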
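1. Create recvset, which collects votes from LOOKING servers, and outofelection, which collects votes from servers that have already finished their election. Increment the logical clock logicalclock to mark the start of a new election round, and set the initial proposal with updateProposal().
2. The initial proposal names this server as leader. Broadcast it to the other servers (sendNotifications()) and enter the receive loop.
3. Repeat the following steps until a leader is elected or the election is stopped.
4. Take the next vote from the receive queue (recvqueue), waiting no longer than the current timeout.
5. If no vote arrives within the timeout, resend the notifications if all earlier messages were delivered. Otherwise, reconnect to all servers (manager.connectAll()). Then double the timeout, up to maxNotificationInterval, and continue the loop.
6. If the vote comes from a voting member of the ensemble, dispatch on the sender's state in a switch. Votes from non-members are ignored.
7. If the vote comes from a LOOKING server, compare its electionEpoch with the local logicalclock:
   - If electionEpoch is greater, set logicalclock to it and clear recvset. Compare the received vote with this server's own initial vote, adopt the winner as the current proposal, and broadcast it.
   - If electionEpoch is smaller, ignore the vote.
   - If they are equal, compare the received vote with the current proposal by epoch, then zxid, then serverId (totalOrderPredicate()). If the received vote wins, adopt it as the proposal and broadcast it.
8. Unless the vote was ignored, step 7 also records it in recvset, keyed by the sender's serverId. Next, check whether recvset already elects the current proposal. termPredicate() does this through QuorumVerifier.containsQuorum(). With the default majority verifier (QuorumMaj), more than half of the servers must vote for the proposal. If they do, go to the next step. Otherwise, continue the loop.
9. Keep polling the receive queue with a timeout of finalizeWait (200 ms). If a vote arrives that beats the current proposal, put it back into the receive queue and return to the main loop.
10. If no better vote arrives before the poll times out, the leader is decided. Set this server's state from the result: LEADING if it is the proposed leader, otherwise its learner state (FOLLOWING or OBSERVING). Clear the receive queue (leaveInstance()) and return the final vote.
11. If the vote comes from a FOLLOWING or LEADING server, handle it as in steps 12 and 13. Votes from OBSERVING servers are only logged.
12. If the vote's electionEpoch equals logicalclock, put it into recvset. Then check with ooePredicate() whether a majority of recvset votes for the leader named in the vote, and whether that leader has itself reported the LEADING state. If both conditions hold, accept that leader. Otherwise, go to the next step.
13. Put the vote into outofelection. Check whether a majority of outofelection supports the leader named in the vote, and whether that leader considers itself LEADING. If so, set logicalclock to the vote's electionEpoch and finish the election with that leader. Otherwise, continue the loop.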
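Steps 7 and 8 rest on two rules: the total order used to compare votes, and the majority check. The following sketch restates both under simplifying assumptions. SimpleVote and VoteRules are hypothetical names. The comparison ignores the server weights that the real totalOrderPredicate() also checks. The quorum check assumes the default majority quorum, where every server counts equally, and matches votes on leader, zxid and epoch only.

```java
import java.util.Map;

// Hypothetical, simplified vote: only the fields used for comparison.
final class SimpleVote {
    final long leader; // proposed leader's serverId
    final long zxid;   // proposed leader's last logged zxid
    final long epoch;  // proposed leader's peer epoch

    SimpleVote(long leader, long zxid, long epoch) {
        this.leader = leader;
        this.zxid = zxid;
        this.epoch = epoch;
    }

    boolean sameProposal(SimpleVote o) {
        return leader == o.leader && zxid == o.zxid && epoch == o.epoch;
    }
}

final class VoteRules {
    // True if candidate beats current. The higher epoch wins; on a tie the
    // higher zxid wins; on a further tie the higher serverId wins.
    static boolean beats(SimpleVote candidate, SimpleVote current) {
        if (candidate.epoch != current.epoch) {
            return candidate.epoch > current.epoch;
        }
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid;
        }
        return candidate.leader > current.leader;
    }

    // Majority check: count the senders whose vote matches the proposal and
    // require strictly more than half of the ensemble (integer division).
    static boolean hasQuorum(Map<Long, SimpleVote> recvset, SimpleVote proposal, int ensembleSize) {
        int support = 0;
        for (SimpleVote v : recvset.values()) {
            if (v.sameProposal(proposal)) {
                support++;
            }
        }
        return support > ensembleSize / 2;
    }
}
```

In a three-server ensemble, two matching votes in recvset form a quorum (2 > 3/2 = 1). In a five-server ensemble they do not (2 is not greater than 5/2 = 2).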
[Figure: diagram of the election flow]
```java
// Excerpt: body of FastLeaderElection.lookForLeader()
HashMap<Long, Vote> recvset = new HashMap<Long, Vote>();
HashMap<Long, Vote> outofelection = new HashMap<Long, Vote>();

int notTimeout = finalizeWait;

// Steps 1-2: start a new round, propose ourselves, broadcast.
synchronized(this){
    logicalclock++;
    updateProposal(getInitId(), getInitLastLoggedZxid(), getPeerEpoch());
}

LOG.info("New election. My id = " + self.getId() +
        ", proposed zxid=0x" + Long.toHexString(proposedZxid));
sendNotifications();

/*
 * Loop in which we exchange notifications until we find a leader
 */
while ((self.getPeerState() == ServerState.LOOKING) && (!stop)){
    /*
     * Remove next notification from queue, times out after 2 times
     * the termination time
     */
    Notification n = recvqueue.poll(notTimeout, TimeUnit.MILLISECONDS);

    /*
     * Sends more notifications if haven't received enough.
     * Otherwise processes new notification.
     */
    if(n == null){
        // Step 5: resend or reconnect, then back off.
        if(manager.haveDelivered()){
            sendNotifications();
        } else {
            manager.connectAll();
        }

        /*
         * Exponential backoff
         */
        int tmpTimeOut = notTimeout*2;
        notTimeout = (tmpTimeOut < maxNotificationInterval?
                tmpTimeOut : maxNotificationInterval);
        LOG.info("Notification time out: " + notTimeout);
    }
    else if(self.getVotingView().containsKey(n.sid)) {
        /*
         * Only proceed if the vote comes from a replica in the
         * voting view.
         */
        switch (n.state) {
        case LOOKING:
            // Step 7. If notification > current, replace and send messages out
            if (n.electionEpoch > logicalclock) {
                logicalclock = n.electionEpoch;
                recvset.clear();
                if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                        getInitId(), getInitLastLoggedZxid(), getPeerEpoch())) {
                    updateProposal(n.leader, n.zxid, n.peerEpoch);
                } else {
                    updateProposal(getInitId(),
                            getInitLastLoggedZxid(),
                            getPeerEpoch());
                }
                sendNotifications();
            } else if (n.electionEpoch < logicalclock) {
                if(LOG.isDebugEnabled()){
                    LOG.debug("Notification election epoch is smaller than logicalclock. n.electionEpoch = 0x"
                            + Long.toHexString(n.electionEpoch)
                            + ", logicalclock=0x" + Long.toHexString(logicalclock));
                }
                break;
            } else if (totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                    proposedLeader, proposedZxid, proposedEpoch)) {
                updateProposal(n.leader, n.zxid, n.peerEpoch);
                sendNotifications();
            }

            if(LOG.isDebugEnabled()){
                LOG.debug("Adding vote: from=" + n.sid +
                        ", proposed leader=" + n.leader +
                        ", proposed zxid=0x" + Long.toHexString(n.zxid) +
                        ", proposed election epoch=0x" + Long.toHexString(n.electionEpoch));
            }

            // Steps 8-10: record the vote and test for a quorum.
            recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch));

            if (termPredicate(recvset,
                    new Vote(proposedLeader, proposedZxid,
                            logicalclock, proposedEpoch))) {

                // Verify if there is any change in the proposed leader
                while((n = recvqueue.poll(finalizeWait,
                        TimeUnit.MILLISECONDS)) != null){
                    if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                            proposedLeader, proposedZxid, proposedEpoch)){
                        recvqueue.put(n);
                        break;
                    }
                }

                /*
                 * This predicate is true once we don't read any new
                 * relevant message from the reception queue
                 */
                if (n == null) {
                    self.setPeerState((proposedLeader == self.getId()) ?
                            ServerState.LEADING: learningState());

                    Vote endVote = new Vote(proposedLeader,
                            proposedZxid, logicalclock, proposedEpoch);
                    leaveInstance(endVote);
                    return endVote;
                }
            }
            break;
        case OBSERVING:
            LOG.debug("Notification from observer: " + n.sid);
            break;
        case FOLLOWING:
        case LEADING:
            // Steps 11-13.
            /*
             * Consider all notifications from the same epoch
             * together.
             */
            if(n.electionEpoch == logicalclock){
                recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch));
                if(ooePredicate(recvset, outofelection, n)) {
                    self.setPeerState((n.leader == self.getId()) ?
                            ServerState.LEADING: learningState());

                    Vote endVote = new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch);
                    leaveInstance(endVote);
                    return endVote;
                }
            }

            /*
             * Before joining an established ensemble, verify
             * a majority is following the same leader.
             */
            outofelection.put(n.sid, new Vote(n.version, n.leader, n.zxid,
                    n.electionEpoch, n.peerEpoch, n.state));

            if(ooePredicate(outofelection, outofelection, n)) {
                synchronized(this){
                    logicalclock = n.electionEpoch;
                    self.setPeerState((n.leader == self.getId()) ?
                            ServerState.LEADING: learningState());
                }
                Vote endVote = new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch);
                leaveInstance(endVote);
                return endVote;
            }
            break;
        default:
            LOG.warn("Notification state unrecognized: {} (n.state), {} (n.sid)",
                    n.state, n.sid);
            break;
        }
    } else {
        LOG.warn("Ignoring notification from non-cluster member " + n.sid);
    }
}
return null;
```
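The `n == null` branch in this loop implements the exponential backoff from step 5. The following sketch lists the successive poll timeouts when no notification arrives. BackoffSketch and schedule() are hypothetical names. The example uses finalizeWait = 200 ms, the value given in step 9, and an assumed maxNotificationInterval of 60000 ms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the receive-timeout backoff in lookForLeader().
final class BackoffSketch {
    // Returns the timeout used for each of `emptyPolls` consecutive polls that
    // return no notification. It starts at finalizeWait, doubles after every
    // empty poll, and never exceeds maxNotificationInterval.
    static List<Integer> schedule(int finalizeWait, int maxNotificationInterval, int emptyPolls) {
        List<Integer> timeouts = new ArrayList<>();
        int notTimeout = finalizeWait;
        for (int i = 0; i < emptyPolls; i++) {
            timeouts.add(notTimeout); // timeout used for this poll
            int tmpTimeOut = notTimeout * 2;
            notTimeout = tmpTimeOut < maxNotificationInterval ? tmpTimeOut : maxNotificationInterval;
        }
        return timeouts;
    }
}
```

With these values the timeouts run 200, 400, 800, ..., 25600, 51200 ms and then stay at 60000 ms. An isolated server therefore keeps retrying, but less and less often.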
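QuorumCnxManager manages the connections used during leader election.
The election port is the second n in server.x=[hostname]:n:n. Connections are always initiated by the server with the larger serverId toward the server with the smaller serverId; the smaller server only accepts. If a smaller-id server dials a larger one, the connection is dropped and the larger server dials back. As a result, each pair of servers shares exactly one connection, which keeps the number of connections down.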
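The following sketch shows the connection rule. CnxRuleSketch and its methods are hypothetical names. It only models which side keeps a socket and does no real networking. For the port part, a configuration line such as `server.1=zk1:2888:3888` (hostname assumed) places the election listener on 3888.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of QuorumCnxManager's rule that the larger serverId
// owns the single connection between two peers.
final class CnxRuleSketch {
    // Initiator side: keep the socket only if we have the larger serverId;
    // otherwise drop it and let the other server dial us.
    static boolean initiatorKeeps(long mySid, long remoteSid) {
        return mySid > remoteSid;
    }

    // Acceptor side: keep the incoming socket only if the remote serverId is
    // larger; otherwise close it and dial the remote server ourselves.
    static boolean acceptorKeeps(long mySid, long remoteSid) {
        return remoteSid > mySid;
    }

    // Lists the one surviving connection per pair as "dialer->acceptor".
    static List<String> dialPlan(long[] sids) {
        List<String> plan = new ArrayList<>();
        for (long a : sids) {
            for (long b : sids) {
                if (a > b && initiatorKeeps(a, b) && acceptorKeeps(b, a)) {
                    plan.add(a + "->" + b); // larger id dials smaller id
                }
            }
        }
        return plan;
    }
}
```

With n servers this yields n(n-1)/2 connections, one per pair. For servers 1, 2 and 3 the plan is 2->1, 3->1 and 3->2.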