ZooKeeper: how the leader and followers interact during startup

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 18:10

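The startup-time interaction between the leader and the followers consists of the following steps.

1. Create the Leader and Follower servers

Once leader election has finished, each server creates the server instance that matches its role and enters the main flow for that role.

Code location: QuorumPeer.java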

  • leader

    case LEADING:
        LOG.info("LEADING");
        try {
            setLeader(makeLeader(logFactory));
            leader.lead();
            setLeader(null);
        } // excerpt ends here; the catch/finally clauses are not shown

  • follower

    case FOLLOWING:
        try {
            LOG.info("FOLLOWING");
            setFollower(makeFollower(logFactory));
            follower.followLeader();
        } // excerpt ends here; the catch/finally clauses are not shown

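2. The leader server starts the follower (learner) acceptor, LearnerCnxAcceptor

The LearnerCnxAcceptor is responsible for accepting connection requests from all non-leader servers.

The Leader's lead() procedure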

    void lead() throws IOException, InterruptedException {
        ...
        self.tick = 0;
        // Restore data from the local files
        zk.loadData();
        // Summary of the leader's own state
        leaderStateSummary = new StateSummary(self.getCurrentEpoch(), zk.getLastProcessedZxid());
        // Start thread that waits for connection requests from
        // new followers.
        // (This is the listener thread on the leader port, dedicated to new followers.)
        cnxAcceptor = new LearnerCnxAcceptor();
        cnxAcceptor.start();
        readyToStart = true;
        // Wait until enough followers have connected, which confirms that this
        // server really is the leader; the lead thread may block here.
        long epoch = getEpochToPropose(self.getId(), self.getAcceptedEpoch());
        ...

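3. The learner servers start connecting to the leader

After startup, every learner server looks up the leader in the election result and then opens a connection to it.

4. The leader server creates a LearnerHandler

When the leader receives a connection request from a learner, it creates a LearnerHandler. Each LearnerHandler instance corresponds to one connection between the leader and a learner server, and handles nearly all message exchange and data synchronization between the two.

The LearnerCnxAcceptor thread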

    public void run() {
        try {
            while (!stop) {
                try {
                    // The thread blocks here waiting for a connection
                    Socket s = ss.accept();
                    // start with the initLimit, once the ack is processed
                    // in LearnerHandler switch to the syncLimit
                    // (the read timeout is set to tickTime * initLimit)
                    s.setSoTimeout(self.tickTime * self.initLimit);
                    s.setTcpNoDelay(nodelay);
                    // Start a dedicated thread per learner to handle its I/O
                    LearnerHandler fh = new LearnerHandler(s, Leader.this);
                    fh.start();
                } catch (SocketException e) {
                    .....
    }

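5. Register with the Leader

Once connected to the leader, the learner registers with it; that is, it sends its own information to the leader server. This information is called LearnerInfo and includes the server's SID and the latest ZXID the server has processed.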

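6. The Leader parses the Learner information and computes the new epoch

After receiving LearnerInfo, the leader parses out the learner's SID and ZXID, derives epoch_of_learner from the ZXID, and compares it with the leader's current epoch_of_leader. If epoch_of_learner is not smaller (the code tests lastAcceptedEpoch >= epoch), the leader's epoch is updated:
  epoch_of_leader = epoch_of_learner + 1
The LearnerHandler then waits (blocked in getEpochToPropose) until a quorum of servers, the leader itself included, has registered and epoch_of_leader has been updated accordingly. At that point the leader has determined the epoch for the current cluster.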
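
The epoch arithmetic in this step can be illustrated with a few lines of Java. This is only a minimal sketch, not ZooKeeper's own code: the class ZxidSketch and its method names are hypothetical, and it assumes the usual zxid layout in which the high 32 bits carry the epoch and the low 32 bits carry a per-epoch counter.

```java
/** Hypothetical helper, for illustration only; not a ZooKeeper class. */
final class ZxidSketch {
    private ZxidSketch() {}

    /** Assumed layout: epoch in the high 32 bits, counter in the low 32 bits. */
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    /** Extracts the epoch part of a zxid. */
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    /** Extracts the counter part of a zxid. */
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /**
     * Applies the rule epoch_of_leader = epoch_of_learner + 1 whenever the
     * learner's epoch is not smaller than the current proposal.
     */
    static long proposeEpoch(long currentProposal, long learnerEpoch) {
        return learnerEpoch >= currentProposal ? learnerEpoch + 1 : currentProposal;
    }
}
```

For example, a learner whose last zxid was built with epoch 5 pushes a proposal of 3 up to 6, while a proposal that is already 7 stays unchanged.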

The LearnerHandler thread

    public void run() {
        try {
            ...
            ia = BinaryInputArchive.getArchive(bufferedInput);
            ...
            // The I/O thread waits for the follower's first packet (LearnerInfo)
            QuorumPacket qp = new QuorumPacket();
            ia.readRecord(qp, "packet");
            ...
            byte learnerInfoData[] = qp.getData();
            if (learnerInfoData != null) {
                if (learnerInfoData.length == 8) {
                    ByteBuffer bbsid = ByteBuffer.wrap(learnerInfoData);
                    this.sid = bbsid.getLong();
                } else {
                    // Deserialize the LearnerInfo record
                    LearnerInfo li = new LearnerInfo();
                    ByteBufferInputStream.byteBuffer2Record(ByteBuffer.wrap(learnerInfoData), li);
                    this.sid = li.getServerid();
                    this.version = li.getProtocolVersion();
                }
            } else {
                this.sid = leader.followerCounter.getAndDecrement();
            }
            ...
            long lastAcceptedEpoch = ZxidUtils.getEpochFromZxid(qp.getZxid());
            long peerLastZxid;
            StateSummary ss = null;
            long zxid = qp.getZxid();
            // Blocks until a quorum of learners has registered
            long newEpoch = leader.getEpochToPropose(this.getSid(), lastAcceptedEpoch);

    public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
        synchronized(connectingFollowers) {
            if (!waitingForNewEpoch) {
                return epoch;
            }
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch+1;
            }
            // Add the caller to the set of connected servers so that the
            // validity of this leadership can be checked below
            connectingFollowers.add(sid);
            QuorumVerifier verifier = self.getQuorumVerifier();
            // If enough followers have connected, the election is valid: stop
            // waiting and wake the other waiting threads (similar to a barrier)
            if (connectingFollowers.contains(self.getId()) &&
                    verifier.containsQuorum(connectingFollowers)) {
                waitingForNewEpoch = false;
                self.setAcceptedEpoch(epoch);
                connectingFollowers.notifyAll();
            }
            // Not enough followers yet: wait, for at most initLimit * tickTime
            else {
                long start = System.currentTimeMillis();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(waitingForNewEpoch && cur < end) {
                    connectingFollowers.wait(end - cur);
                    cur = System.currentTimeMillis();
                }
                // Timed out: leave the lead procedure so that a new election starts
                if (waitingForNewEpoch) {
                    throw new InterruptedException("Timeout while waiting for epoch from quorum");
                }
            }
            return epoch;
        }
    }
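
The barrier behaviour of getEpochToPropose can be tried out in isolation. The following is a simplified stand-in under stated assumptions, not the real Leader class: EpochBarrierSketch is a hypothetical name, the QuorumVerifier is replaced by a plain "more than half of ensembleSize" test, and the timeout is passed in directly in milliseconds instead of being derived from initLimit and tickTime.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of the epoch barrier; not ZooKeeper code. */
final class EpochBarrierSketch {
    private final Set<Long> connected = new HashSet<>();
    private final long selfId;        // id of the would-be leader
    private final int ensembleSize;   // assumed fixed ensemble size
    private final long timeoutMillis; // stands in for initLimit * tickTime
    private boolean waiting = true;
    private long epoch = -1;

    EpochBarrierSketch(long selfId, int ensembleSize, long timeoutMillis) {
        this.selfId = selfId;
        this.ensembleSize = ensembleSize;
        this.timeoutMillis = timeoutMillis;
    }

    long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException {
        synchronized (connected) {
            if (!waiting) {
                return epoch; // epoch already fixed by an earlier quorum
            }
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch + 1;
            }
            connected.add(sid);
            // Simple majority that must include the leader itself.
            if (connected.contains(selfId) && connected.size() > ensembleSize / 2) {
                waiting = false;
                connected.notifyAll(); // release every blocked caller
            } else {
                long start = System.currentTimeMillis();
                long cur = start;
                long end = start + timeoutMillis;
                while (waiting && cur < end) {
                    connected.wait(end - cur);
                    cur = System.currentTimeMillis();
                }
                if (waiting) {
                    throw new InterruptedException("Timeout while waiting for epoch from quorum");
                }
            }
            return epoch;
        }
    }
}
```

In a three-server ensemble, the leader's own call and one learner's call are enough to release the barrier, and both callers then see the same epoch. A lone learner that never gets a quorum times out instead.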

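7. Send the Leader state

Once the new epoch has been computed, the leader sends it to the learner in a LEADERINFO message and waits for the learner's response.

This also happens in the LearnerHandler thread: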

            ......
            // Send a Leader.LEADERINFO packet carrying the new epoch
            byte ver[] = new byte[4];
            ByteBuffer.wrap(ver).putInt(0x10000);
            QuorumPacket newEpochPacket = new QuorumPacket(Leader.LEADERINFO, ZxidUtils.makeZxid(newEpoch, 0), ver, null);
            oa.writeRecord(newEpochPacket, "packet");
            bufferedOutput.flush();
            QuorumPacket ackEpochPacket = new QuorumPacket();
            // Wait for the follower's response; this is the ACK referred to in step 9
            ia.readRecord(ackEpochPacket, "packet");
            if (ackEpochPacket.getType() != Leader.ACKEPOCH) {
                LOG.error(ackEpochPacket.toString()
                        + " is not ACKEPOCH");
                return;
            }
            ByteBuffer bbepoch = ByteBuffer.wrap(ackEpochPacket.getData());
            ss = new StateSummary(bbepoch.getInt(), ackEpochPacket.getZxid());
            // Note: this also blocks on a quorum. The handler continues only after
            // ACKEPOCH has arrived from more than half of the servers in the cluster.
            leader.waitForEpochAck(this.getSid(), ss);
        }

    public void waitForEpochAck(long id, StateSummary ss) throws IOException, InterruptedException {
        synchronized(electingFollowers) {
            if (electionFinished) {
                return;
            }
            if (ss.getCurrentEpoch() != -1) {
                ......
                // Add the follower to the set of servers that have acknowledged
                electingFollowers.add(id);
            }
            QuorumVerifier verifier = self.getQuorumVerifier();
            // Check the quorum condition: if it holds, wake the other waiting
            // threads (similar to a barrier); otherwise wait
            if (electingFollowers.contains(self.getId()) && verifier.containsQuorum(electingFollowers)) {
                electionFinished = true;
                electingFollowers.notifyAll();
            }
            // Not enough followers yet: keep waiting
            else {
                long start = System.currentTimeMillis();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(!electionFinished && cur < end) {
                    electingFollowers.wait(end - cur);
                    cur = System.currentTimeMillis();
                }
                if (!electionFinished) {
                    throw new InterruptedException("Timeout while waiting for epoch to be acked by quorum");
                }
            }
        }
    }
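
The 4-byte payloads in this exchange are plain big-endian integers, because ByteBuffer writes in big-endian order by default. The sketch below shows the round trip for the version field of LEADERINFO and the epoch field of ACKEPOCH. The class EpochPayloadSketch and its method names are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper showing the 4-byte payload encoding; not ZooKeeper code. */
final class EpochPayloadSketch {
    /** Protocol version written by the leader in the excerpt above. */
    static final int PROTOCOL_VERSION = 0x10000;

    private EpochPayloadSketch() {}

    /** Builds the data field of a LEADERINFO packet: the version as 4 bytes. */
    static byte[] leaderInfoPayload() {
        byte[] ver = new byte[4];
        ByteBuffer.wrap(ver).putInt(PROTOCOL_VERSION);
        return ver;
    }

    /** Reads the version back from a LEADERINFO data field. */
    static int protocolVersion(byte[] data) {
        return ByteBuffer.wrap(data).getInt();
    }

    /** Builds the data field of an ACKEPOCH packet: the epoch as 4 bytes. */
    static byte[] ackEpochPayload(int currentEpoch) {
        byte[] buf = new byte[4];
        ByteBuffer.wrap(buf).putInt(currentEpoch);
        return buf;
    }

    /** Reads the epoch back, as bbepoch.getInt() does on the leader side. */
    static int ackedEpoch(byte[] data) {
        return ByteBuffer.wrap(data).getInt();
    }
}
```

A value of -1 survives the round trip unchanged, which is what lets waitForEpochAck treat it as a special marker in its ss.getCurrentEpoch() != -1 check.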

8. The learner sends the ACK message

After receiving the LEADERINFO message from the leader, the follower parses out the epoch and ZXID and replies to the leader with an ACKEPOCH response.
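
The follower-side code is not shown in this article, so the following is only a plausible sketch of the decision a follower might make when LEADERINFO arrives. Everything in it is an assumption made for illustration: the class FollowerAckSketch, its fields, and the three-way comparison. The only part tied to the excerpts above is the -1 marker, which matches the ss.getCurrentEpoch() != -1 check in waitForEpochAck.

```java
import java.io.IOException;

/** Hypothetical follower-side model; an assumption, not ZooKeeper's Learner class. */
final class FollowerAckSketch {
    /** Marker meaning "this epoch was already accepted; do not count me again". */
    static final int ALREADY_ACCEPTED = -1;

    private long acceptedEpoch;      // highest epoch this follower has promised to
    private final long currentEpoch; // epoch of the follower's latest data

    FollowerAckSketch(long acceptedEpoch, long currentEpoch) {
        this.acceptedEpoch = acceptedEpoch;
        this.currentEpoch = currentEpoch;
    }

    long acceptedEpoch() {
        return acceptedEpoch;
    }

    /**
     * Returns the int that would go into the ACKEPOCH data field.
     * Assumed rule: accept a newer epoch, answer -1 for an epoch already
     * accepted, and refuse a leader whose epoch is older.
     */
    int onLeaderInfo(long newEpoch) throws IOException {
        if (newEpoch > acceptedEpoch) {
            acceptedEpoch = newEpoch;
            return (int) currentEpoch; // the wire field is 4 bytes wide
        }
        if (newEpoch == acceptedEpoch) {
            return ALREADY_ACCEPTED;
        }
        throw new IOException("Leader epoch " + newEpoch
                + " is less than accepted epoch " + acceptedEpoch);
    }
}
```

Under these assumptions, a follower that has accepted epoch 3 answers a proposal of 4 with its current epoch, answers a repeated proposal of 4 with -1, and rejects a proposal of 2.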

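9. Data synchronization

Once the leader server has received this ACK message from the learner (the corresponding code is the wait for the follower's response in step 7), it can begin synchronizing data with that learner. Data synchronization within the cluster will be analyzed in the next article.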

10. Start the leader and learner servers

Once more than half of the learners have completed data synchronization, the leader and learner server instances can start.
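
The "more than half" condition that appears throughout these steps corresponds to the verifier.containsQuorum(...) calls in the excerpts above. A minimal sketch of a simple-majority check follows. The class MajorityQuorumSketch is hypothetical, and it assumes an unweighted ensemble of fixed size, which need not match every QuorumVerifier implementation.

```java
import java.util.Set;

/** Hypothetical simple-majority check; assumes an unweighted, fixed-size ensemble. */
final class MajorityQuorumSketch {
    private final int half;

    MajorityQuorumSketch(int ensembleSize) {
        this.half = ensembleSize / 2;
    }

    /** True when strictly more than half of the ensemble is in the set. */
    boolean containsQuorum(Set<Long> serverIds) {
        return serverIds.size() > half;
    }
}
```

With five servers, three acknowledgements form a quorum and two do not. With four servers, two are still not enough, which is one reason odd ensemble sizes are common.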
