Kafka's Thread Model, Part 2


The previous article introduced four kinds of Kafka threads:

The acceptor thread, which accepts new TCP connections and hands them off to the network threads.

The network threads, which handle network communication with clients and with other brokers.

The disk I/O threads, which write producer data to disk and read it back out for consumers.

The scheduler thread, which periodically flushes data to disk, compacts logs, and updates index files.


This article introduces the ExpirationReaper threads and the executor-Fetch thread that works alongside them.

To start, under the kafka.server package you can find four classes: DelayedFetch, DelayedProduce, DelayedCreateTopics, and DelayedDeleteTopics.

Why do these classes exist? Some operations in Kafka cannot, and need not, return synchronously, so they require a mechanism that returns failure on timeout. Consider the following examples:

DelayedFetch: anyone who has used kafka-client knows that when consuming a topic you configure a batch size. Kafka tries to deliver messages in batches; if the broker returned data to a subscribed consumer as soon as the producer wrote any single message, that would clearly be inappropriate and inefficient. So Kafka returns data to the consumer only when one of two conditions is met: 1. the accumulated messages reach a certain size or count; or 2. the fetch request has gone unanswered past a time threshold. This keeps the consumer efficient.
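As an illustration, here is a minimal sketch of those two knobs on the consumer side, using the standard Java client's config constants; the bootstrap address, group id, and values are made-up examples:

import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

// Sketch only: fetch.min.bytes is condition 1 (wait until enough bytes
// accumulate), fetch.max.wait.ms is condition 2 (but never wait longer
// than this). The broker-side DelayedFetch enforces both.
val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed address
props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group")              // hypothetical group
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "65536")            // condition 1: >= 64 KB
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500")            // condition 2: <= 500 ms
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringDeserializer")
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](props)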


DelayedProduce: anyone who has used kafka-client knows that when producing to a topic you configure the ack mode: no ack at all, ack once the partition leader has persisted the message, or ack only once every replica in the ISR has persisted it. In the last case, the broker cannot predict when the ISR brokers will persist the message, and cannot even guarantee that they all will. So Kafka returns the ack to the producer when one of two conditions is met: 1. every replica in the ISR has persisted the data; or 2. some ISR replica fails to persist it within the time limit.
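A matching sketch on the producer side, again with the standard Java client's constants and made-up addresses; acks=all selects the all-ISR case described above:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

// Sketch only: with acks=all the broker parks the produce request as a
// DelayedProduce until the full ISR has the record, or fails it on timeout.
val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // assumed address
props.put(ProducerConfig.ACKS_CONFIG, "all")                         // wait for the whole ISR
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
// .get() surfaces either the ack or the broker's timeout error.
producer.send(new ProducerRecord("demo-topic", "key", "value")).get()
producer.close()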


DelayedCreateTopics: Kafka assigns leaders to partitions according to its placement rules, but the cluster leader (the controller) cannot guarantee that every broker designated as a partition leader actually becomes the leader as instructed.


DelayedDeleteTopics: the same reasoning applies; the cluster leader cannot guarantee that the partition leaders on the follower brokers delete their data successfully.


Given the cases above, it is clear that Kafka needs an efficient timer mechanism for these delayed tasks. Fetch and produce are arguably the most frequent requests in Kafka, so a naive approach, such as one thread per request that simply calls sleep(timeout), would exhaust system resources very quickly (a sketch of that naive approach follows). Kafka's answer to this problem is the timing wheel.
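To make the cost concrete, here is what that naive approach looks like; all names are hypothetical and this is not Kafka code:

// One thread per pending request: each in-flight fetch/produce would pin a
// full thread stack just to sleep, which does not scale to the tens of
// thousands of requests a busy broker keeps in flight.
def naiveTimeout(timeoutMs: Long)(onExpire: () => Unit): Thread = {
  val t = new Thread(new Runnable {
    def run(): Unit = {
      Thread.sleep(timeoutMs) // the thread exists only to wait
      onExpire()
    }
  })
  t.start()
  t
}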

Kafka's timing wheel is genuinely interesting. I had previously read Netty's timing wheel source, which is written very plainly, so Kafka's felt striking by comparison. I won't go into detail here; a later article will compare how the two implementations differ. What follows is only a brief introduction.

Kafka's timing wheel uses Java's DelayQueue as an auxiliary structure. You can picture n ExpirationReaper threads blocked on the DelayQueue's poll:

"ExpirationReaper-12" #53 prio=5 os_prio=0 tid=0x00007fadb4d30000 nid=0x11a7e waiting on condition [0x00007fac1ebee000]   java.lang.Thread.State: TIMED_WAITING (parking)at sun.misc.Unsafe.park(Native Method)- parking to wait for  <0x00000000c88a5e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)at java.util.concurrent.DelayQueue.poll(DelayQueue.java:259)at kafka.utils.timer.SystemTimer.advanceClock(Timer.scala:106)at kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:350)at kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:374)at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Whenever a task's delay expires, the DelayQueue hands it to one of these threads, and the awakened thread is responsible for:

1. Advancing the timing wheel's clock.

2. Checking whether the task has been cancelled; if it has not, handing it to the executor-Fetch thread (a runnable sketch of this loop follows).
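Here is a minimal, self-contained sketch of that reaper/executor split. All names (Task, ReaperSketch) are hypothetical; the real code lives in kafka.utils.timer.SystemTimer and kafka.server.DelayedOperationPurgatory, and this sketch omits the wheel itself:

import java.util.concurrent.{DelayQueue, Delayed, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// A timed task that DelayQueue can order by deadline.
final class Task(val name: String, delayMs: Long) extends Delayed {
  private val deadline = System.currentTimeMillis() + delayMs
  val cancelled = new AtomicBoolean(false)
  override def getDelay(unit: TimeUnit): Long =
    unit.convert(deadline - System.currentTimeMillis(), TimeUnit.MILLISECONDS)
  override def compareTo(o: Delayed): Int =
    java.lang.Long.compare(getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS))
}

object ReaperSketch extends App {
  val delayQueue = new DelayQueue[Task]()
  val executor   = Executors.newFixedThreadPool(1) // plays the executor-Fetch role

  val reaper = new Thread(new Runnable {
    def run(): Unit = while (true) {
      // Block (with a bound) until some task's delay elapses.
      val task = delayQueue.poll(200L, TimeUnit.MILLISECONDS)
      // Step 1 (advancing the wheel) is elided here; step 2: skip cancelled
      // tasks and hand live ones to the executor thread.
      if (task != null && !task.cancelled.get())
        executor.submit(new Runnable {
          def run(): Unit = println(s"expired: ${task.name}")
        })
    }
  })
  reaper.setDaemon(true)
  reaper.start()

  delayQueue.put(new Task("fetch-timeout", 300))
  Thread.sleep(1000)
  executor.shutdown()
}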

executor-Fetch itself is a fixed thread pool of size one:

Executors.newFixedThreadPool(1, new ThreadFactory() {
  def newThread(runnable: Runnable): Thread =
    Utils.newThread("executor-" + executorName, runnable, false)
})
executor-Fetch blocks on this thread pool's LinkedBlockingQueue, as the following stack trace shows.

"executor-Fetch" #66 prio=5 os_prio=0 tid=0x00007fabe4002000 nid=0x12144 waiting on condition [0x00007fac1dae1000]   java.lang.Thread.State: WAITING (parking)at sun.misc.Unsafe.park(Native Method)- parking to wait for  <0x00000000c8341680> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)at java.lang.Thread.run(Thread.java:745)
So what is the executor-Fetch thread responsible for? To answer that, we have to start from the common abstract base class of the four classes mentioned above.

abstract class DelayedOperation(override val delayMs: Long) extends TimerTask with Logging {

  private val completed = new AtomicBoolean(false)

  /*
   * Force completing the delayed operation, if not already completed.
   * This function can be triggered when
   *
   * 1. The operation has been verified to be completable inside tryComplete()
   * 2. The operation has expired and hence needs to be completed right now
   *
   * Return true iff the operation is completed by the caller: note that
   * concurrent threads can try to complete the same operation, but only
   * the first thread will succeed in completing the operation and return
   * true, others will still return false
   */
  def forceComplete(): Boolean = {
    if (completed.compareAndSet(false, true)) {
      // cancel the timeout timer
      cancel()
      onComplete()
      true
    } else {
      false
    }
  }

  /**
   * Check if the delayed operation is already completed
   */
  def isCompleted: Boolean = completed.get()

  /**
   * Call-back to execute when a delayed operation gets expired and hence forced to complete.
   */
  def onExpiration(): Unit

  /**
   * Process for completing an operation; This function needs to be defined
   * in subclasses and will be called exactly once in forceComplete()
   */
  def onComplete(): Unit

  /**
   * Try to complete the delayed operation by first checking if the operation
   * can be completed by now. If yes execute the completion logic by calling
   * forceComplete() and return true iff forceComplete returns true; otherwise return false
   *
   * This function needs to be defined in subclasses
   */
  def tryComplete(): Boolean

  /**
   * Thread-safe variant of tryComplete(). This can be overridden if the operation provides its
   * own synchronization.
   */
  def safeTryComplete(): Boolean = {
    synchronized {
      tryComplete()
    }
  }

  /*
   * run() method defines a task that is executed on timeout
   */
  override def run(): Unit = {
    if (forceComplete())
      onExpiration()
  }
}
As the comments indicate, subclasses only need to override these methods with the appropriate logic, which can be summarized as: fill in the response data and put the response back on the network threads' queue; actually sending it is then the network threads' job.
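To make that concrete, here is a hypothetical toy subclass (DelayedPing is not a real Kafka class) showing where each override plugs in; real subclasses such as DelayedFetch additionally carry the request state and a response callback that re-enqueues the response for the network threads:

// Completes early when ready() reports success, otherwise the timer
// expires it; either way onComplete() runs exactly once.
class DelayedPing(delayMs: Long, ready: () => Boolean, respond: String => Unit)
  extends DelayedOperation(delayMs) {

  // Called only on the expiration path, after onComplete(); typically
  // used for metrics/logging.
  override def onExpiration(): Unit = println("ping expired")

  // The single completion action, success and timeout alike.
  override def onComplete(): Unit = respond("pong")

  // Called whenever related state changes, to try completing early.
  override def tryComplete(): Boolean =
    if (ready()) forceComplete() else false
}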

Take the most typical example, DelayedFetch. What does its tryComplete logic look like?

override def tryComplete(): Boolean = {
  var accumulatedSize = 0
  var accumulatedThrottledSize = 0
  fetchMetadata.fetchPartitionStatus.foreach {
    case (topicAndPartition, fetchStatus) =>
      val fetchOffset = fetchStatus.startOffsetMetadata
      try {
        if (fetchOffset != LogOffsetMetadata.UnknownOffsetMetadata) {
          val replica = replicaManager.getLeaderReplicaIfLocal(topicAndPartition.topic, topicAndPartition.partition)
          val endOffset =
            if (fetchMetadata.fetchOnlyCommitted)
              replica.highWatermark
            else
              replica.logEndOffset
          // Go directly to the check for Case D if the message offsets are the same. If the log segment
          // has just rolled, then the high watermark offset will remain the same but be on the old segment,
          // which would incorrectly be seen as an instance of Case C.
          if (endOffset.messageOffset != fetchOffset.messageOffset) {
            if (endOffset.onOlderSegment(fetchOffset)) {
              // Case C, this can happen when the new fetch operation is on a truncated leader
              debug("Satisfying fetch %s since it is fetching later segments of partition %s.".format(fetchMetadata, topicAndPartition))
              return forceComplete()
            } else if (fetchOffset.onOlderSegment(endOffset)) {
              // Case C, this can happen when the fetch operation is falling behind the current segment
              // or the partition has just rolled a new segment
              debug("Satisfying fetch %s immediately since it is fetching older segments.".format(fetchMetadata))
              // We will not force complete the fetch request if a replica should be throttled.
              if (!replicaManager.shouldLeaderThrottle(quota, topicAndPartition, fetchMetadata.replicaId))
                return forceComplete()
            } else if (fetchOffset.messageOffset < endOffset.messageOffset) {
              // we take the partition fetch size as upper bound when accumulating the bytes (skip if a throttled partition)
              val bytesAvailable = math.min(endOffset.positionDiff(fetchOffset), fetchStatus.fetchInfo.fetchSize)
              if (quota.isThrottled(topicAndPartition))
                accumulatedThrottledSize += bytesAvailable
              else
                accumulatedSize += bytesAvailable
            }
          }
        }
      // (the excerpt ends here; the catch block and the final minimum-bytes
      //  check that completes the fetch are not shown)

The completion logic is actually simple: the fetch is satisfied as soon as any one of four conditions holds (restated in the sketch after this list).

1. The broker is no longer the leader of the partition. This rarely happens.

2. The broker no longer has the data the consumer requested.

3. The offset the consumer requests is not the latest offset. This is easy to understand: the point of delaying a fetch is to batch transfers for efficiency, and when the consumer is not asking for the newest data, the broker can return the older data in one batch immediately rather than waiting for the producer to accumulate up to a threshold.

4. The accumulated message bytes reach a preconfigured threshold.
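Restated as a hypothetical predicate, for orientation only (this is not Kafka's code; the real checks are the try/catch and offset comparisons in tryComplete above):

def shouldCompleteNow(noLongerLeader: Boolean,       // condition 1
                      dataMissing: Boolean,          // condition 2
                      consumerBehindLogEnd: Boolean, // condition 3
                      accumulatedBytes: Int,
                      fetchMinBytes: Int): Boolean =
  noLongerLeader || dataMissing || consumerBehindLogEnd ||
    accumulatedBytes >= fetchMinBytes                // condition 4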

