Kafka Threading Model, Part 3: QuotaManager

Source: Internet. Editor: 程序博客网. Time: 2024/05/17 21:44

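Kafka introduced a quota mechanism in version 0.9: for each producer or consumer, an upper limit can be placed on how fast it may produce or consume. I found a good document that explains the design and implementation of these quotas: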

Kafka Quota Control

In the broker this corresponds to two threads: one that throttles producers and one that throttles consumers.

"ThrottledRequestReaper-Fetch" #24 prio=5 os_prio=0 tid=0x00007fadb4ad2800 nid=0x11a61 waiting on condition [0x00007fac97ffe000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000c86003a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:259)
    at kafka.server.ClientQuotaManager$ThrottledRequestReaper.doWork(ClientQuotaManager.scala:158)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

"ThrottledRequestReaper-Produce" #25 prio=5 os_prio=0 tid=0x00007fadb4ad7000 nid=0x11a62 waiting on condition [0x00007fac97efd000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000c85e33c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:259)
    at kafka.server.ClientQuotaManager$ThrottledRequestReaper.doWork(ClientQuotaManager.scala:158)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

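I think QuotaManager is well worth studying, for at least two reasons. First, it shows how to throttle as gracefully as possible in a request/response architecture like Kafka's. Second, it shows how to measure a client's traffic as gracefully as possible.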

I have already written an article about the common.metrics package in Kafka. If you are interested, see this link: Kafka Performance Monitoring: KafkaMetrics Sensor

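Next I will walk through how Kafka uses a DelayQueue to implement quotas. As the document linked earlier mentions, every time the broker sends a response it records how many bytes were just consumed or produced, and if the quota is exceeded a QuotaViolationException is thrown.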

    try {
      clientSensors.quotaSensor.record(value)
      // trigger the callback immediately if quota is not violated
      callback(0)
    } catch {
      case qve: QuotaViolationException =>
        // Compute the delay
        val clientMetric = metrics.metrics().get(clientRateMetricName(clientQuotaEntity.sanitizedUser, clientQuotaEntity.clientId))
        throttleTimeMs = throttleTime(clientMetric, getQuotaMetricConfig(clientQuotaEntity.quota))
        clientSensors.throttleTimeSensor.record(throttleTimeMs)
        // If delayed, add the element to the delayQueue
        delayQueue.add(new ThrottledResponse(time, throttleTimeMs, callback))
        delayQueueSensor.record()
        logger.debug("Quota violated for sensor (%s). Delay time: (%d)".format(clientSensors.quotaSensor.name(), throttleTimeMs))
    }
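The exception-handling logic above is fairly clear. It first looks up this client's rate metric, clientMetric. It then computes how long the response should be delayed (throttleTimeMs, in milliseconds) and puts the response into the delayQueue. From there, execution is handed over to ThrottledRequestReaper: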

    class ThrottledRequestReaper(delayQueue: DelayQueue[ThrottledResponse]) extends ShutdownableThread(
      "ThrottledRequestReaper-%s".format(apiKey), false) {

      override def doWork(): Unit = {
        val response: ThrottledResponse = delayQueue.poll(1, TimeUnit.SECONDS)
        if (response != null) {
          // Decrement the size of the delay queue
          delayQueueSensor.record(-1)
          trace("Response throttled for: " + response.throttleTimeMs + " ms")
          response.execute()
        }
      }
    }
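This thread keeps pulling expired tasks out of the delayQueue; the expiry time is the delay that was just computed. response.execute() then runs the callback that was passed in earlier. So how is the length of the delay calculated?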
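Before turning to that calculation, the queue-and-reaper pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: DelayedCallback and ReaperSketch are hypothetical names standing in for ThrottledResponse and ThrottledRequestReaper, and the sketch uses System.nanoTime() instead of Kafka's Time abstraction. Only java.util.concurrent.DelayQueue and Delayed are real library types here.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for ThrottledResponse: a callback that becomes due after delayMs. */
final class DelayedCallback implements Delayed {
    private final long dueNanos;
    private final Runnable callback;

    DelayedCallback(long delayMs, Runnable callback) {
        this.dueNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        this.callback = callback;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining time until due; zero or negative means the element has expired.
        return unit.convert(dueNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }

    void execute() {
        callback.run();
    }
}

/** Hypothetical reaper: one call to doWork mirrors one pass of the loop body shown above. */
final class ReaperSketch {
    /** Waits up to timeoutMs for an expired element; runs it and returns true, otherwise returns false. */
    static boolean doWork(DelayQueue<DelayedCallback> queue, long timeoutMs) {
        try {
            DelayedCallback due = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (due == null) {
                return false;
            }
            due.execute();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        DelayQueue<DelayedCallback> queue = new DelayQueue<>();
        queue.add(new DelayedCallback(200, () -> System.out.println("response sent after throttling")));
        while (!doWork(queue, 1000)) {
            // keep polling until the single queued response is due
        }
    }
}
```

The point of the design is that the request handler never sleeps: it only enqueues the response, and a single reaper thread per request type releases responses as they expire.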

    private def throttleTime(clientMetric: KafkaMetric, config: MetricConfig): Int = {
      val rateMetric: Rate = measurableAsRate(clientMetric.metricName(), clientMetric.measurable())
      val quota = config.quota()
      val difference = clientMetric.value() - quota.bound
      // Use the precise window used by the rate calculation
      val throttleTimeMs = difference / quota.bound * rateMetric.windowSize(config, time.milliseconds())
      throttleTimeMs.round.toInt
    }
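The value returned by throttleTime() is the delay to apply, in milliseconds. It is called whenever a QuotaViolationException is thrown. Why is it computed this way? It is not hard to see.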

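First, what each variable means. quota is the configured quota, that is, the upper limit on the transfer rate. clientMetric.value() is the rate actually measured. And

    rateMetric.windowSize(config, time.milliseconds())

is the size of the time window over which clientMetric.value() was measured.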

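For example, suppose the broker is allowed to transfer data at no more than 10 MB/s (quota), but over a 5 s window (rateMetric.windowSize()) the measured rate reached 11 MB/s (clientMetric.value()). How do we bring the rate back down to 10 MB/s? The broker has to delay its response by x seconds.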

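The 55 MB sent during the window has to be spread over 5 + x seconds so that 11*5/(5+x) = 10. Solving gives x = 11*5/10 - 5 = (11-10)/10*5 = 0.5 seconds, which is exactly the formula above.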
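The arithmetic can be checked with a few lines of Java. This is a minimal sketch of the same formula, not Kafka's code: ThrottleMath and its parameter names are hypothetical, and the window is passed in directly as milliseconds rather than read from a Rate metric.

```java
/** Hypothetical helper that reproduces the throttle-time formula from the example. */
final class ThrottleMath {
    /**
     * @param observedRate measured rate (any unit, for example MB/s)
     * @param quotaBound   allowed rate, in the same unit as observedRate
     * @param windowMs     length of the measurement window in milliseconds
     * @return delay in milliseconds that brings the average rate back to quotaBound
     */
    static long throttleTimeMs(double observedRate, double quotaBound, long windowMs) {
        double difference = observedRate - quotaBound;
        return Math.round(difference / quotaBound * windowMs);
    }

    public static void main(String[] args) {
        // 11 MB/s measured over 5 s against a 10 MB/s quota
        System.out.println(throttleTimeMs(11.0, 10.0, 5000)); // prints 500
    }
}
```

With a 500 ms delay, the 55 MB is effectively spread over 5.5 s, which is 10 MB/s.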

Next, let's look at how the Rate class works. Its sampling logic comes from SampledStat:

    public abstract class SampledStat implements MeasurableStat {
        private double initialValue;
        private int current = 0;
        protected List<Sample> samples;

        @Override
        public void record(MetricConfig config, double value, long timeMs) {
            Sample sample = current(timeMs);
            if (sample.isComplete(timeMs, config))
                sample = advance(config, timeMs);
            update(sample, config, value, timeMs);
            sample.eventCount += 1;
        }
        // ... (rest of the class not shown in this excerpt)
    }
See the record function? Each call takes one value, looks up the sample at the current index in the List<Sample>, and updates that sample.

1. isComplete returns true once the sample has been in use for a certain length of time or has recorded a certain number of values.

    public boolean isComplete(long timeMs, MetricConfig config) {
        return timeMs - lastWindowMs >= config.timeWindowMs() || eventCount >= config.eventWindow();
    }
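2. advance simply increments current (wrapping around at config.samples()) and returns the sample at that position. From this we can infer that values recorded close together in time are accumulated into the same sample.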

    private Sample advance(MetricConfig config, long timeMs) {
        this.current = (this.current + 1) % config.samples();
        if (this.current >= samples.size()) {
            Sample sample = newSample(timeMs);
            this.samples.add(sample);
            return sample;
        } else {
            Sample sample = current(timeMs);
            sample.reset(timeMs);
            return sample;
        }
    }

3. What update actually does is left to the subclass.

    public static class SampledTotal extends SampledStat {
        public SampledTotal() {
            super(0.0d);
        }

        @Override
        protected void update(Sample sample, MetricConfig config, double value, long timeMs) {
            sample.value += value;
        }

        @Override
        public double combine(List<Sample> samples, MetricConfig config, long now) {
            double total = 0.0;
            for (int i = 0; i < samples.size(); i++)
                total += samples.get(i).value;
            return total;
        }
    }
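The whole implementation is quite naive: every recorded value is accumulated into the sample for its time window, and the rate is simply the combined total divided by the length of the window.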
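To make the "total divided by window" step concrete, here is a minimal sketch. RateSketch and its methods are hypothetical names, and the sample values are plain numbers rather than Kafka Sample objects; total() plays the role of combine above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of turning per-sample totals into a rate. */
final class RateSketch {
    /** Sums the per-sample totals, in the same spirit as SampledTotal.combine. */
    static double total(List<Double> sampleValues) {
        double total = 0.0;
        for (double v : sampleValues) {
            total += v;
        }
        return total;
    }

    /** Rate per second = combined total / elapsed window length in seconds. */
    static double ratePerSecond(List<Double> sampleValues, long elapsedMs) {
        return total(sampleValues) / (elapsedMs / 1000.0);
    }

    public static void main(String[] args) {
        // Three samples holding 10, 20 and 25 MB, measured over a 5 s window
        List<Double> samples = Arrays.asList(10.0, 20.0, 25.0);
        System.out.println(ratePerSecond(samples, 5000)); // prints 11.0
    }
}
```

The numbers match the earlier quota example: 55 MB over 5 s is 11 MB/s.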

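If there is any trick at all, it is the windowSize. The sample list can be seen as a series of time slots that are reused in a circle: each time, only the oldest data is evicted and the freed slot is given to the newest data. That is how the statistic is updated incrementally, and the same idea applies to max, min, or avg. For example, to measure the average over 10 s, I open ten slots of 1 s each. When the 11th second arrives, I only need to evict the slot that covered second 0 to second 1.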
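The ten-slot example can be sketched as a small ring buffer. This is a simplified illustration, not Kafka's SampledStat: SlotRing is a hypothetical class, and it picks the slot directly from the timestamp instead of advancing a current index, which keeps the eviction rule easy to see.

```java
import java.util.Arrays;

/** Hypothetical ring of fixed-width time slots that are recycled as time moves on. */
final class SlotRing {
    private final double[] sums;       // accumulated value per slot
    private final long[] slotStartMs;  // start of the window each slot currently holds, -1 if unused
    private final long slotWidthMs;

    SlotRing(int slots, long slotWidthMs) {
        this.sums = new double[slots];
        this.slotStartMs = new long[slots];
        this.slotWidthMs = slotWidthMs;
        Arrays.fill(slotStartMs, -1L);
    }

    /** Adds value to the slot covering nowMs (assumed >= 0), recycling the slot if it holds stale data. */
    void record(double value, long nowMs) {
        long windowStart = nowMs - (nowMs % slotWidthMs);
        int idx = (int) ((nowMs / slotWidthMs) % sums.length);
        if (slotStartMs[idx] != windowStart) {
            // The slot still holds an older window: evict it and reuse the slot.
            sums[idx] = 0.0;
            slotStartMs[idx] = windowStart;
        }
        sums[idx] += value;
    }

    /** Sums the slots that belong to the most recent sums.length windows ending at nowMs. */
    double total(long nowMs) {
        long currentStart = nowMs - (nowMs % slotWidthMs);
        long oldestExcluded = currentStart - slotWidthMs * sums.length;
        double total = 0.0;
        for (int i = 0; i < sums.length; i++) {
            if (slotStartMs[i] >= 0 && slotStartMs[i] > oldestExcluded) {
                total += sums[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        SlotRing ring = new SlotRing(10, 1000);   // ten slots of 1 s each
        ring.record(5.0, 0);                      // second 0 gets a large value
        for (int s = 1; s <= 9; s++) {
            ring.record(1.0, s * 1000L);          // seconds 1..9 get 1.0 each
        }
        System.out.println(ring.total(9_000));    // prints 14.0
        ring.record(1.0, 10_000);                 // second 10 reuses the slot of second 0
        System.out.println(ring.total(10_000));   // prints 10.0
    }
}
```

Only one slot is touched when the window slides, so the cost of an update does not depend on how many values were recorded.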
