Abstract
For any concurrent multi-threaded system, distributed or otherwise, the inter-thread messaging component is a critical piece. In Java, the JDK provides ArrayBlockingQueue, LinkedBlockingQueue, and LinkedTransferQueue. Disruptor (http://lmaxexchange.github.io/disruptor/) is well known for its high-performance inter-thread messaging, but it does not expose itself as a BlockingQueue. This blog introduces a new blocking queue built on Disruptor's ring buffer, along with benchmark results.
Why a BlockingQueue interface?
The BlockingQueue interface is widely used by existing code, and switching to Disruptor directly requires significant changes because Disruptor wants to control the whole thread scheduling. Second, Disruptor only calls back when an event arrives; the application never gets a chance to control behavior when the queue builds up, for example to apply proactive throttling. This blog introduces a BlockingQueue implementation on top of RingBuffer, with one limitation: the queue can only be consumed by a single consumer thread, while the producer side can be single- or multi-threaded. This fits the Actor pattern, which uses one blocking queue drained by one thread. The restriction exists because the consumer-side offset is hard to maintain with multiple consumer threads; multi-threaded consumers should use Disruptor's WorkerPool instead of a JDK Executor.
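Because the queue keeps the standard BlockingQueue interface, an actor-style drain loop does not change at all when the backing queue is swapped. A minimal sketch of such a loop (using JDK's ArrayBlockingQueue here only as a stand-in, since SingleConsumerDisruptorQueue implements the same interface):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ActorLoop {
    // The single-consumer drain loop only depends on the BlockingQueue
    // interface, so any implementation can back it without code changes.
    static int drain(BlockingQueue<String> mailbox, int expected) throws InterruptedException {
        int handled = 0;
        while (handled < expected) {
            String msg = mailbox.take();   // blocks until a message is available
            handled++;                     // a real actor would dispatch msg here
        }
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> mailbox = new ArrayBlockingQueue<>(1024);
        for (int i = 0; i < 10; i++) mailbox.offer("msg-" + i);
        System.out.println(drain(mailbox, 10)); // prints 10
    }
}
```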
Implementation
The source code is available on GitHub: https://github.com/xinglang/disruptorqueue/tree/master/disruptorqueue
Since this queue only supports one consumer, let's call it SingleConsumerDisruptorQueue. It holds a ring buffer and a sequence (consumedSeq) for the consumer; consumedSeq is the gating sequence of the ring buffer. There is also a knownPublishedSeq field that remembers the last known published sequence. Since this is a blocking queue, the wait strategy is BlockingWaitStrategy (the default).
private final RingBuffer<Event<T>> ringBuffer;
private final Sequence consumedSeq;
private final SequenceBarrier barrier;
private long knownPublishedSeq;

public SingleConsumerDisruptorQueue(int bufferSize, boolean singleProducer) {
    if (singleProducer) {
        ringBuffer = RingBuffer.createSingleProducer(new Factory<T>(),
                normalizeBufferSize(bufferSize));
    } else {
        ringBuffer = RingBuffer.createMultiProducer(new Factory<T>(),
                normalizeBufferSize(bufferSize));
    }
    consumedSeq = new Sequence();
    ringBuffer.addGatingSequences(consumedSeq);
    barrier = ringBuffer.newBarrier();
    long cursor = ringBuffer.getCursor();
    consumedSeq.set(cursor);
    knownPublishedSeq = cursor;
}
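The constructor passes the requested size through normalizeBufferSize, whose body is not shown above. Since RingBuffer requires a power-of-two size, it presumably rounds up; the actual implementation may differ, but a hypothetical version could look like this:

```java
public class BufferSizes {
    // Round the requested capacity up to the next power of two, since
    // RingBuffer rejects any other size. This is only a guess at what
    // normalizeBufferSize does; see the repository for the real code.
    static int normalize(int requested) {
        if (requested <= 1) return 1;
        // highestOneBit yields the largest power of two <= requested,
        // so shift left once when requested is not already a power of two.
        int highest = Integer.highestOneBit(requested);
        return (highest == requested) ? requested : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(normalize(262144)); // already a power of two -> 262144
        System.out.println(normalize(100000)); // rounds up -> 131072
    }
}
```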
Publishing just uses the ring buffer's publish mechanism. Inside the ring buffer, an event holder acts as a value holder for the item.
@Override
public boolean offer(T e) {
    long seq;
    try {
        seq = ringBuffer.tryNext();
    } catch (InsufficientCapacityException e1) {
        return false;
    }
    publish(e, seq);
    return true;
}

private void publish(T e, long seq) {
    Event<T> holder = ringBuffer.get(seq);
    holder.setValue(e);
    ringBuffer.publish(seq);
}
On the consume side there is an optimization that is only possible because there is a single consumer thread: each call to waitFor returns the last known published sequence, so as long as the consumer sequence is still below that cached value, the barrier's waitFor method does not need to be called at all.
@Override
public T take() throws InterruptedException {
    long l = consumedSeq.get() + 1;
    while (knownPublishedSeq < l) {
        try {
            knownPublishedSeq = barrier.waitFor(l);
        } catch (AlertException e) {
            throw new IllegalStateException(e);
        } catch (TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
    Event<T> eventHolder = ringBuffer.get(l);
    consumedSeq.incrementAndGet();
    return eventHolder.getValue();
}
Performance analysis
First of all, it inherits all the benefits of the ring buffer design:
- Avoids false sharing
- Pre-allocated ring buffer: no instances are created during publish/consume
- Fewer context switches: the consumer can process a batch of events without being interrupted
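The batching point above can be illustrated with the standard interface alone: after one blocking take, drainTo pulls everything already published in a single call, so the consumer wakes up once per batch instead of once per element. A sketch (ArrayBlockingQueue is used here only as a stand-in for any BlockingQueue):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchDrain {
    // Block for the first element, then grab whatever else is already
    // available without blocking again -- one wakeup handles a whole batch.
    static List<Integer> takeBatch(BlockingQueue<Integer> q) throws InterruptedException {
        List<Integer> batch = new ArrayList<>();
        batch.add(q.take());   // blocking wait for at least one element
        q.drainTo(batch);      // non-blocking drain of the rest
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(64);
        for (int i = 0; i < 5; i++) q.offer(i);
        System.out.println(takeBatch(q).size()); // prints 5
    }
}
```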
Below is a benchmark comparing this queue against LinkedBlockingQueue, ArrayBlockingQueue, and LinkedTransferQueue. The benchmark ran on a bare-metal Ubuntu machine with 1 consumer thread and 1 to 4 producer threads; each round performs 32M put/take operations. The object being put is a constant string, so there is no GC overhead from object creation.
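The actual harness lives in the repository linked above; its shape can be approximated by a minimal single-producer throughput loop like the following sketch (class and method names here are illustrative, not the real harness). As in the post, a constant string payload avoids per-message allocation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MiniBench {
    static final String ITEM = "x"; // constant payload: no per-message allocation, no GC noise

    // Transfer `count` items from one producer thread to the calling
    // (consumer) thread and return the elapsed milliseconds.
    static long run(BlockingQueue<String> q, int count) throws InterruptedException {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) q.put(ITEM);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        long start = System.nanoTime();
        producer.start();
        for (int i = 0; i < count; i++) q.take();
        producer.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long ms = run(new ArrayBlockingQueue<>(262_144), 1_000_000);
        System.out.println("1M transfers in " + ms + " ms");
    }
}
```

Swapping the queue constructor (for SingleConsumerDisruptorQueue, LinkedBlockingQueue, etc.) is the only change needed per run, which is what makes the BlockingQueue interface convenient for this comparison.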
Single Producer benchmark
$ perf stat java -jar disruptortest.jar type=dbq
Producers :1, buffer size: 262144, batch:0
SingleConsumerDisruptorQueue transfer rate : 19890 per ms, Used 1687ms for 33554432

 Performance counter stats for 'java -jar disruptortest.jar type=dbq':

       3729.421847 task-clock                #    1.998 CPUs utilized
             1,891 context-switches          #    0.001 M/sec
                76 CPU-migrations            #    0.000 M/sec
             9,357 page-faults               #    0.003 M/sec
     9,434,280,791 cycles                    #    2.530 GHz                     [83.38%]
     5,489,619,603 stalled-cycles-frontend   #   58.19% frontend cycles idle    [83.35%]
     2,618,037,087 stalled-cycles-backend    #   27.75% backend cycles idle     [66.99%]
    10,797,968,145 instructions              #    1.14  insns per cycle
                                             #    0.51  stalled cycles per insn [83.55%]
     1,742,973,721 branches                  #  467.358 M/sec                   [83.28%]
        10,213,770 branch-misses             #    0.59% of all branches         [83.12%]

       1.866803438 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=abq
Producers :1, buffer size: 262144, batch:0
ArrayBlockingQueue transfer rate : 2694 per ms, Used 12451ms for 33554432

 Performance counter stats for 'java -jar disruptortest.jar type=abq':

      22976.952946 task-clock                #    1.824 CPUs utilized
           232,766 context-switches          #    0.010 M/sec
                80 CPU-migrations            #    0.000 M/sec
            68,531 page-faults               #    0.003 M/sec
    58,643,663,103 cycles                    #    2.552 GHz                     [83.14%]
    51,767,105,241 stalled-cycles-frontend   #   88.27% frontend cycles idle    [83.32%]
    47,084,355,024 stalled-cycles-backend    #   80.29% backend cycles idle     [66.51%]
    12,035,035,540 instructions              #    0.21  insns per cycle
                                             #    4.30  stalled cycles per insn [83.44%]
     2,016,738,256 branches                  #   87.772 M/sec                   [83.56%]
        20,147,764 branch-misses             #    1.00% of all branches         [83.49%]

      12.596555382 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=lbq
Producers :1, buffer size: 262144, batch:0
LinkedBlockingQueue transfer rate : 1132 per ms, Used 29632ms for 33554432

 Performance counter stats for 'java -jar disruptortest.jar type=lbq':

      58707.942294 task-clock                #    1.968 CPUs utilized
            82,377 context-switches          #    0.001 M/sec
                97 CPU-migrations            #    0.000 M/sec
           133,543 page-faults               #    0.002 M/sec
   151,825,969,348 cycles                    #    2.586 GHz                     [83.27%]
   139,833,905,165 stalled-cycles-frontend   #   92.10% frontend cycles idle    [83.40%]
   131,712,244,095 stalled-cycles-backend    #   86.75% backend cycles idle     [66.67%]
    10,997,843,405 instructions              #    0.07  insns per cycle
                                             #   12.71  stalled cycles per insn [83.26%]
     1,701,879,665 branches                  #   28.989 M/sec                   [83.31%]
        23,369,660 branch-misses             #    1.37% of all branches         [83.35%]

      29.830928757 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=tq
Producers :1, buffer size: 262144, batch:0
LinkedTransferQueue transfer rate : 2139 per ms, Used 15685ms for 33554432

 Performance counter stats for 'java -jar disruptortest.jar type=tq':

     107428.492713 task-clock                #    6.737 CPUs utilized
            10,542 context-switches          #    0.000 M/sec
               100 CPU-migrations            #    0.000 M/sec
           245,909 page-faults               #    0.002 M/sec
   278,182,169,187 cycles                    #    2.589 GHz                     [83.33%]
   204,478,913,414 stalled-cycles-frontend   #   73.51% frontend cycles idle    [83.36%]
   164,497,727,638 stalled-cycles-backend    #   59.13% backend cycles idle     [66.73%]
    90,952,113,104 instructions              #    0.33  insns per cycle
                                             #    2.25  stalled cycles per insn [83.37%]
    32,522,385,525 branches                  #  302.735 M/sec                   [83.30%]
        57,227,684 branch-misses             #    0.18% of all branches         [83.28%]

      15.947024802 seconds time elapsed
Multiple Producer benchmark
$ perf stat java -jar disruptortest.jar type=dq producer=4
Producers :4, buffer size: 262144, batch:0
SingleConsumerDisruptorQueue transfer rate : 2859 per ms, Used 46941ms for 134217728

 Performance counter stats for 'java -jar disruptortest.jar type=dq producer=4':

     118905.839793 task-clock                #    2.523 CPUs utilized
         2,172,912 context-switches          #    0.018 M/sec
               280 CPU-migrations            #    0.000 M/sec
            28,697 page-faults               #    0.000 M/sec
   141,597,737,150 cycles                    #    1.191 GHz                     [83.18%]
   113,618,387,640 stalled-cycles-frontend   #   80.24% frontend cycles idle    [83.42%]
    96,562,209,060 stalled-cycles-backend    #   68.19% backend cycles idle     [66.86%]
    55,227,379,587 instructions              #    0.39  insns per cycle
                                             #    2.06  stalled cycles per insn [83.45%]
     9,312,400,407 branches                  #   78.317 M/sec                   [83.19%]
        64,375,263 branch-misses             #    0.69% of all branches         [83.35%]

      47.133747893 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=abq producer=4
Producers :4, buffer size: 262144, batch:0
ArrayBlockingQueue transfer rate : 2047 per ms, Used 65546ms for 134217728

 Performance counter stats for 'java -jar disruptortest.jar type=abq producer=4':

      79345.046656 task-clock                #    1.208 CPUs utilized
         3,003,905 context-switches          #    0.038 M/sec
               594 CPU-migrations            #    0.000 M/sec
            77,227 page-faults               #    0.001 M/sec
   102,931,605,765 cycles                    #    1.297 GHz                     [83.10%]
    78,913,722,891 stalled-cycles-frontend   #   76.67% frontend cycles idle    [83.46%]
    65,701,179,927 stalled-cycles-backend    #   63.83% backend cycles idle     [66.99%]
    52,891,419,177 instructions              #    0.51  insns per cycle
                                             #    1.49  stalled cycles per insn [83.41%]
     9,307,141,741 branches                  #  117.300 M/sec                   [83.21%]
        79,855,221 branch-misses             #    0.86% of all branches         [83.23%]

      65.694123910 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=lbq producer=4
Producers :4, buffer size: 262144, batch:0
LinkedBlockingQueue transfer rate : 2795 per ms, Used 48014ms for 134217728

 Performance counter stats for 'java -jar disruptortest.jar type=lbq producer=4':

     110080.375452 task-clock                #    2.284 CPUs utilized
         3,644,802 context-switches          #    0.033 M/sec
               597 CPU-migrations            #    0.000 M/sec
           136,440 page-faults               #    0.001 M/sec
   185,250,018,068 cycles                    #    1.683 GHz                     [83.46%]
   144,448,559,949 stalled-cycles-frontend   #   77.97% frontend cycles idle    [83.62%]
   118,250,468,418 stalled-cycles-backend    #   63.83% backend cycles idle     [66.28%]
    73,113,563,433 instructions              #    0.39  insns per cycle
                                             #    1.98  stalled cycles per insn [83.21%]
    12,028,209,235 branches                  #  109.268 M/sec                   [83.25%]
       129,234,077 branch-misses             #    1.07% of all branches         [83.40%]

      48.189813503 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=tq producer=4
Producers :4, buffer size: 262144, batch:0
LinkedTransferQueue transfer rate : 1438 per ms, Used 93273ms for 134217728

 Performance counter stats for 'java -jar disruptortest.jar type=tq producer=4':

     761878.416668 task-clock                #    8.122 CPUs utilized
            71,371 context-switches          #    0.000 M/sec
               203 CPU-migrations            #    0.000 M/sec
           670,788 page-faults               #    0.001 M/sec
 1,976,200,012,808 cycles                    #    2.594 GHz                     [83.33%]
 1,584,264,715,610 stalled-cycles-frontend   #   80.17% frontend cycles idle    [83.34%]
 1,368,861,011,899 stalled-cycles-backend    #   69.27% backend cycles idle     [66.68%]
   487,816,405,509 instructions              #    0.25  insns per cycle
                                             #    3.25  stalled cycles per insn [83.34%]
   169,135,278,863 branches                  #  221.998 M/sec                   [83.33%]
       615,658,238 branch-misses             #    0.36% of all branches         [83.33%]

      93.798977802 seconds time elapsed
Conclusion
Building a blocking queue on Disruptor's RingBuffer is entirely feasible. In the single-producer/single-consumer case it can be 5x faster than the default JDK blocking queue implementations. With multiple producers it is much faster than ArrayBlockingQueue and LinkedTransferQueue; LinkedBlockingQueue achieves similar throughput, but the Disruptor-based queue has fewer context switches and a smaller memory footprint. The only limitation is that it supports just a single consumer thread. The benefit of a BlockingQueue implementation on top of RingBuffer is that it can be a drop-in replacement in existing code and gives the user more control through the BlockingQueue interface, whereas Disruptor's WorkerPool only lets the user register an event handler for callbacks.