Disruptor and Lock-Free Programming: Core Principles and a Common Misunderstanding of volatile

Source: Internet. Editor: 程序博客网. Time: 2024/05/21 11:25

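Mention Disruptor and the first words that come to mind are "lock-free" and "fast". This article tries to explain, from the underlying core principles, why Disruptor can be "completely lock-free" and why it can be so fast.

This article will not spend much time on Disruptor's background. Here is a description commonly quoted online: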

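Martin Fowler wrote an article on his website about the LMAX architecture, in which he describes LMAX as a new retail financial trading platform that can handle a large volume of trades with very low latency. The system is built on the JVM, and at its core is a business logic processor that can process 6 million orders per second on a single thread.

The business logic processor runs entirely in memory and uses event sourcing. At the heart of the business logic processor is the Disruptor, an open-source concurrency framework that won a 2011 Duke's award for innovative programming frameworks and supports concurrent Queue operations without locks.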

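Core Principle 1: Single-Threaded Writes

In the earlier JUC series analysis, we mentioned the Linux kernel's kfifo queue, which is also a RingBuffer. Its key property is "one reader, one writer", so it can be "completely lock-free" and does not even need CAS.

Likewise, the reason Disruptor's RingBuffer can be completely lock-free is "single-threaded writes". This is the "precondition of all preconditions". Without it, no technique can be completely lock-free: multiple writers have to coordinate with at least CAS.

To borrow from a blog post referenced by the Disruptor official site, "Sharing Data Among Threads Without Contention", http://javap.cn/1447165614671.html, this is the: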

single-writer principle

The following article also explains this principle in detail:

http://mechanical-sympathy.blogspot.hk/2011/09/single-writer-principle.html

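At this point we can draw a conclusion: as long as there is a single writer, whether it is the "one reader, one writer" of the Linux kernel's kfifo or the "many readers, one writer" of Disruptor, the structure can be completely lock-free, with no need even for CAS.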
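The conclusion above can be illustrated with a minimal Java sketch. It is not taken from Disruptor: the class SingleWriterSequence and its methods are hypothetical names chosen for this example. It assumes exactly one thread ever calls increment(), which is what makes a plain read-then-write safe without a lock or CAS.

```java
// Hypothetical illustration of the single-writer principle; not Disruptor code.
final class SingleWriterSequence {
    // volatile so that reader threads see each published value.
    private volatile long value = 0;

    // Call from exactly ONE thread. The read-then-write below is not atomic,
    // which is safe only because no other thread ever writes this field.
    void increment() {
        value = value + 1;
    }

    // Any number of threads may read.
    long get() {
        return value;
    }

    // Runs one writer and one reader; returns the last value the reader saw.
    static long runDemo(long increments) {
        SingleWriterSequence seq = new SingleWriterSequence();
        long[] lastSeen = new long[1];

        Thread writer = new Thread(() -> {
            for (long i = 0; i < increments; i++) {
                seq.increment();
            }
        });
        Thread reader = new Thread(() -> {
            long seen = 0;
            while (seen < increments) {
                seen = seq.get(); // spins until the final write is visible
            }
            lastSeen[0] = seen;
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return lastSeen[0];
    }

    public static void main(String[] args) {
        System.out.println(runDemo(1_000_000L));
    }
}
```

If a second thread also called increment(), updates could be lost, and the field would need CAS (for example AtomicLong) or a lock.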

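Core Principle 2: Memory Barriers

Besides the "single writer" precondition above, implementing lock-free code correctly requires one more key technique: memory barriers.

In the Java language, this corresponds to volatile variables and happens-before semantics.

The following sorts out the relationship between memory barriers and Java's volatile variables.

Memory Barriers: Linux's smp_wmb()/smp_rmb()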

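A memory barrier is essentially a single CPU instruction, and it does two things:
(1) It prevents instruction reordering. Once a barrier is inserted, the memory operations it covers cannot be reordered across it; in particular, code after the barrier cannot be moved to before it.
(2) It flushes the store buffer or invalidates the load buffer.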

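Specifically, there are two kinds of memory barrier:
store barrier (write barrier): flushes the store buffer. Writes issued before the store barrier, that is, the contents of the store buffer, are flushed out to the coherent cache and memory system so that other CPUs can see the written values;
load barrier (read barrier): invalidates the load buffer, so that reads after the load barrier do not return stale values from the load buffer but instead see the new values written by other CPUs.

full barrier: combines the functions of a store barrier and a load barrier.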

smp_wmb()

Take Linux's kfifo as an example: https://github.com/opennetworklinux/linux-3.8.13/blob/master/kernel/kfifo.c

It uses a store barrier. The code is as follows:

// Enqueue (insert data into the ring buffer)
unsigned int __kfifo_in(struct __kfifo *fifo,
        const void *buf, unsigned int len)
{
    unsigned int l;

    l = kfifo_unused(fifo);
    if (len > l)
        len = l;

    kfifo_copy_in(fifo, buf, len, fifo->in);
    // the store barrier is inserted here, at the end of kfifo_copy_in()
    fifo->in += len;
    return len;
}

static void kfifo_copy_in(struct __kfifo *fifo, const void *src,
        unsigned int len, unsigned int off)
{
    unsigned int size = fifo->mask + 1;
    unsigned int esize = fifo->esize;
    unsigned int l;

    off &= fifo->mask;
    if (esize != 1) {
        off *= esize;
        size *= esize;
        len *= esize;
    }
    l = min(len, size - off);

    memcpy(fifo->data + off, src, l);
    memcpy(fifo->data, src + l, len - l);
    // Key point: a store barrier is inserted here, which guarantees that
    // the data is written first and the in pointer is updated afterwards
    smp_wmb();
}

// Dequeue (take data out of the ring buffer)
unsigned int __kfifo_out(struct __kfifo *fifo,
        void *buf, unsigned int len)
{
    len = __kfifo_out_peek(fifo, buf, len);
    // the store barrier is inserted here, at the end of kfifo_copy_out()
    fifo->out += len;
    return len;
}

unsigned int __kfifo_out_peek(struct __kfifo *fifo,
        void *buf, unsigned int len)
{
    unsigned int l;

    l = fifo->in - fifo->out;
    if (len > l)
        len = l;

    kfifo_copy_out(fifo, buf, len, fifo->out);
    return len;
}

static void kfifo_copy_out(struct __kfifo *fifo, void *dst,
        unsigned int len, unsigned int off)
{
    unsigned int size = fifo->mask + 1;
    unsigned int esize = fifo->esize;
    unsigned int l;

    off &= fifo->mask;
    if (esize != 1) {
        off *= esize;
        size *= esize;
        len *= esize;
    }
    l = min(len, size - off);

    memcpy(dst, fifo->data + off, l);
    memcpy(dst + l, fifo->data, len - l);
    // Key point: a store barrier is inserted here, which guarantees that
    // the data is copied out first and the out pointer is updated afterwards
    smp_wmb();
}

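As the code above shows, kfifo inserts a store barrier between modifying the data and updating the pointers (queue head and queue tail), which ensures that:

(1) the pointer update is not reordered ahead of the data modification;
(2) by the time the pointer is updated, the store buffer has been flushed, so the data is visible to other CPUs.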

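kfifo's Weak Consistency

There is a key point here: no memory barrier is inserted after the in and out pointers are modified. This means a change to in or out may be invisible to other CPUs/threads for a short time.

In other words, the data has been modified, but other CPUs still see the old pointer. This does not cause a problem: at worst, a reader judges the queue to be empty when it is not, or a writer judges it to be full when it is not. The caller's retry logic takes care of this. So the queue here is "weakly consistent", or "eventually consistent".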

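The Misunderstanding About volatile

Now that we know how memory barriers work, how do they relate to volatile? In Java we do not manipulate memory barriers directly (although Unsafe has provided fence methods since JDK 8); we use volatile variables instead.

A volatile variable has happens-before semantics: a write to a volatile variable happens-before every subsequent read of that variable.

Underneath, this semantics is implemented with memory barriers. To quote Martin Thompson's article: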

http://mechanical-sympathy.blogspot.hk/2011/07/memory-barriersfences.html

In the Java Memory Model a volatile field has a store barrier before the write, and full barrier after the write to it, this is paired with and a load barrier inserted after a read of it.

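There is a crucial point here: a write to a volatile variable has two barriers inserted, one before it and one after it. A store barrier goes before the write and a full barrier goes after it. The two barriers serve two purposes:

(1) store barrier: ensures that the volatile write is not reordered with the operations before it. In the kfifo example above, this means the write to in/out cannot be reordered ahead of the data modification.
(2) full barrier: ensures that the change to in/out becomes visible to other CPUs/threads immediately. So this is no longer "weak consistency" but strong consistency.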
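To make the mapping from kfifo to volatile concrete, here is a rough Java sketch of a ring buffer for exactly one producer thread and one consumer thread. It is not Disruptor's code and not a port of the kernel source: SpscRing, offer, poll and runDemo are hypothetical names, and the capacity is assumed to be a power of two. The volatile fields in and out play the role of kfifo's in/out pointers, so no explicit barrier call appears.

```java
// Hypothetical sketch of a kfifo-style ring buffer for exactly one producer
// thread and one consumer thread; names are illustrative, not Disruptor's.
final class SpscRing {
    private final long[] data;
    private final int mask;
    private volatile long in = 0;   // written only by the producer
    private volatile long out = 0;  // written only by the consumer

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        data = new long[capacity];
        mask = capacity - 1;
    }

    // Producer thread only. Returns false when the ring is full.
    boolean offer(long v) {
        long i = in;
        if (i - out == data.length) {
            return false;
        }
        data[(int) (i & mask)] = v; // plain write of the payload
        in = i + 1;                 // volatile write: publishes the payload
        return true;
    }

    // Consumer thread only. Returns false when the ring is empty.
    boolean poll(long[] dest) {
        long o = out;
        if (in == o) {              // volatile read pairs with the producer's write
            return false;
        }
        dest[0] = data[(int) (o & mask)];
        out = o + 1;                // volatile write: hands the slot back
        return true;
    }

    // Sends 1..n through the ring and returns the sum the consumer received.
    static long runDemo(int n) {
        SpscRing ring = new SpscRing(1024);
        Thread producer = new Thread(() -> {
            for (long v = 1; v <= n; v++) {
                while (!ring.offer(v)) {
                    Thread.yield(); // caller-side retry when full
                }
            }
        });
        producer.start();

        long sum = 0;
        long[] slot = new long[1];
        for (int received = 0; received < n; ) {
            if (ring.poll(slot)) {
                sum += slot[0];
                received++;
            } else {
                Thread.yield();     // caller-side retry when empty
            }
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

Each index has a single writer, so neither offer nor poll needs CAS. The payload slot is a plain array element; its visibility rides on the volatile write to in that follows it.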

JSR-133's Revision of volatile Semantics

This is exactly how JSR-133 strengthened the semantics of volatile. To quote the original: https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html

What was wrong with the old memory model? The old memory model allowed for volatile writes to be reordered with nonvolatile reads and writes, which was not consistent with most developers intuitions about volatile and therefore caused confusion.

To summarize, a volatile variable has two effects:

(1) Memory visibility, which was discussed earlier.
(2) No reordering, which is what JSR-133 strengthened.
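The two effects can be seen together in a small Java sketch. The class VolatilePublish and its methods are hypothetical names for this example only. The payload is a plain field, while the flag is volatile. Under the post-JSR-133 memory model, a reader that sees the flag set is also guaranteed to see the payload written before it.

```java
// Hypothetical example: publishing a plain field through a volatile flag.
final class VolatilePublish {
    private int payload = 0;                // plain field
    private volatile boolean ready = false; // volatile flag

    // Writer thread.
    void publish(int v) {
        payload = v;  // (a) plain write
        ready = true; // (b) volatile write; (a) may not be reordered after (b)
    }

    // Reader thread.
    int awaitPayload() {
        while (!ready) {   // (c) volatile read; loops until it observes (b)
            Thread.yield();
        }
        return payload;    // (d) guaranteed to observe (a)
    }

    // Starts a writer thread and reads the payload on the calling thread.
    static int runDemo(int v) {
        VolatilePublish box = new VolatilePublish();
        Thread writer = new Thread(() -> box.publish(v));
        writer.start();
        int seen = box.awaitPayload();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

If ready were a plain field, the reader could loop forever or return a stale payload, since neither visibility nor ordering would be guaranteed.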

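The Misunderstanding

On the relationship between volatile variables and memory barriers, a blog post referenced by the Disruptor official site says the following: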
Dissecting the Disruptor: Demystifying Memory Barriers
http://mechanitis.blogspot.hk/2011/08/dissecting-disruptor-why-its-so-fast.html

If your field is volatile, the Java Memory Model inserts a write barrier instruction after you write to it, and a read barrier instruction before you read from it.

Memory Barriers/Fences

http://ju.outofmemory.cn/entry/16977

In the Java Memory Model a volatile field has a store barrier inserted after a write to it and a load barrier inserted before a read of it.

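I believe the misunderstanding here is very widespread: both passages describe only a store barrier after the volatile write, and leave out the barrier before the write that keeps it from being reordered with earlier operations. Interested readers are welcome to discuss this with me.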

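Core Principle 3: False Sharing and Cache Line Padding

Cache line padding is not unique to Disruptor. We already analyzed this technique in the earlier article "JUC Package Source Code Analysis 16: Exchanger Source Code Analysis"; here it is explained once more.

"Cache line padding" simply means not letting two variables be allocated on the same cache line. Otherwise, a modification to one variable invalidates the whole cache line, and with it the cached copy of the other variable.

Suppose there are two variables x and y. One thread modifies x and another thread reads y, so they look independent. But if x and y sit in the same cache line, one thread's modification of x hurts the other thread's read performance on y. The two threads are contending for the same cache line, which is called "false sharing".

A typical case is a linked list with a head pointer and a tail pointer. If the two pointers are allocated on the same cache line, then every time you modify the head pointer, the cached tail pointer is affected as well.

Typically a cache line is 64 bytes. If, for example, I want two long variables not to be allocated on the same cache line, I can pad each of them with 7 longs. Disruptor contains code like this: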

abstract class RingBufferPad
{
    protected long p1, p2, p3, p4, p5, p6, p7;
}

abstract class RingBufferFields<E> extends RingBufferPad{..}
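The same inheritance trick can be sketched for a standalone counter. The classes below (LeftPad, HotValue, PaddedCounter, PaddingDemo) are hypothetical names, not Disruptor classes. The sketch assumes a 64-byte cache line and assumes the JVM lays out superclass fields before subclass fields, which is typical of HotSpot but is not guaranteed by the Java language specification.

```java
// Hypothetical padded counter; assumes 64-byte cache lines and a JVM that
// places superclass fields before subclass fields.
class LeftPad {
    protected long p1, p2, p3, p4, p5, p6, p7; // 56 bytes before the hot field
}

class HotValue extends LeftPad {
    protected volatile long value;             // the field that is contended
}

final class PaddedCounter extends HotValue {
    protected long q1, q2, q3, q4, q5, q6, q7; // 56 bytes after the hot field

    // Single writer per counter, so a plain read-then-write is enough.
    void increment() {
        value = value + 1;
    }

    long get() {
        return value;
    }
}

final class PaddingDemo {
    // Two threads, each hammering its own counter; returns the combined total.
    static long runDemo(int n) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread ta = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                a.increment();
            }
        });
        Thread tb = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                b.increment();
            }
        });
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.get() + b.get();
    }
}
```

The padding changes only the memory layout, not the result. Any speedup depends on the hardware and the JVM and would have to be measured, for example by timing runDemo with and without the padding fields.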

http://blog.csdn.net/chunlongyu/article/details/53304524
