A Look Inside JBoss Cache

So I was asked a number of times why I didn't put in any shameless plugs for JBoss Cache - the project I lead - when I wrote my last article at DZone on distributed caching and parallelism, and here is why.

This follow-up article focuses entirely on the brand-spanking-new JBoss Cache 3.0.0 - code-named Naga - and goes in depth to discuss how our high-performance concurrency model works.

Life before Naga

Before I start talking about Naga specifically, I'd like to delve into a brief history of JBoss Cache. Around 2002, the JBoss Application Server (JBoss AS) needed a clustered cache solution (see my previous article for definitions of terms) for its HTTP and EJB session state replication, to maintain high availability in a cluster. JGroups - an open source group communication suite - shipped with a demo replicated hash map. Bela Ban, the founder and current maintainer of JGroups, expanded it to accommodate a tree structure - where data is organized into nodes in a tree data structure - and other cache-relevant features such as eviction and JTA transactions. Around early 2003, this was moved into JBoss AS's CVS repository and became a part of the JBoss AS code base.

Around March 2005, JBoss Cache was extracted from the JBoss AS repository and became its own standalone project. The rest, as they say, is history. Features such as cache loading and various cache loader implementations, eviction policies, and buddy replication were gradually added. TCP-based delegating cache servers allowed you to build tiers of caches. Custom marshalling frameworks provided a highly performant alternative to Java serialization when replicating state. Along the way, the cache went through one more major release - 2.0 - which involved a major API change and baselining on Java 5. Two other editions - a POJO edition and a Searchable edition - have evolved as well, building on the core cache to provide specific features.

Handing over the torch

Now, it is time for the 2.x series to hand over the torch to 3.0.0. Naga, as it is called internally, is well deserving of a major version change. In addition to evolutionary changes and improvements in the code base - better resource management, marshalling, overall performance enhancements, and a brand new and much simplified configuration file format - it also contains at least one revolutionary change:

MVCC has landed

Multi-versioned concurrency control - MVCC - was adopted as the default concurrency scheme in the cache.

When run in local mode, the most costly part of the cache in terms of memory and CPU cycles is the locking involved in maintaining the integrity of shared data. In clustered mode, this locking is the second most expensive thing, after the RPC calls made by the cache instance to remote instances.

Legacy locking schemes

In JBoss Cache 1.x and 2.x, we offered two different locking schemes - an optimistic one and a pessimistic one. Each had its pros and cons, but in the end both were still costly in terms of performance.

The pessimistic scheme used a lock per tree node. Reader threads would obtain non-exclusive read locks and writer threads would obtain exclusive write locks on these nodes before performing any operations. The locks we used were a custom extension of the JDK's ReentrantReadWriteLock, modified to support lock upgrading, where within the scope of a transaction a thread may start off reading a node and then later attempt to write to it.
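
To make the lock-per-node idea concrete, here is a minimal sketch of what such a scheme looks like. The NodeLock class and its naive upgrade step are invented for illustration and are not JBoss Cache's actual code; notably, the stock JDK lock cannot upgrade in place, which is precisely why a custom extension was needed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: one read/write lock guarding a single tree node.
class NodeLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void acquireForRead() throws InterruptedException {
        if (!lock.readLock().tryLock(10, TimeUnit.SECONDS))
            throw new IllegalStateException("timed out acquiring read lock");
    }

    // A naive "upgrade": release the read lock, then compete for the write
    // lock. The plain JDK lock offers nothing better - a thread that asks
    // for the write lock while still holding the read lock blocks forever.
    void upgradeToWrite() throws InterruptedException {
        lock.readLock().unlock();
        if (!lock.writeLock().tryLock(10, TimeUnit.SECONDS))
            throw new IllegalStateException("timed out acquiring write lock");
    }

    void releaseWrite() {
        lock.writeLock().unlock();
    }
}
```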

Overall, this scheme was simple and robust, but didn't perform too well due to the memory overhead of maintaining a lock per node. More importantly, there was reduced concurrency, since the existence of read locks prevented write locks from being obtained. The effect of readers blocking writers also introduced the possibility of deadlocks. Take, for example, transaction A, which performs a read on node /X and a write on node /Y before committing. Transaction B, which performs a read on node /Y and a write on node /X before committing, starts at the same time. With some unfortunate timing, we could end up with a situation where transaction A has a read lock on /X and is waiting for the write lock on /Y, while transaction B has a read lock on /Y and is waiting for the write lock on /X. Both transactions would deadlock, until one of them times out and rolls back. And typically, once one transaction has timed out, chances are so would the other, since both would have been waiting for almost the same amount of time.
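
That unfortunate timing is easy to reproduce with two plain read/write locks. This self-contained snippet (all names mine, not the cache's) reliably deadlocks both "transactions" until the tryLock timeouts fire, mirroring the timeout-and-rollback behaviour just described.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DeadlockDemo {
    static final ReentrantReadWriteLock x = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock y = new ReentrantReadWriteLock();

    public static void main(String[] args) {
        // Transaction A: read /X, then write /Y.
        new Thread(() -> run("A", x, y)).start();
        // Transaction B: read /Y, then write /X - the reverse order.
        new Thread(() -> run("B", y, x)).start();
    }

    static void run(String tx, ReentrantReadWriteLock readFirst,
                    ReentrantReadWriteLock thenWrite) {
        readFirst.readLock().lock();       // read lock held until "commit"
        try {
            Thread.sleep(100);             // let the other tx grab its read lock
            // Each write lock now waits on the other tx's read lock: deadlock,
            // broken only by the timeout, after which the tx would roll back.
            if (thenWrite.writeLock().tryLock(2, TimeUnit.SECONDS)) {
                try {
                    System.out.println(tx + " committed");
                } finally {
                    thenWrite.writeLock().unlock();
                }
            } else {
                System.out.println(tx + " timed out -> rollback");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            readFirst.readLock().unlock();
        }
    }
}
```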

To overcome the deadlock potential, we offered an optimistic locking scheme. Optimistic locking used data versioning on each node. It copied any nodes accessed into a transaction workspace, and allowed transactions to work off the copy. Nodes copied for reading provided repeatable read semantics, while nodes copied for writing allowed writer threads to proceed regardless of simultaneous readers. Modified nodes were then merged back to the main tree at transaction commit time, subject to version checking to ensure no concurrent writes took place.
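
In sketch form, that copy-and-merge cycle looks roughly like this. The node and workspace classes are invented for illustration, assuming a simple counter as the data version; the real workspace and version implementations are richer.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of optimistic locking: copy on access, version-check on commit.
class VersionedNode {
    volatile Object data;
    final AtomicLong version = new AtomicLong();
}

class WorkspaceEntry {
    final VersionedNode node;   // the node in the main tree
    final long versionRead;     // version seen when the copy was taken
    Object workingCopy;         // the transaction works off this copy

    WorkspaceEntry(VersionedNode node) {
        this.node = node;
        this.versionRead = node.version.get();
        this.workingCopy = node.data;   // a shallow copy suffices for the sketch
    }

    // Merge back at commit time; fails if another tx committed in between.
    void commit() {
        synchronized (node) {
            if (node.version.get() != versionRead)
                throw new IllegalStateException("concurrent write detected: rollback");
            node.data = workingCopy;
            node.version.incrementAndGet();
        }
    }
}
```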

Optimistic locking offered a much higher degree of concurrency, with concurrent readers and writers, and removed the risk of deadlock. But it had two main drawbacks. One is performance, since the constant copying of state for each concurrent thread increased memory footprint significantly and was also costly in terms of CPU cycles. The other is that concurrent writers could exist, but one would inevitably fail at commit time when the data version check failed. This meant that writer transactions could happily go ahead and do a lot of costly processing and writing, only to fail at the very end when attempting to commit.

So how does MVCC help?

MVCC offers non-blocking readers, where readers do not block writer threads, providing a high degree of concurrency as well as removing the risk of deadlock. It is also fail-fast: writers work sequentially and don't overlap, and if they do time out in acquiring a write lock, it happens very early in the transaction, when a write occurs, rather than when the transaction commits. Finally, MVCC is also memory efficient in that it maintains only one copy of state for all readers, and one version being modified by the single, sequential writer. Even better, our implementation of MVCC uses no locks at all for reader threads (very significant for a read-heavy system like a cache), and a custom exclusive lock implementation for writers. This custom lock is completely free of synchronized blocks and uses modern techniques like compare-and-swap and memory fencing via volatile variables to achieve synchronization. All this adds up to a highly performant and scalable locking scheme.
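
To give a flavour of what a synchronized-free exclusive lock can look like, here is a minimal compare-and-swap sketch. The real lock in the cache is considerably more sophisticated (reentrancy, transaction ownership, queueing); this only demonstrates the CAS-plus-volatile principle the paragraph refers to.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Minimal exclusive lock built on compare-and-swap, with no synchronized blocks.
class CasLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    boolean tryLock(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        // Spin with backoff until we install ourselves as the owner.
        while (!owner.compareAndSet(null, Thread.currentThread())) {
            if (System.nanoTime() > deadline)
                return false;               // caller reports the timeout
            LockSupport.parkNanos(1_000);   // back off briefly
        }
        return true;
    }

    void unlock() {
        // The CAS on the AtomicReference's volatile field provides the
        // memory fencing that makes the writer's changes visible.
        if (!owner.compareAndSet(Thread.currentThread(), null))
            throw new IllegalMonitorStateException("not the lock owner");
    }
}
```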

Let’s look at some details here.

The extremely high performance of JBoss Cache's MVCC implementation for reading threads is achieved by not requiring any synchronization or locking for readers. For each reader thread, the cache wraps state in a lightweight container object, which is placed in a container local to the thread (a ThreadLocal) or an ongoing transaction. All subsequent operations made on the cache with regard to this state happen via this container object. This use of Java references allows for repeatable read semantics even if the actual state changes concurrently.
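
Here is a stripped-down illustration of the container idea, with invented names. The point is simply that the reader pins a reference to one immutable version of the state, so a concurrent writer publishing a new version cannot change what this thread sees.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Invented names: shows how pinning a reference yields repeatable reads.
final class NodeState {
    final Map<String, Object> data;
    NodeState(Map<String, Object> data) { this.data = data; }
}

class ReadContext {
    // One container per reader thread; in the cache this would live in the
    // ongoing transaction's context instead, when one exists.
    private static final ThreadLocal<NodeState> pinned = new ThreadLocal<>();

    static Object read(AtomicReference<NodeState> node, String key) {
        NodeState state = pinned.get();
        if (state == null) {        // first read: pin the current version
            state = node.get();
            pinned.set(state);
        }
        // Later reads go through the pinned reference, so a writer swapping
        // in a new NodeState is invisible to this thread - no lock needed.
        return state.data.get(key);
    }

    static void finish() {
        pinned.remove();            // end of the read "transaction"
    }
}
```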

Writer threads, on the other hand, need to acquire a lock before any writing can commence. We now use lock striping to improve the memory performance of the cache, and the size of the shared lock pool can be tuned using the concurrencyLevel attribute of the locking element (see the JBoss Cache configuration reference for details). After acquiring an exclusive lock on a node, the writer thread wraps the state to be modified in a container as well, just like with reader threads, and then copies this state for writing. When copying, a reference to the original version is still maintained in the container, for rollbacks. Changes are then made to the copy, and the copy is finally written to the data structure when the write completes.
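
The following sketch shows how lock striping and the copy-for-write step fit together. The pool size plays the role of the concurrencyLevel attribute; everything else (the names, the ten-second timeout) is invented for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of lock striping: a fixed pool of locks shared by all nodes,
// sized by something like the concurrencyLevel attribute.
class StripedLocks {
    private final ReentrantLock[] pool;

    StripedLocks(int concurrencyLevel) {
        pool = new ReentrantLock[concurrencyLevel];
        for (int i = 0; i < concurrencyLevel; i++)
            pool[i] = new ReentrantLock();
    }

    // Every node name hashes onto one lock in the shared pool, so memory
    // cost is fixed regardless of how many nodes the tree holds.
    ReentrantLock lockFor(String fqn) {
        return pool[(fqn.hashCode() & 0x7fffffff) % pool.length];
    }

    void write(String fqn, Runnable mutateCopy) throws InterruptedException {
        ReentrantLock lock = lockFor(fqn);
        // Failing here is the fail-fast behaviour: the lock is acquired when
        // the write happens, not at commit time.
        if (!lock.tryLock(10, TimeUnit.SECONDS))
            throw new IllegalStateException("TimeoutException in the real cache");
        try {
            mutateCopy.run();   // work on a copy; the original is kept for rollback
        } finally {
            lock.unlock();
        }
    }
}
```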

This way, subsequent readers see the new version, while existing readers still hold a reference to the original version in their context, achieving repeatable read semantics.

If a writer is unable to acquire the write lock after some time, a TimeoutException is thrown.

Although MVCC forces writers to obtain a write lock, a phenomenon known as write skew may occur when using repeatable read as your isolation level. This happens when concurrent transactions perform a read and then a write based on the value that was read. Since reads involve holding on to a reference to the state in the transaction context, a subsequent write would work off that original state read, which may now be stale.

The default behavior when dealing with a write skew is to throw a DataVersioningException when it is detected while copying state for writing. However, in most applications a write skew may not be an issue (for example, if the state written has no relationship to the state originally read) and should be allowed. If your application does not care about write skews, you can allow them to happen by setting the writeSkewCheck configuration attribute to false. See the JBoss Cache configuration reference for details.
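
A compact sketch of that check, assuming each node carries a simple version counter. The writeSkewCheck flag mirrors the configuration attribute named above; the rest of the names are invented.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: detect a write skew at the moment state is copied for writing.
class MvccNode {
    volatile Object state;
    final AtomicLong version = new AtomicLong();
}

class WriteContext {
    private final boolean writeSkewCheck;   // mirrors the config attribute
    private long versionAtFirstRead = -1;

    WriteContext(boolean writeSkewCheck) { this.writeSkewCheck = writeSkewCheck; }

    Object read(MvccNode node) {
        if (versionAtFirstRead < 0)
            versionAtFirstRead = node.version.get();
        return node.state;   // repeatable read: caller keeps this reference
    }

    // Called under the exclusive write lock, when copying state for writing.
    Object copyForWrite(MvccNode node) {
        if (writeSkewCheck && versionAtFirstRead >= 0
                && node.version.get() != versionAtFirstRead) {
            // The state we read is stale: a concurrent tx wrote in between.
            throw new IllegalStateException("DataVersioningException in the real cache");
        }
        return node.state;   // copy of the latest committed state
    }
}
```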

Note that write skews cannot happen when using READ_COMMITTED, since threads always work off committed state. Write skews are also witnessed in optimistic locking, manifested as a DataVersioningException, except that there they can happen with any isolation level.

Is there a tutorial on this stuff?

Of course. All you need to do is download the jbosscache-core-all.zip distribution of JBoss Cache. A tutorial is bundled in the distribution, complete with a GUI to visualize what's going on in your cache as you do stuff. Alternatively, there is also a GUI demo to demonstrate cache capabilities.

Nice - so where do I get it?

So, to sum things up, Naga is the latest and greatest of what JBoss Cache has to offer, with significantly faster access for both readers and writers, much better stability and predictability in performance, faster replication, and a lower memory footprint; it also makes you coffee and walks your dog. Download Naga here. A users' guide, FAQ, and tutorial are also available here.

[http://java.dzone.com/articles/a-look-inside-jboss-cache]
