Understanding Weak References


Posted by enicholas on May 4, 2006 at 5:06 PM PDT

Some time ago I was interviewing candidates for a Senior Java Engineer position. Among the many questions I asked was "What can you tell me about weak references?" I wasn't expecting a detailed technical treatise on the subject. I would probably have been satisfied with "Umm... don't they have something to do with garbage collection?" I was instead surprised to find that out of twenty-odd engineers, all of whom had at least five years of Java experience and good qualifications, only two of them even knew that weak references existed, and only one of those two had actual useful knowledge about them. I even explained a bit about them, to see if I got an "Oh yeah" from anybody -- nope. I'm not sure why this knowledge is (evidently) uncommon, as weak references are a massively useful feature which has been around since Java 1.2 was released, over seven years ago.

Now, I'm not suggesting you need to be a weak reference expert to qualify as a decent Java engineer. But I humbly submit that you should at least know what they are -- otherwise how will you know when you should be using them? Since they seem to be a little-known feature, here is a brief overview of what weak references are, how to use them, and when to use them.

Strong references

First I need to start with a refresher on strong references. A strong reference is an ordinary Java reference, the kind you use every day. For example, the code:

<pre>StringBuffer buffer = new StringBuffer();</pre>

creates a new StringBuffer() and stores a strong reference to it in the variable buffer. Yes, yes, this is kiddie stuff, but bear with me. The important part about strong references -- the part that makes them "strong" -- is how they interact with the garbage collector. Specifically, if an object is reachable via a chain of strong references (strongly reachable), it is not eligible for garbage collection. As you don't want the garbage collector destroying objects you're working on, this is normally exactly what you want.

When strong references are too strong

It's not uncommon for an application to use classes that it can't reasonably extend. The class might simply be marked final, or it could be something more complicated, such as an interface returned by a factory method backed by an unknown (and possibly even unknowable) number of concrete implementations. Suppose you have to use a class Widget and, for whatever reason, it isn't possible or practical to extend Widget to add new functionality.

What happens when you need to keep track of extra information about the object? In this case, suppose we find ourselves needing to keep track of each Widget's serial number, but the Widget class doesn't actually have a serial number property -- and because Widget isn't extensible, we can't add one. No problem at all, that's what HashMaps are for:

<pre>serialNumberMap.put(widget, widgetSerialNumber);</pre>

This might look okay on the surface, but the strong reference to widget will almost certainly cause problems. We have to know (with 100% certainty) when a particular Widget's serial number is no longer needed, so we can remove its entry from the map. Otherwise we're going to have a memory leak (if we don't remove Widgets when we should) or we're going to inexplicably find ourselves missing serial numbers (if we remove Widgets that we're still using). If these problems sound familiar, they should: they are exactly the problems that users of non-garbage-collected languages face when trying to manage memory, and we're not supposed to have to worry about this in a more civilized language like Java.

Another common problem with strong references is caching, particularly with very large structures like images. Suppose you have an application which has to work with user-supplied images, like the web site design tool I work on. Naturally you want to cache these images, because loading them from disk is very expensive and you want to avoid the possibility of having two copies of the (potentially gigantic) image in memory at once.

Because an image cache is supposed to prevent us from reloading images when we don't absolutely need to, you will quickly realize that the cache should always contain a reference to any image which is already in memory. With ordinary strong references, though, that reference itself will force the image to remain in memory, which requires you (just as above) to somehow determine when the image is no longer needed in memory and remove it from the cache, so that it becomes eligible for garbage collection. Once again you are forced to duplicate the behavior of the garbage collector and manually determine whether or not an object should be in memory.

Weak references

A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself. You create a weak reference like this:

<pre>WeakReference&lt;Widget&gt; weakWidget = new WeakReference&lt;Widget&gt;(widget);</pre>

and then elsewhere in the code you can use weakWidget.get() to get the actual Widget object. Of course the weak reference isn't strong enough to prevent garbage collection, so you may find (if there are no strong references to the widget) that weakWidget.get() suddenly starts returning null.
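
Because get() can start returning null at any moment, the usual pattern is to call it once, keep the result in a local (strong) variable, and check that variable for null. The following is only a minimal sketch of that pattern; the Widget class and the describe() helper are hypothetical stand-ins, not part of any real API.

```java
import java.lang.ref.WeakReference;

class WeakWidgetDemo {
    // Hypothetical stand-in for the article's Widget class.
    static final class Widget {
        final String name;
        Widget(String name) { this.name = name; }
    }

    // Returns the widget's name, or a placeholder if the referent is gone.
    static String describe(WeakReference<Widget> weakWidget) {
        // Call get() exactly once and hold the result in a strong local.
        // Calling get() twice could see a non-null result, then null.
        Widget widget = weakWidget.get();
        if (widget == null) {
            return "(collected)";
        }
        // While 'widget' is in use here, the object cannot be collected.
        return widget.name;
    }
}
```

Checking weakWidget.get() != null and then calling weakWidget.get() a second time is the mistake to avoid, since the collector may clear the reference between the two calls.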

To solve the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.
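
As a rough illustration of the serial number case, the sketch below keeps the numbers in a WeakHashMap behind the Map interface. SerialNumberRegistry, its methods, and the empty Widget class are all made up for this example; only WeakHashMap and Map come from the JDK.

```java
import java.util.Map;
import java.util.WeakHashMap;

class SerialNumberRegistry {
    // Hypothetical Widget; it inherits identity-based equals()/hashCode().
    static final class Widget { }

    // Declared as Map, so only this line knows the map holds its keys weakly.
    private final Map<Widget, Integer> serialNumbers = new WeakHashMap<>();
    private int nextSerial = 1;

    // Assigns a serial number on first sight; returns the existing one after.
    // WeakHashMap is not thread-safe, hence the synchronized methods.
    synchronized int register(Widget widget) {
        Integer existing = serialNumbers.get(widget);
        if (existing != null) {
            return existing;
        }
        int serial = nextSerial++;
        serialNumbers.put(widget, serial);
        return serial;
    }

    // Returns the serial number, or null if the widget was never registered.
    synchronized Integer serialOf(Widget widget) {
        return serialNumbers.get(widget);
    }
}
```

Two caveats apply to any use of WeakHashMap. Keys are still compared with equals() and hashCode(), so it behaves most predictably with keys that use identity equality. And the values are held strongly, so a value that refers back to its own key will keep that entry alive forever.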

Reference queues

Once a WeakReference starts returning null, the object it pointed to has become garbage and the WeakReference object is pretty much useless. This generally means that some sort of cleanup is required; WeakHashMap, for example, has to remove such defunct entries to avoid holding onto an ever-increasing number of dead WeakReferences.

The ReferenceQueue class makes it easy to keep track of dead references. If you pass a ReferenceQueue into a weak reference's constructor, the reference object will be automatically inserted into the reference queue when the object to which it pointed becomes garbage. You can then, at some regular interval, process the ReferenceQueue and perform whatever cleanup is needed for dead references.
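
One way to sketch this: subclass WeakReference so that each reference carries whatever bookkeeping key you need after the referent is gone, then poll the queue and clean up. TrackedRefs, LabeledRef, and the label field are names invented for this illustration.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

class TrackedRefs {
    // A weak reference that remembers a label even after its referent dies.
    static final class LabeledRef extends WeakReference<Object> {
        final String label;
        LabeledRef(Object referent, String label, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.label = label;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<String, LabeledRef> live = new HashMap<>();

    // Starts tracking an object under the given label.
    LabeledRef track(Object object, String label) {
        LabeledRef ref = new LabeledRef(object, label, queue);
        live.put(label, ref);
        return ref;
    }

    // Drops the bookkeeping for every reference found on the queue.
    // poll() never blocks; it returns null when the queue is empty.
    int expungeDead() {
        int removed = 0;
        Reference<?> dead;
        while ((dead = queue.poll()) != null) {
            live.remove(((LabeledRef) dead).label);
            removed++;
        }
        return removed;
    }

    int liveCount() {
        return live.size();
    }
}
```

Calling expungeDead() at the start of each public operation is one simple way to get the "regular interval"; a dedicated thread blocking on queue.remove() is another.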

Different degrees of weakness

Up to this point I've just been referring to "weak references", but there are actually four different degrees of reference strength: strong, soft, weak, and phantom, in order from strongest to weakest. We've already discussed strong and weak references, so let's take a look at the other two.

Soft references

A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.

SoftReferences aren't required to behave any differently than WeakReferences, but in practice softly reachable objects are generally retained as long as memory is in plentiful supply. This makes them an excellent foundation for a cache, such as the image cache described above, since you can let the garbage collector worry about both how reachable the objects are (a strongly reachable object will never be removed from the cache) and how badly it needs the memory they are consuming.
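
A very small cache along those lines might look like the sketch below. SoftCache and its loader function are hypothetical; a real image cache would plug in whatever code reads the image from disk.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;   // e.g. "read this image from disk"

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, reloading it if it was never loaded
    // or if the collector has cleared the soft reference since.
    synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

As long as some caller still holds the returned value, the cache keeps handing back that same instance, so two copies of one image never coexist. This sketch never removes map entries whose references have been cleared; a ReferenceQueue could be used to prune them.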

Phantom references

A phantom reference is quite different from either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead. How is that different from WeakReference, though?

The difference is in exactly when the enqueuing happens. WeakReferences are enqueued as soon as the object to which they point becomes weakly reachable. This is before finalization or garbage collection has actually happened; in theory the object could even be "resurrected" by an unorthodox finalize() method, but the WeakReference would remain dead. PhantomReferences are enqueued only when the object is physically removed from memory, and the get() method always returns null specifically to prevent you from being able to "resurrect" an almost-dead object.

What good are PhantomReferences? I'm only aware of two serious cases for them: first, they allow you to determine exactly when an object was removed from memory. They are in fact the only way to determine that. This isn't generally that useful, but might come in handy in certain very specific circumstances like manipulating large images: if you know for sure that an image should be garbage collected, you can wait until it actually is before attempting to load the next image, and therefore make the dreaded OutOfMemoryError less likely.

Second, PhantomReferences avoid a fundamental problem with finalization: finalize() methods can "resurrect" objects by creating new strong references to them. So what, you say? Well, the problem is that an object which overrides finalize() must now be determined to be garbage in at least two separate garbage collection cycles in order to be collected. When the first cycle determines that it is garbage, it becomes eligible for finalization. Because of the (slim, but unfortunately real) possibility that the object was "resurrected" during finalization, the garbage collector has to run again before the object can actually be removed. And because finalization might not have happened in a timely fashion, an arbitrary number of garbage collection cycles might have happened while the object was waiting for finalization. This can mean serious delays in actually cleaning up garbage objects, and is why you can get OutOfMemoryErrors even when most of the heap is garbage.

With PhantomReference, this situation is impossible -- when a PhantomReference is enqueued, there is absolutely no way to get a pointer to the now-dead object (which is good, because it isn't in memory any longer). Because PhantomReference cannot be used to resurrect an object, the object can be instantly cleaned up during the first garbage collection cycle in which it is found to be phantomly reachable. You can then dispose of whatever resources you need to at your convenience.
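
A finalizer replacement built on this idea could be sketched roughly as follows. ResourceCleaner, CleanupRef, and the cleanup callback are invented for this example. The callback must not refer to the owner object, or the owner would never become unreachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

class ResourceCleaner {
    // Phantom reference carrying the action to run once its owner is gone.
    static final class CleanupRef extends PhantomReference<Object> {
        final Runnable cleanup;
        CleanupRef(Object owner, ReferenceQueue<Object> queue, Runnable cleanup) {
            super(owner, queue);
            this.cleanup = cleanup;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The reference objects must stay reachable themselves,
    // otherwise they are collected too and never enqueued.
    private final Set<CleanupRef> pending = new HashSet<>();

    // Registers a cleanup action; 'cleanup' must not capture 'owner'.
    synchronized CleanupRef register(Object owner, Runnable cleanup) {
        CleanupRef ref = new CleanupRef(owner, queue, cleanup);
        pending.add(ref);
        return ref;
    }

    // Runs the cleanup action of every reference found on the queue.
    synchronized int runPendingCleanups() {
        int ran = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            CleanupRef dead = (CleanupRef) ref;
            pending.remove(dead);
            dead.cleanup.run();
            ran++;
        }
        return ran;
    }
}
```

On Java 9 and later, java.lang.ref.Cleaner offers a ready-made version of a similar pattern, which may be preferable to hand-rolled code like this.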

Arguably, the finalize() method should never have been provided in the first place. PhantomReferences are definitely safer and more efficient to use, and eliminating finalize() would have made parts of the VM considerably simpler. But they're also more work to implement, so I confess to still using finalize() most of the time. The good news is that at least you have a choice.

Conclusion

I'm sure some of you are grumbling by now, as I'm talking about an API which is nearly a decade old and haven't said anything which hasn't been said before. While that's certainly true, in my experience many Java programmers really don't know very much (if anything) about weak references, and I felt that a refresher course was needed. Hopefully you at least learned a little something from this review.

Translation from: http://blog.csdn.net/xtyyumi301/article/details/3015493

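I once interviewed candidates for a senior Java engineer position, and one of my questions was "How much do you know about weak references?" The topic is somewhat obscure, so I didn't expect everyone to know the details. I would have been quite satisfied if a candidate had said "Umm... don't they have something to do with GC (garbage collection)?" In reality, out of more than twenty engineers with five years of Java development experience, only two knew that weak references existed, and only one of those really understood them. I tried giving them some hints, hoping someone would suddenly remember, but no one did. I don't know why this feature is so little known, given that it was introduced in Java 1.2, seven years ago.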

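You don't need to become a weak reference expert in order to pass as a senior Java engineer (that would be like showing off the four ways to write the character 茴 in 茴香豆). But you should at least know a little about them and understand what they are. Below I explain what weak references are, how to use them, and when to use them.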

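Strong references

Let's start with strong references. A strong reference is what you use every day. For example, the code StringBuffer buffer = new StringBuffer() creates a StringBuffer object, and the variable buffer holds a reference to it. Too elementary? Yes, but bear with me. What makes a strong reference "strong" is the way the GC treats it: if an object is reachable through a chain of strong references, it will not be garbage collected. After all, you wouldn't want the GC reclaiming an object you are still using.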

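When strong references are too strong

Sometimes we use classes that we can neither modify nor extend, such as a final class, or an object created by a factory where we only see an interface and don't even know the implementation. Suppose you are using a Widget class and need extra information about each instance, for example its serial number (the order in which it was created). Assume you can't add methods to the class and Widget has no such serial number itself. What do you do? Use a HashMap: serialNumberMap.put(widget, widgetSerialNumber). Keep the next serial number in a variable, and when an instance is created, put the instance and its serial number into the HashMap. Obviously this map keeps growing and causes a memory leak. You might say that's fine, just remove an instance from the map when it is no longer used. That works, but doesn't this put/remove pairing feel like the new/delete of manual memory management? As in every language where you manage memory yourself, you must not miss a single case. That is not the Java way.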

       
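Another very common problem is caching, especially of memory-hungry things such as images. Suppose a project has to manage user-supplied images, like the web site editor I'm working on. Naturally you cache the images, because reading them from disk every time is slow, and caching also avoids having several copies of one image in memory. You will quickly realize there's a memory problem: because the memory held by cached images can't be reclaimed, it will eventually run out. Evict some images from the cache back to disk! But that raises the questions of when to evict and which images to evict. Just as with the Widget class, you're doing memory management.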

Weak references
    
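A weak reference, simply put, is a reference that isn't strong enough to force an object to stay in memory. The GC won't touch an object reachable through strong references, but it may reclaim one reachable only through weak references. Create a weak reference with WeakReference weakWidget = new WeakReference(widget), and use weakWidget.get() to obtain the widget object. Note that get() may return null. When does that happen? Once the garbage collector has reclaimed the widget, which it may do as soon as no strong references to it remain (with a strong reference this cannot happen). What if you need the widget again after get() returns null? You can't get it back; you have to recreate it. For a cache that's easy: an image cache, for instance, simply reloads the image from disk (every in-memory image needs a copy on disk).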

       
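For the "widget serial number" problem above, the simplest solution is the JDK's built-in WeakHashMap class. WeakHashMap works much like HashMap, except that its keys (note: not its values) are held by weak references. If a key in a WeakHashMap is garbage collected, its entry is removed automatically. If you code against the Map interface, you only change HashMap to WeakHashMap at instantiation and no other code changes. It's that simple.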

Reference queues
    
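Once WeakReference.get() returns null, the object it pointed to has been garbage collected and the WeakReference object itself is useless. What if you need to clean up after these dead WeakReferences? WeakHashMap, for example, has to remove collected keys from the map. The JDK's ReferenceQueue class makes it easy to track dead references. WeakReference has a constructor that takes a ReferenceQueue; when the referent is garbage collected, the WeakReference object is placed on that queue. Polling the ReferenceQueue then yields every WeakReference whose referent has been collected. WeakHashMap does this at the start of operations such as size() and get(), processing the collected keys; see WeakHashMap#expungeStaleEntries() in the JDK source.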

Different degrees of weakness
    
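So far we've only mentioned "weak references", but there are actually four kinds of reference, distinguished by strength: strong, soft, weak, and phantom. We've already covered strong and weak, so let's look at soft and phantom.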

Soft references
      
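The difference between a soft reference and a weak reference is this: as soon as the GC finds that an object is only weakly reachable, the weak reference is placed on its ReferenceQueue and the object is reclaimed at the next collection; when an object is softly reachable, the GC may instead ask the operating system for more memory rather than reclaim it right away, and reclaims it only when there is no other way out. Soft references are the best fit for cache systems.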

Phantom references
      
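A phantom reference is used very differently from a soft or weak reference: its get() method always returns null (see the JDK source for PhantomReference if you don't believe it). This means you can only work with the PhantomReference itself and can never obtain the object it points to. Its only use is that you can learn from a ReferenceQueue that the object has been reclaimed. Why this difference?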

       
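The difference lies in when the reference enters the ReferenceQueue. A WeakReference is enqueued as soon as the object it points to becomes weakly reachable, which happens before finalization and garbage collection. In theory you can "resurrect" the object in its finalize() method (just make a strong reference point to it and the GC won't reclaim it), but the WeakReference is already dead. (Translator's note: "Dead"? I'm not sure exactly what the author means. Resurrecting an object in finalize doesn't explain much. In theory you can resurrect the object referred to by an enqueued WeakReference but not the one referred to by a PhantomReference, and I think that is the real difference.) A PhantomReference, by contrast, is enqueued only after garbage collection, so resurrection is impossible.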
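What are PhantomReferences good for? I'll mention just two things. First, they let you know that an object has been removed from memory; in fact they are the only way to know. That is not often useful, but it helps in certain special cases such as handling huge images: loading the next image only after the previous image object has been reclaimed goes a long way toward avoiding OutOfMemoryError. Second, they avoid the drawbacks of finalize(). A finalize method can resurrect the object by creating a new strong reference to it. So what, you might say? The trouble is that for an object that overrides finalize, the garbage collector has to examine it in two passes before it can reclaim it. In the first pass it checks whether the object is reachable; if not, it checks whether the object has a finalizer, runs it if so, and otherwise reclaims the object. In the second pass it checks reachability again and reclaims the object if it is still unreachable. Because finalize runs before the memory is reclaimed, an OutOfMemoryError can occur while objects are still waiting for finalization, even though many objects are reclaimable. With PhantomReference this can't happen: once a PhantomReference is enqueued there is no way to obtain the object it pointed to (it is already gone from memory). Since a PhantomReference can't resurrect its object, that object can be reclaimed in the first pass, which is not true of objects with finalize methods. Arguably, finalize should not be the first choice. PhantomReference is safer and more efficient and can simplify the VM's work. It has many advantages but also requires more code, so I admit that I still use finalize most of the time. Either way, you have one more option and needn't hang everything on finalize.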

Conclusion
    
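I bet some of you are muttering that I'm going over old news with nothing fresh to offer. You're right, but in my experience many Java engineers still know little about weak references, so an introductory lesson like this is needed. I sincerely hope you got something out of this article.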