HashSet: class comment translation, fail-fast, and source code analysis


If you have not read the HashMap source code yet, read about HashMap first: http://blog.csdn.net/disiwei1012/article/details/73530598

1. Translation of the class comment

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.

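In other words, HashSet implements the Set interface and is backed by a hash table (in practice, a HashMap instance). It makes no guarantee about the iteration order of the set; in particular, the order is not guaranteed to stay the same over time.
HashSet permits the null element.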
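A small sketch of these properties follows. The class name HashSetBasicsDemo and the helper build() are made up for illustration; only the java.util calls come from the JDK.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetBasicsDemo {
    // Builds a small set that includes null and one duplicate insertion.
    static Set<String> build() {
        Set<String> set = new HashSet<>();
        set.add("b");
        set.add("a");
        set.add(null); // a single null element is permitted
        set.add("a");  // duplicate: the set is left unchanged
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = build();
        System.out.println(set.size());         // 3
        System.out.println(set.contains(null)); // true
        // The order printed by this loop is unspecified; do not rely on it.
        for (String s : set) {
            System.out.println(s);
        }
    }
}
```

If a stable iteration order is needed, LinkedHashSet (insertion order) or TreeSet (sorted order) is the usual choice instead.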

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

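In other words, provided the hash function spreads the elements evenly across the buckets, HashSet offers constant-time performance for its basic operations (add, remove, contains and size).
The time needed to iterate over a HashSet is proportional to the number of elements plus the capacity of the backing HashMap (the number of buckets). So if iteration performance matters, do not set the initial capacity too high (or the load factor too low).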
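One way to avoid an oversized table is to derive the initial capacity from the number of elements you expect. The sketch below assumes the default load factor of 0.75; the helper withExpectedSize is hypothetical and not part of the JDK.

```java
import java.util.HashSet;
import java.util.Set;

class PresizedSetDemo {
    // Hypothetical helper: requests a capacity just large enough to hold
    // `expected` elements at the default load factor (0.75) without a resize.
    static <T> Set<T> withExpectedSize(int expected) {
        int capacity = (int) (expected / 0.75f) + 1;
        return new HashSet<>(capacity);
    }

    public static void main(String[] args) {
        Set<Integer> set = withExpectedSize(1000);
        for (int i = 0; i < 1000; i++) {
            set.add(i);
        }
        System.out.println(set.size()); // 1000
    }
}
```

Asking for a far larger capacity "just in case" would not change the result, but every iteration would then have to walk all of the mostly empty buckets.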

Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:

    Set s = Collections.synchronizedSet(new HashSet(...));

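In other words, HashSet is not synchronized. If several threads access a set concurrently and at least one of them modifies it, the set must be synchronized externally.
The usual way is to synchronize on some object that naturally encapsulates the set, for example the object that holds the set as a field. (The backing map inside HashSet is private, so callers cannot lock on it.)
If no such object exists and the set still has to be thread-safe, wrap it at creation time like this: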

Collections.synchronizedSet(new HashSet(...))
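A minimal sketch of the wrapper in use follows. The class name SynchronizedSetDemo and the method fillConcurrently are made up for illustration. Individual calls such as add are synchronized by the wrapper, but iteration is not atomic, so the caller still has to hold the wrapper's lock while iterating, as the Collections.synchronizedSet documentation requires.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SynchronizedSetDemo {
    // Several threads add disjoint ranges of integers to one wrapped set;
    // returns the number of elements counted afterwards.
    static int fillConcurrently(int threads, int perThread) {
        final Set<Integer> set = Collections.synchronizedSet(new HashSet<Integer>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    set.add(base + i); // each call locks on the wrapper
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int count = 0;
        // Iteration spans many calls, so the lock must be held manually.
        synchronized (set) {
            for (Integer ignored : set) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000)); // 4000
    }
}
```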

The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

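In other words, the iterators returned by the set's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way other than through the iterator's own remove method, the iterator throws ConcurrentModificationException.
So, in the face of concurrent modification, the iterator fails quickly and cleanly instead of risking arbitrary, non-deterministic behavior at some undetermined time in the future.
Note that fail-fast behavior cannot be guaranteed: in the presence of unsynchronized concurrent modification no hard guarantee is possible, so the exception is thrown only on a best-effort basis.
Therefore, do not write a program whose correctness depends on ConcurrentModificationException. The fail-fast behavior of iterators should be used only to detect bugs.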
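The one modification that is allowed during iteration is the iterator's own remove method. A minimal sketch, with the class name IteratorRemoveDemo and the helper dropEvens invented for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class IteratorRemoveDemo {
    // Returns a copy of the input with every even number removed,
    // deleting elements through the iterator while iterating.
    static Set<Integer> dropEvens(Set<Integer> input) {
        Set<Integer> set = new HashSet<>(input);
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // allowed: the iterator updates its own bookkeeping
            }
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> result = dropEvens(new HashSet<>(Arrays.asList(1, 2, 3, 4)));
        System.out.println(result.size());      // 2
        System.out.println(result.contains(2)); // false
    }
}
```

Calling set.remove(...) inside the same loop instead of it.remove() is the pattern that the fail-fast check is meant to catch.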

2. A small fail-fast example

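Fail-fast means that when a container (such as ArrayList or HashMap) is structurally modified while an iterator over it is in use, the iterator throws ConcurrentModificationException. The typical case is one thread modifying the container while another thread iterates over it, but a single thread can trigger it as well.
A structural modification is an operation that changes the size of the container, such as remove, add or clear.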

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    public class Student {
        public static void main(String[] args) {
            List<String> stringList = new ArrayList<String>();
            stringList.add("a");
            stringList.add("b");
            stringList.add("c");
            Iterator<String> iterator = stringList.iterator();
            while (iterator.hasNext()) {
                if (iterator.next().equals("a")) {
                    // Structural change made on the list, not through the iterator
                    stringList.remove("a");
                }
            }
        }
    }

    Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:819)
        at java.util.ArrayList$Itr.next(ArrayList.java:791)
        at com.demo3.Student.main(Student.java:23)

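The example above uses no threads at all, and the mechanism is simple: ArrayList keeps a counter (modCount) of how many times the list has been structurally modified, and when an iterator is created it records the counter's value at that moment (expectedModCount).
At the start of the iterator's next and remove methods, the recorded value is compared with the list's current counter; if they differ, ConcurrentModificationException is thrown.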
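The counter check can be sketched with a toy container. This is not the JDK's code: ModCountSketch, its fixed-size array and its Cursor class are simplifications invented here, and only the field names modCount and expectedModCount mirror the ones ArrayList uses.

```java
import java.util.ConcurrentModificationException;

class ModCountSketch {
    private final String[] items = new String[8]; // fixed capacity, enough for the demo
    private int size = 0;
    private int modCount = 0; // incremented on every structural change

    void add(String s) {
        items[size++] = s;
        modCount++;
    }

    class Cursor {
        private int next = 0;
        // Snapshot of the container's counter taken when the cursor is created.
        private final int expectedModCount = modCount;

        boolean hasNext() {
            return next != size;
        }

        String next() {
            // Fail fast if the container changed behind the cursor's back.
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
            return items[next++];
        }
    }

    // Returns true if the cursor detects a modification made after its creation.
    static boolean failsAfterModification() {
        ModCountSketch list = new ModCountSketch();
        list.add("a");
        list.add("b");
        Cursor cursor = list.new Cursor();
        cursor.next();   // fine: nothing has changed yet
        list.add("c");   // structural change made outside the cursor
        try {
            cursor.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAfterModification()); // true
    }
}
```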

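Do not use fail-fast behavior to decide whether some expected outcome has occurred, because whether the exception is thrown is not deterministic.
One reason is that the relative order in which threads run is not fixed, and the modification counter is updated without synchronization, so an iterating thread may not observe another thread's change in time.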
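The check can be missed even in a single thread. The sketch below reuses the three-element list from the earlier example; the class name MissedDetectionDemo and the helper throwsWhenRemoving are invented for illustration, and the behavior shown is that of OpenJDK's ArrayList, whose hasNext compares the cursor position with the current size and performs no modification check.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class MissedDetectionDemo {
    // Iterates over [a, b, c], removes `target` from the list when it is seen,
    // and reports whether ConcurrentModificationException was thrown.
    static boolean throwsWhenRemoving(String target) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        try {
            Iterator<String> it = list.iterator();
            while (it.hasNext()) {
                if (it.next().equals(target)) {
                    list.remove(target);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsWhenRemoving("a")); // true
        // Removing the second-to-last element shrinks the list so that
        // hasNext() returns false and next() is never called again.
        System.out.println(throwsWhenRemoving("b")); // false
    }
}
```

In the "b" case the loop silently skips "c", which is exactly the kind of bug the exception would otherwise have exposed.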

3. Source code

    public class HashSet<E>
        extends AbstractSet<E>
        implements Set<E>, Cloneable, java.io.Serializable
    {
        static final long serialVersionUID = -5024744406713321676L;

        // The backing HashMap stores every element of the HashSet as a key.
        private transient HashMap<E,Object> map;

        // Dummy object used as the value for every key in the backing map;
        // declared static final so that all entries share it.
        private static final Object PRESENT = new Object();

        public HashSet() {
            map = new HashMap<>();
        }

        public HashSet(Collection<? extends E> c) {
            map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
            addAll(c);
        }

        public HashSet(int initialCapacity, float loadFactor) {
            map = new HashMap<>(initialCapacity, loadFactor);
        }

        public HashSet(int initialCapacity) {
            map = new HashMap<>(initialCapacity);
        }

        // Package-private constructor used only by LinkedHashSet; the dummy
        // parameter just distinguishes it from the other constructors.
        HashSet(int initialCapacity, float loadFactor, boolean dummy) {
            map = new LinkedHashMap<>(initialCapacity, loadFactor);
        }

        public Iterator<E> iterator() {
            return map.keySet().iterator();
        }

        public int size() {
            return map.size();
        }

        public boolean isEmpty() {
            return map.isEmpty();
        }

        public boolean contains(Object o) {
            return map.containsKey(o);
        }

        // put returns the previous value, so null means the key was not present.
        public boolean add(E e) {
            return map.put(e, PRESENT)==null;
        }

        public boolean remove(Object o) {
            return map.remove(o)==PRESENT;
        }

        public void clear() {
            map.clear();
        }

        public Object clone() {
            try {
                HashSet<E> newSet = (HashSet<E>) super.clone();
                newSet.map = (HashMap<E, Object>) map.clone();
                return newSet;
            } catch (CloneNotSupportedException e) {
                throw new InternalError();
            }
        }
    }
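Because add and remove simply forward to the backing map's put and remove, their boolean results tell the caller whether the set actually changed. A short sketch (the class name ReturnValueDemo and the helper results are made up for illustration):

```java
import java.util.HashSet;
import java.util.Set;

class ReturnValueDemo {
    // Records the return values of two adds and two removes of the same element.
    static boolean[] results() {
        Set<String> set = new HashSet<>();
        boolean firstAdd = set.add("x");        // key was absent
        boolean secondAdd = set.add("x");       // key already present
        boolean firstRemove = set.remove("x");  // key was present
        boolean secondRemove = set.remove("x"); // key already gone
        return new boolean[] { firstAdd, secondAdd, firstRemove, secondRemove };
    }

    public static void main(String[] args) {
        for (boolean b : results()) {
            System.out.println(b); // true, false, true, false
        }
    }
}
```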
