Please credit the source when reposting: http://blog.csdn.net/jevonsCSDN/article/details/54619114

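I used to know only that HashMap is not thread-safe. I used it as-is and never thought about the consequences, until that weakness finally surfaced in my recent studies and left me puzzled. I read several analyses of HashMap by experts online, but found that they were written long ago and that the HashMap source has since been updated. So I took the time to go through the newer HashMap source from start to finish and wrote this analysis to help with learning.

HashMap is one of the most commonly used classes in the Java collections framework and a very typical data structure in Java. We use it all the time, often without noticing, and it makes day-to-day development much easier, which is all the more reason to understand how it works. This article analyzes the Java 7 source. It is fairly long, so I suggest navigating by the table of contents. Corrections are welcome.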



    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16; default initial capacity, must be a power of two

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30; // maximum capacity; the capacity must be a power of two no greater than this

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // default load factor

    /**
     * An empty table instance to share when the table is not inflated.
     */
    static final Entry<?,?>[] EMPTY_TABLE = {}; // empty Entry array, shared until the table is inflated

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE; // the Entry array, resized when needed; its length must be a power of two

    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size; // size of the HashMap, i.e. the total number of entries

    /**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold; // resize threshold; while the table is still empty it holds the initial capacity used for inflation

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor; // load factor of the hash table

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount; // count of structural modifications

    /**
     * The default threshold of map capacity above which alternative hashing is
     * used for String keys. Alternative hashing reduces the incidence of
     * collisions due to weak hash code calculation for String keys.
     * <p/>
     * This value may be overridden by defining the system property
     * {@code jdk.map.althashing.threshold}. A property value of {@code 1}
     * forces alternative hashing to be used at all times whereas
     * {@code -1} value ensures that alternative hashing is never used.
     */
    // Default threshold for enabling alternative hashing, Integer.MAX_VALUE;
    // referenced by the static nested class Holder.
    static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;

    /**
     * A randomizing value associated with this instance that is applied to
     * hash code of keys to make hash collisions harder to find. If 0 then
     * alternative hashing is disabled.
     */
    // Hash seed, used to lower the chance of key hash collisions; 0 disables alternative hashing.
    transient int hashSeed = 0;


    /**
     * holds values which can't be initialized until after VM is booted.
     */
    private static class Holder {

        /**
         * Table capacity above which to switch to use alternative hashing.
         */
        static final int ALTERNATIVE_HASHING_THRESHOLD;

        static {
            // Read the system property jdk.map.althashing.threshold (the alternative hashing threshold)
            String altThreshold = java.security.AccessController.doPrivileged(
                new sun.security.action.GetPropertyAction(
                    "jdk.map.althashing.threshold"));

            int threshold;
            try {
                // Use the property if it is set, otherwise ALTERNATIVE_HASHING_THRESHOLD_DEFAULT
                threshold = (null != altThreshold)
                        ? Integer.parseInt(altThreshold)
                        : ALTERNATIVE_HASHING_THRESHOLD_DEFAULT;

                // disable alternative hashing if -1
                if (threshold == -1) {
                    threshold = Integer.MAX_VALUE;
                }

                if (threshold < 0) {
                    throw new IllegalArgumentException("value must be positive integer.");
                }
            } catch(IllegalArgumentException failed) {
                throw new Error("Illegal value for 'jdk.map.althashing.threshold'", failed);
            }

            // Publish the alternative hashing threshold
            ALTERNATIVE_HASHING_THRESHOLD = threshold;
        }
    }

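It took me a long time digging through references to understand the static nested class Holder. Many articles skip right over it, and I was lost too: why does this thing suddenly appear when it seems to do so little in the source? Indeed, it matters little, at least for beginners like us, because it concerns a hashing algorithm added in JDK 1.7: sun.misc.Hashing.stringHash32((String) k). For String keys it provides a new hash function that spreads hash codes better to reduce collisions. The algorithm is experimental and disabled by default. To enable it you have to set the system property jdk.map.althashing.threshold to a non-negative value yourself (a value of -1 disables it), as the Holder source shows.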

The following passages from Mikhail Vorontsov's article "Changes to String internal representation made in Java 1.7.0_06" explain it:

There is another change introduced to String class in the same update: a new hashing algorithm. Oracle suggests that a new algorithm gives a better distribution of hash codes, which should improve performance of several hash-based collections: HashMap, Hashtable, HashSet, LinkedHashMap, LinkedHashSet,WeakHashMap and ConcurrentHashMap. Unlike changes from the first part of this article, these changes are experimental and turned off by default.


As you may guess, these changes are only for String keys. If you want to turn them on, you’ll have to set a jdk.map.althashing.threshold system property to a non-negative value (it is equal to -1 by default). This value will be a collection size threshold, after which a new hashing method will be used. A small remark here: hashing method will be changed on rehashing only (when there is no more free space). So, if a collection was rehashed last time at size = 160 and jdk.map.althashing.threshold = 200, then a method will only be changed when your collection will grow to size of 320 (approximately).


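Starting to make sense? The new hash algorithm only comes into play during a rehash, and the Holder static nested class merely loads and initializes the ALTERNATIVE_HASHING_THRESHOLD parameter. If you are interested, read that article carefully; there are also related questions and answers on Stack Overflow. If it still isn't clear, set it aside and come back to it later. All you need to know is that under normal circumstances we never use it. If you insist on getting to the bottom of it, I strongly suggest retracing my search: go dig through the vast net for answers.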


Constructor and Description

  • HashMap(): Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).
  • HashMap(int initialCapacity): Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).
  • HashMap(int initialCapacity, float loadFactor): Constructs an empty HashMap with the specified initial capacity and load factor.
  • HashMap(Map<? extends K,? extends V> m): Constructs a new HashMap with the same mappings as the specified Map.


    // The other three constructors all end up calling this one
    public HashMap(int initialCapacity, float loadFactor) {
        // Reject a negative initial capacity
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        // Clamp an initial capacity above the maximum to MAXIMUM_CAPACITY
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // Check that the load factor is valid
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Store the load factor
        this.loadFactor = loadFactor;
        // Until the table is inflated, threshold holds the initial capacity
        threshold = initialCapacity;
        // Initialization hook; empty here, for subclasses to override
        init();
    }

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        inflateTable(threshold);

        putAllForCreate(m);
    }



    public V put(K key, V value) {
        // If the table is still the shared empty table, inflate it first
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        // A null key is handled separately
        if (key == null)
            return putForNullKey(value);
        // Compute the hash of the key
        int hash = hash(key);
        // Compute bucketIndex, the position in the table
        int i = indexFor(hash, table.length);
        // Walk the singly linked list stored at that index
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // Same hash and same (or equal) key?
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                // The key already exists: remember the old value
                V oldValue = e.value;
                // Store the new value
                e.value = value;
                // Empty here; a hook for subclasses to override
                e.recordAccess(this);
                // Return the old value
                return oldValue;
            }
        }

        // Reaching this point means the key is not in the table, so an Entry
        // will be added; record the structural modification first
        modCount++;
        // Add the Entry
        addEntry(hash, key, value, i);
        // No previous value
        return null;
    }

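1. HashMap only inflates its table on the first put.
2. It then checks whether the key is null. If so, it calls putForNullKey(V) to store the null key (the principle is much the same, but note that a null key always goes into table[0]).
3. hash(key) computes the hash code.
4. indexFor(hash, table.length) computes the index of the bucket.
5. It walks table[i] to see whether the key already exists. If it does, the value is overwritten and the old value returned.
6. If the key does not exist, the table structure is about to change, so modCount is incremented first.
7. addEntry(hash, key, value, i) is called to add the element.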
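The visible effect of these steps can be checked from the outside with the public API. The sketch below is only an illustration: the class name PutDemo and the helper putTwice are made up for this example and are not part of the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    // Inserts the same key twice and returns what the second put() returned.
    static Integer putTwice(Map<String, Integer> map, String key) {
        map.put(key, 1);        // key absent: a new entry is added, put returns null
        return map.put(key, 2); // key present: value overwritten, old value returned
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(putTwice(map, "a"));  // 1
        System.out.println(putTwice(map, null)); // 1 (a null key is allowed)
        System.out.println(map.get(null));       // 2
        System.out.println(map.size());          // 2
    }
}
```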


inflateTable :

    /**
     * Inflates the table.
     */
    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize
        int capacity = roundUpToPowerOf2(toSize);

        // Threshold at which the next resize will happen
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // Create the new table with that capacity
        table = new Entry[capacity];
        // Initialize the hash seed if alternative hashing is needed
        initHashSeedAsNeeded(capacity);
    }


roundUpToPowerOf2 :

    private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        int rounded = number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (rounded = Integer.highestOneBit(number)) != 0
                    ? (Integer.bitCount(number) > 1) ? rounded << 1 : rounded
                    : 1;

        return rounded;
    }


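  1. Check whether number is greater than or equal to MAXIMUM_CAPACITY. If so, return MAXIMUM_CAPACITY; otherwise go to step 2.
  2. Take the highest 1 bit of number (explained in a moment) and assign it to rounded. If rounded is zero (number is 0), return 1; otherwise go to step 3.
  3. Count the 1 bits in number. If there is more than one, number is not a power of two, so rounded is shifted left by one bit to reach the next power of two. Otherwise number is already a power of two and rounded is returned unchanged.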
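The three steps can be tried out with a standalone copy of the logic. This is a sketch, not the JDK code itself: the nested conditional expression is unfolded into if statements for readability, and the class name RoundUpDemo is invented for the example.

```java
class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same decisions as the JDK 7 method, written as plain if statements.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;                   // step 1
        }
        int rounded = Integer.highestOneBit(number);
        if (rounded == 0) {
            return 1;                                  // step 2: number == 0
        }
        // step 3: more than one 1 bit means number is not a power of two
        return Integer.bitCount(number) > 1 ? rounded << 1 : rounded;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(0));    // 1
        System.out.println(roundUpToPowerOf2(16));   // 16
        System.out.println(roundUpToPowerOf2(17));   // 32
        System.out.println(roundUpToPowerOf2(1000)); // 1024
    }
}
```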




highestOneBit (int)

    // Returns the highest 1 bit in the binary representation of the given int
    public static int highestOneBit(int i) {
        // HD, Figure 3-1
        i |= (i >>  1);
        i |= (i >>  2);
        i |= (i >>  4);
        i |= (i >>  8);
        i |= (i >> 16);
        return i - (i >>> 1);
    }

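WTF?! Bit operations again, how fancy! Feeling lost, though? Please tell me I'm not the only one bored enough to study how this is implemented. Let's start with a simple 4-bit calculation: take i = 0110 and move one step at a time, the dumb way: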
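A minimal sketch of that 4-bit walk-through in code, assuming only the highestOneBit source shown above. The class name HighestOneBitDemo and the helper bits4 are invented for the example. The OR-and-shift lines copy the highest 1 into every lower position, and the final subtraction removes everything except that highest bit.

```java
class HighestOneBitDemo {
    // Copy of the method above so each step can be followed.
    static int highestOneBit(int i) {
        i |= (i >> 1);
        i |= (i >> 2);
        i |= (i >> 4);
        i |= (i >> 8);
        i |= (i >> 16);
        return i - (i >>> 1);
    }

    // Formats the low 4 bits of v as a 4-character binary string.
    static String bits4(int v) {
        String s = Integer.toBinaryString(v & 0xF);
        return "0000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        int i = 0b0110;
        int a = i | (i >> 1);   // 0110 | 0011 = 0111
        int b = a | (a >> 2);   // 0111 | 0001 = 0111 (later shifts add nothing for 4 bits)
        int r = b - (b >>> 1);  // 0111 - 0011 = 0100
        System.out.println(bits4(a)); // 0111
        System.out.println(bits4(b)); // 0111
        System.out.println(bits4(r)); // 0100
        System.out.println(bits4(highestOneBit(i))); // 0100
    }
}
```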



bitCount (int)

    // Counts the 1 bits in the binary representation of the given int
    public static int bitCount(int i) {
        // HD, Figure 5-2
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;
    }


Let's analyze the first line of code first: i = i - ((i >>> 1) & 0x55555555);
Take the number 0xBC637EFF: 1011 1100 0110 0011 0111 1110 1111 1111



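  1. (i >>> 1) shifts i right without sign extension, so the high bit of every 2-bit pair moves into the low position. On top of that we want to clear the high bit of every pair, which after the shift holds the low bit of the pair to its left.
  2. (i >>> 1) & 0x55555555 clears the high bit of every pair.
  3. This gives the following result:

     1011 1100 0110 0011 0111 1110 1111 1111 [i]
     0101 0100 0001 0001 0001 0101 0101 0101 [(i >>> 1) & 0x55555555]

In every 2-bit pair of the second row the high bit is always 0 and the low bit is the high bit of the pair above it. Subtracting the second row from the first therefore leaves, in each pair, the number of 1s in that pair of the first row.

This is equivalent to i = (i & 0x55555555) + ((i >>> 1) & 0x55555555), which is easier to understand: ANDing the original i with 0x55555555 filters out the high bit of every pair so that only the low bits remain, while (i >>> 1) & 0x55555555 moves the high bits down into the low positions. Adding the two also gives the number of 1s in each pair. Once this is clear the rest is not hard to follow, since the later lines work on the same principle.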
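The claimed equivalence of the two forms can be checked directly. The following is a small sketch using the example value 0xBC637EFF; the class and method names (BitCountDemo, pairCountsBySubtract, pairCountsByAdd, sumPairs) are made up for the illustration.

```java
class BitCountDemo {
    // First line of bitCount: each 2-bit pair ends up holding its own count of 1s.
    static int pairCountsBySubtract(int i) {
        return i - ((i >>> 1) & 0x55555555);
    }

    // The easier-to-read form: low bits plus high bits moved down.
    static int pairCountsByAdd(int i) {
        return (i & 0x55555555) + ((i >>> 1) & 0x55555555);
    }

    // Adds up the sixteen 2-bit fields of x.
    static int sumPairs(int x) {
        int sum = 0;
        for (int k = 0; k < 16; k++) {
            sum += (x >>> (2 * k)) & 0x3;
        }
        return sum;
    }

    public static void main(String[] args) {
        int i = 0xBC637EFF;
        System.out.println(pairCountsBySubtract(i) == pairCountsByAdd(i)); // true
        System.out.println(sumPairs(pairCountsBySubtract(i)));             // 23
        System.out.println(Integer.bitCount(i));                           // 23
    }
}
```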



    /**
     * Initialize the hashing mask value. We defer initialization until we
     * really need it.
     */
    final boolean initHashSeedAsNeeded(int capacity) {
        // Current state of alternative hashing; hashSeed starts at 0
        boolean currentAltHashing = hashSeed != 0;
        // Should alternative hashing be used? Normally capacity is below
        // Holder.ALTERNATIVE_HASHING_THRESHOLD, so this is false
        boolean useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        // XOR of the two states; normally switching is false
        boolean switching = currentAltHashing ^ useAltHashing;
        // Only when the state changes:
        if (switching) {
            // use a random hashSeed if useAltHashing is true, otherwise 0
            hashSeed = useAltHashing
                ? sun.misc.Hashing.randomHashSeed(this)
                : 0;
        }
        return switching;
    }


In the source shown in this article, initHashSeedAsNeeded is called from two methods:

  • inflateTable(int toSize)
  • resize(int newCapacity)


    final int hash(Object k) {
        int h = hashSeed;
        // The state of the hash seed decides whether the new hash algorithm is used
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        // Otherwise use the old hash algorithm
        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        // (Bit mixing to spread the hash codes. If it looks brutal, don't worry and move on.)
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }


    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        // Index in the table that corresponds to this hash code
        return h & (length-1);
    }


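Suppose length = 16, in binary 0000 0000 0000 0000 0000 0000 0001 0000
Then length - 1 = 15, in binary 0000 0000 0000 0000 0000 0000 0000 1111

h & (length-1) therefore yields the index of the key in the table. For non-negative h, h & (length-1) gives the same result as h % length, but the two are not equally efficient: the bit operation is much faster. The equivalence only holds when length is a power of two, which is why the capacity must be a power of two.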
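A short sketch comparing the mask with the remainder operator, using a table length of 16 as in the example above. The class name IndexForDemo is invented for this illustration. Note that for a negative h the % operator returns a negative value, while the mask always produces a valid index.

```java
class IndexForDemo {
    // Same expression as the JDK method shown above.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        System.out.println(indexFor(5, length));  // 5
        System.out.println(indexFor(21, length)); // 5, same bucket as 5
        System.out.println(21 % length);          // 5, same as the mask
        System.out.println(indexFor(-7, length)); // 9, still a valid index
        System.out.println(-7 % length);          // -7, not usable as an index
    }
}
```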



    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Resize only if the number of entries has reached the threshold
        // and the target bucket is already occupied
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // Grow to twice the current capacity
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            // Recompute the index for the new table length
            bucketIndex = indexFor(hash, table.length);
        }

        // Enough room: create the Entry
        createEntry(hash, key, value, bucketIndex);
    }



    // Resizes the table
    void resize(int newCapacity) {
        // Keep a reference to the old table
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // If the old table is already at maximum capacity, stop growing and return
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        // Create the new, empty table
        Entry[] newTable = new Entry[newCapacity];
        // Move the entries over; the second argument decides whether hashes are recomputed
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        // Replace the old table with the new one
        table = newTable;
        // Threshold for the next resize
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }


    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        // Walk every bucket of the current table
        for (Entry<K,V> e : table) {
            // Walk the singly linked list in the bucket
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                // Recompute the index for the new capacity
                int i = indexFor(e.hash, newCapacity);
                // Head insertion: point e.next at the current head of newTable[i]
                // (null if that bucket is still empty).
                // This is the spot that triggers the multithreading problem
                e.next = newTable[i];
                // Make e the new head of newTable[i]
                newTable[i] = e;
                // Continue with the rest of the old chain
                e = next;
            }
        }
    }
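One consequence of the head insertion in transfer is that entries which land in the same new bucket end up in reverse order. The sketch below models only that loop body on a simplified chain; the Node class and the names TransferOrderDemo, transferChain and describe are assumptions made for the illustration, not JDK code.

```java
class TransferOrderDemo {
    // Simplified stand-in for HashMap.Entry: just a key and a next pointer.
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Moves every node of the old chain into one new bucket using head
    // insertion, mirroring the loop body of transfer().
    static Node transferChain(Node head) {
        Node newHead = null;
        Node e = head;
        while (e != null) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // e.next = newTable[i]
            newHead = e;        // newTable[i] = e
            e = next;           // e = next
        }
        return newHead;
    }

    static String describe(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node chain = new Node("A", new Node("B", new Node("C", null)));
        System.out.println(describe(chain));                // A -> B -> C
        System.out.println(describe(transferChain(chain))); // C -> B -> A
    }
}
```

Because the chain order flips on every transfer, two threads resizing the same map at once can see each other's half-relinked nodes, which is why this loop is the sensitive spot.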




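This brings us to the multithreading problem. As I noted in the source comments, the trouble lies in the statements from e.next = newTable[i] through e = next.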



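    void createEntry(int hash, K key, V value, int bucketIndex) {
        // Read the current head of the bucket at bucketIndex
        Entry<K,V> e = table[bucketIndex];
        // Create the new Entry as the new head; this can cause problems under concurrent access
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        // One more element
        size++;
    }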


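Entry is a linked-list structure. If two threads reach new Entry<>(hash, key, value, e) at the same moment and both hold the same e, they compete to become the new head in front of e, and one of them is bound to be overwritten. A later get for the overwritten entry then finds nothing and returns null.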
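The lost update can be made visible without real threads by replaying the problematic interleaving by hand. This is a deterministic, single-threaded simulation under simplifying assumptions: Node stands in for Entry, and the names LostUpdateDemo, interleavedInsert and contains are invented for the example.

```java
class LostUpdateDemo {
    // Simplified stand-in for HashMap.Entry.
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Replays the interleaving in which two writers read the same head
    // before either of them stores its new entry.
    static Node interleavedInsert() {
        Node[] table = new Node[1];
        table[0] = new Node("old", null);

        Node seenByT1 = table[0];            // thread 1: e = table[bucketIndex]
        Node seenByT2 = table[0];            // thread 2: e = table[bucketIndex]
        table[0] = new Node("k1", seenByT1); // thread 1 installs its entry
        table[0] = new Node("k2", seenByT2); // thread 2 overwrites it; k1 is now unreachable
        return table[0];
    }

    static boolean contains(Node head, String key) {
        for (Node n = head; n != null; n = n.next) {
            if (n.key.equals(key)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Node head = interleavedInsert();
        System.out.println(contains(head, "k2"));  // true
        System.out.println(contains(head, "old")); // true
        System.out.println(contains(head, "k1"));  // false, the entry was lost
    }
}
```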



    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        // This field is what makes Entry a linked list
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         * The new entry is inserted in front of the old entry n, as the new head
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        // remaining methods omitted
    }



    public V get(Object key) {
        // A null key is looked up separately
        if (key == null)
            return getForNullKey();
        // Find the matching Entry
        Entry<K,V> entry = getEntry(key);

        // Return null if there is no entry, otherwise its value
        return null == entry ? null : entry.getValue();
    }


    final Entry<K,V> getEntry(Object key) {
        // Nothing to find in an empty map
        if (size == 0) {
            return null;
        }

        // The hash is 0 for a null key, otherwise hash(key)
        int hash = (key == null) ? 0 : hash(key);
        // Use the hash and the table length to find the bucket, then walk its chain
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            // Match if the hash is the same and the key is either the same object
            // or (for a non-null key) equal according to equals(). The equals check
            // is needed because an overridden equals can make two distinct objects equal.
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                // Return the entry
                return e;
        }
        return null;
    }
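The equals check in getEntry is what lets a lookup succeed with a key object that is equal to, but not the same instance as, the one used for put. A small sketch with a hypothetical key class (Point and KeyEqualityDemo are invented names); note that hashCode must be overridden consistently with equals, otherwise the lookup would usually search the wrong bucket.

```java
import java.util.HashMap;
import java.util.Map;

class KeyEqualityDemo {
    // Hypothetical key type that overrides equals and hashCode together.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    // Stores a value under one Point and looks it up with a different, equal Point.
    static String lookup() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        return map.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // found
    }
}
```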



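  • Add synchronization to the methods that use the HashMap: too inefficient
  • Wrap it externally: Map<K,V> map = Collections.synchronizedMap(new HashMap<K,V>());
  • Hashtable: too inefficient
  • Use ConcurrentHashMap from the java.util.concurrent package introduced in JDK 1.5. It is comparatively safe and efficient, and is the recommended choice. I also cover it in another article.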
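A minimal usage sketch of the two drop-in options from the list, assuming Java 8 or later for the lambda syntax. The class name SafeMapDemo and the helper fill are invented for the example. Several threads write distinct keys at once, and with either thread-safe map no entry is lost.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SafeMapDemo {
    // Starts `threads` writers that each put `perThread` distinct keys,
    // waits for all of them, and returns the final size of the map.
    static int fill(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    map.put(base + k, k);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(new ConcurrentHashMap<>(), 4, 10_000));                    // 40000
        System.out.println(fill(Collections.synchronizedMap(new HashMap<>()), 4, 10_000)); // 40000
    }
}
```

Passing a plain new HashMap<>() to fill instead is not safe: the result may be smaller than expected, or the call may misbehave in other ways, for the reasons described above.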



