How the Java Collection Class HashMap Is Implemented

Source: Internet | Published by: template.js helper | Editor: 程序博客网 | Time: 2024/06/08 17:04

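Some characteristics of HashMap:

1. It stores key-value pairs.

2. It accepts a null key and null values.

3. There is no need to declare its size, because it grows automatically. Key-value pairs are stored with put(), and the value for a key is retrieved with get().

4. Storage is unordered: printing all key-value pairs does not necessarily follow the order in which they were inserted.

5. Keys cannot repeat. When a key is put again, the newer value replaces the earlier one.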
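The properties listed above can be checked with a few lines against java.util.HashMap. This is only a small sketch; the class name HashMapFeaturesDemo and the sample keys are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapFeaturesDemo {
    // Builds a small map that exercises the listed properties.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>(); // no size declared up front
        map.put("a", 1);
        map.put(null, 0);    // a null key is accepted
        map.put("b", null);  // a null value is accepted
        map.put("a", 100);   // same key again: the newer value replaces the older one
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());    // 3 distinct keys: "a", null, "b"
        System.out.println(map.get("a"));  // 100
        System.out.println(map.get(null)); // 0
        System.out.println(map);           // iteration order is not guaranteed
    }
}
```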

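These characteristics follow from its data structure and internal logic, and reading the source code is a good way to understand them in depth. I recommend IntelliJ IDEA for this, since it makes browsing source code very convenient. Its many other advantages are not covered here.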

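Let's go straight to how HashMap's put() method works. The listings in this article match the JDK 7 source, where the table is an array of Entry objects: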

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced. (Comment shortened; method body as in JDK 7.)
     */
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

The method begins with the check if (table == EMPTY_TABLE). What is table? Its declaration in the source reads:

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

It is an array of Entry<K,V> whose initial value is an empty array. So what is Entry<K,V>?

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
    }

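For now we focus on the core parts only. Entry is a static nested class that holds a key and a value, that is, a key-value pair, together with the cached hash and a next reference to another Entry. In short: an Entry is an element of the array, each Entry is one key-value pair, and because it holds a reference to the next element, the entries in one slot form a linked list.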
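To make the linked-list idea concrete, here is a minimal sketch with a stand-in class. Node and ChainDemo are hypothetical names that only mirror the fields of Entry shown above; they are not part of the JDK.

```java
class ChainDemo {
    // Hypothetical stand-in for HashMap.Entry: key, value, cached hash, next reference.
    static final class Node<K, V> {
        final K key;
        V value;
        final int hash;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Counts the nodes reachable from head by following next references.
    static int length(Node<?, ?> head) {
        int n = 0;
        for (Node<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Node<String, Integer> first = new Node<>(1, "x", 1, null);
        Node<String, Integer> second = new Node<>(1, "y", 2, first); // links to first
        System.out.println(length(second)); // 2
    }
}
```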

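put() first checks whether table is empty and, if so, allocates the array. The default length is 2 to the 4th power (16). A different initial capacity can be passed to new HashMap(int). The value passed in does not itself have to be a power of two, because the table is sized to the next power of two at or above it, as the field comment quoted earlier requires. A power-of-two length lets the index calculation reach every slot of the array, which lowers the chance of collisions (see Appendix 1).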
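Since the table length has to be a power of two, a requested capacity that is not one has to be rounded up somewhere. The following is a sketch of one way to do that rounding with Integer.highestOneBit. The class CapacityDemo and its helper are written for this article and are only meant to illustrate the idea, not to reproduce the JDK internals line for line.

```java
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // assumed upper bound for this sketch

    // Returns the smallest power of two that is >= number, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(15)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
    }
}
```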

The second check in put() is if (key == null) { return putForNullKey(value); }. This handles the null key, which is stored in table[0].

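Next come two variables: int hash = hash(key); int i = indexFor(hash, table.length);. Here i is the position in the table array where this key-value pair belongs. It is computed by ANDing the key's hash with table.length-1, which guarantees that i stays within the bounds of the array. Then comes this loop: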

    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

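This loop checks whether the key being put equals a key that is already in the bucket. If so, the existing value is replaced with the new one and the old value is returned.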
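The replace-and-return behavior described above is visible from the outside through the return value of put(). A small sketch, where PutReturnDemo and the key "k" are illustrative only:

```java
import java.util.HashMap;

class PutReturnDemo {
    // Returns { result of first put, result of second put, final value for the key }.
    static Integer[] run() {
        HashMap<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);  // no previous mapping, so null is returned
        Integer second = map.put("k", 2); // key already present, so the old value 1 is returned
        return new Integer[] { first, second, map.get("k") };
    }

    public static void main(String[] args) {
        Integer[] r = run();
        System.out.println(r[0]); // null
        System.out.println(r[1]); // 1
        System.out.println(r[2]); // 2
    }
}
```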

If no matching key is found, put() calls addEntry(hash, key, value, i); to insert a new entry. Here is how addEntry handles it:

    /**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

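As the comment says, addEntry adds a new Entry holding the key-value pair to the given bucket (a slot of the table array), and it may resize the table along the way. The first check decides whether to resize. size is the number of entries currently in the map, and threshold is capacity * load factor, which by default is the initial length 16 times the load factor 0.75, that is, 12. The condition also requires the target bucket to be non-empty. When both hold, the table is doubled and the hash and bucket index are recomputed for the new length.

The purpose is to keep the linked lists short, because working with a list means looping over it, which is comparatively slow. Ideally each table slot holds a single entry whose next is null. In practice, different keys can produce the same i from int hash = hash(key); int i = indexFor(hash, table.length);, which is a collision. Entries that land on the same slot are then kept in a linked list, and the Entry next field is what makes that possible.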
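The threshold arithmetic can be worked through with a short simulation. This sketch is deliberately simplified: it doubles the capacity whenever size >= threshold and ignores the additional "target bucket is non-empty" condition, so it shows the earliest point at which a resize could happen. ThresholdDemo and its methods are hypothetical names.

```java
class ThresholdDemo {
    static final int DEFAULT_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // threshold = capacity * load factor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Simplified model: before each insertion, double the capacity if size >= threshold.
    static int capacityAfter(int insertions) {
        int capacity = DEFAULT_CAPACITY;
        int size = 0;
        for (int n = 0; n < insertions; n++) {
            if (size >= threshold(capacity, DEFAULT_LOAD_FACTOR)) {
                capacity *= 2;
            }
            size++;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(capacityAfter(12));    // 16
        System.out.println(capacityAfter(13));    // 32
    }
}
```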

    /**
     * Like addEntry except that this version is used when creating entries
     * as part of Map construction or "pseudo-construction" (cloning,
     * deserialization).  This version needn't worry about resizing the table.
     *
     * Subclass overrides this to alter the behavior of HashMap(Map),
     * clone, and readObject.
     */
    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

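The createEntry() code shows that the entry currently at table[bucketIndex] becomes the next of the new entry, and the new entry takes its place at the head of the bucket.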
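The head insertion done by createEntry() can be reproduced in a stripped-down form. In this sketch, HeadInsertDemo and Node are hypothetical and keep only the key and the next reference; the bucket index is passed in directly instead of being computed from a hash.

```java
class HeadInsertDemo {
    // Hypothetical minimal node: just a key and a link to the next node.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Same shape as createEntry(): the old head becomes the next of the new node.
    static void createEntry(Node[] table, String key, int bucketIndex) {
        Node e = table[bucketIndex];
        table[bucketIndex] = new Node(key, e);
    }

    // Renders a bucket's chain from head to tail, e.g. "c -> b -> a".
    static String chain(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node[] table = new Node[4];
        createEntry(table, "a", 1);
        createEntry(table, "b", 1);
        createEntry(table, "c", 1);
        System.out.println(chain(table[1])); // c -> b -> a
    }
}
```

The most recently inserted key ends up first in the chain, which matches the description above.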

At this point we have a clear picture of how HashMap stores data.

Now let's see how get() retrieves data:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    /**
     * Offloaded version of get() to look up null keys.  Null keys map
     * to index 0.  This null case is split out into separate methods
     * for the sake of performance in the two most commonly used
     * operations (get and put), but incorporated with conditionals in
     * others.
     */
    private V getForNullKey() {
        if (size == 0) {
            return null;
        }
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

    /**
     * Returns the entry associated with the specified key in the
     * HashMap.  Returns null if the HashMap contains no mapping
     * for the key.
     */
    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

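It is straightforward. If the key is null, the list at table[0] is searched for the entry whose key is null and its value is returned. Otherwise the index is computed exactly as it was when storing, with int hash = hash(key); int i = indexFor(hash, table.length);. The entry at that index is fetched and its linked list is walked until an entry with an equal key is found. The value of that entry is the result, and this is how a key leads to its value.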
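One consequence of this lookup logic is that get() returns null both when the key is absent and when the key is mapped to null; containsKey() tells the two cases apart. A short sketch, with GetNullDemo and the key names chosen only for illustration:

```java
import java.util.HashMap;

class GetNullDemo {
    // Returns { get("present") == null, get("absent") == null,
    //           containsKey("present"), containsKey("absent") }.
    static boolean[] run() {
        HashMap<String, String> map = new HashMap<>();
        map.put("present", null); // key exists, value is null
        return new boolean[] {
            map.get("present") == null,
            map.get("absent") == null,
            map.containsKey("present"),
            map.containsKey("absent")
        };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        // Both get() calls yield null, but only one key is actually in the map.
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // true true true false
    }
}
```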

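After reading the source, the structure of HashMap turns out to be fairly simple. Data lives in an array of Entry objects, each holding one key-value pair. An entry's position in the array comes from ANDing the hash of its key with table.length-1, so it can be located quickly afterwards. There is also a capacity setting: once the number of entries reaches the threshold, the table grows automatically to twice its previous capacity. Resizing allocates a new table and transfers every entry from the old table into it, which is expensive. So when the number of key-value pairs can be predicted, it is worth giving the HashMap a suitable initial capacity at construction time to avoid resizing.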
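A possible way to pick that initial capacity is to divide the expected number of entries by the load factor, so that the threshold is never reached. The helper below is a hypothetical sketch that assumes the default load factor of 0.75; PresizeDemo, initialCapacityFor and newMapFor are not JDK methods.

```java
import java.util.HashMap;

class PresizeDemo {
    // Hypothetical helper: a capacity whose threshold (capacity * 0.75)
    // is at least expectedSize, assuming the default load factor.
    static int initialCapacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    // Creates a map that should hold expectedSize entries without resizing.
    static <K, V> HashMap<K, V> newMapFor(int expectedSize) {
        return new HashMap<>(initialCapacityFor(expectedSize));
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(100)); // 134
        HashMap<Integer, String> map = newMapFor(100);
        for (int i = 0; i < 100; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size()); // 100
    }
}
```

The map rounds the requested capacity up to a power of two internally, so the actual table may be somewhat larger than the number passed in.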

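Appendix 1 (excerpted from 极客学院)

When length is always a power of two, h & (length-1) is equivalent to taking h modulo length, that is, h % length (for non-negative h), but & is cheaper than %. This looks simple, yet there is more to it. An example makes the point:

Suppose the array length is 15 or 16, and the processed hash codes are 8 and 9. The results of the & operation are then: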

h & (table.length-1)    hash    table.length-1    result
8 & (15-1)              1000    1110              1000
9 & (15-1)              1001    1110              1000
8 & (16-1)              1000    1111              1000
9 & (16-1)              1001    1111              1001

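As the example shows, when 8 and 9 are ANDed with 15-1 (1110) they give the same result, so they map to the same slot of the array. That is a collision: 8 and 9 end up in one slot as a linked list, and a lookup has to walk that list to find 8 or 9, which makes queries slower. We can also see that with an array length of 15 the hash is ANDed with 15-1 (1110), so the lowest bit of the index is always 0. The slots 0001, 0011, 0101, 0111, 1001, 1011 and 1101 can therefore never hold an element. That wastes a lot of space, and worse, the number of usable slots is much smaller than the array length, which further raises the chance of collisions and slows lookups. When the array length is 16, a power of two, every bit of 2^n - 1 is 1, so the & keeps the low-order bits of the hash unchanged. Together with the extra mixing that hash(int h) applies to the key's hashCode, which brings the high-order bits into play, two keys share a slot only when their hashes agree in those low-order bits.

So when the array length is a power of two, different keys are less likely to compute the same index, the data is spread more evenly across the array, and collisions are rarer. Lookups then seldom have to walk a long linked list at a slot, so queries are faster.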
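The comparison between lengths 15 and 16 can be verified by counting how many distinct slots a range of hash values reaches. This is a small sketch; BucketSpreadDemo and usedSlots are illustrative names, and indexFor here is simply the h & (length-1) expression discussed above.

```java
import java.util.HashSet;
import java.util.Set;

class BucketSpreadDemo {
    // The index expression from the text: correct as a modulo only for power-of-two lengths.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Number of distinct slots hit by the hash values 0 .. count-1.
    static int usedSlots(int length, int count) {
        Set<Integer> used = new HashSet<>();
        for (int h = 0; h < count; h++) {
            used.add(indexFor(h, length));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(usedSlots(15, 1000)); // 8: only the even slots 0, 2, ..., 14
        System.out.println(usedSlots(16, 1000)); // 16: every slot is reachable
        System.out.println(indexFor(25, 16) == 25 % 16); // true
    }
}
```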

Appendix 2 (excerpted from 极客学院)

Two ways to iterate over a HashMap

The first way

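    // requires java.util.HashMap, java.util.Iterator, java.util.Map
    Map<Object, Object> map = new HashMap<>();
    Iterator<Map.Entry<Object, Object>> iter = map.entrySet().iterator();
    while (iter.hasNext()) {
        // each entry already carries both the key and the value
        Map.Entry<Object, Object> entry = iter.next();
        Object key = entry.getKey();
        Object val = entry.getValue();
    }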

This is efficient; prefer this approach.

The second way

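    // requires java.util.HashMap, java.util.Iterator, java.util.Map
    Map<Object, Object> map = new HashMap<>();
    Iterator<Object> iter = map.keySet().iterator();
    while (iter.hasNext()) {
        Object key = iter.next();
        // a second hash lookup is needed to fetch the value
        Object val = map.get(key);
    }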

This is less efficient, because every iteration performs an extra get() lookup for the key; use it sparingly.
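Both traversals can also be written with generics and the enhanced for loop, which makes the difference easy to see side by side. A sketch under the assumption of a Map<String, Integer>; IterationDemo and its method names are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class IterationDemo {
    // First way: one pass over the entries, key and value come together.
    static int sumViaEntrySet(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    // Second way: iterate the keys, then look each value up again with get().
    static int sumViaKeySet(Map<String, Integer> map) {
        int sum = 0;
        for (String key : map.keySet()) {
            sum += map.get(key); // extra hash lookup per key
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        System.out.println(sumViaEntrySet(map)); // 6
        System.out.println(sumViaKeySet(map));   // 6
    }
}
```

Both methods produce the same result; they differ only in how many lookups they perform.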
