How HashMap Is Implemented

Source: Internet | Published by: 每一个明天知乎 | Editor: 程序博客网 | Time: 2024/05/16 09:59

Map implements an associative array: a data structure that stores (key, value) pairs.
The basic implementations in Java are the following:

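Implementation      Description
HashMap             Hash table indexed by the key's hash; insertion and lookup take constant time on average
LinkedHashMap       Like HashMap, but an additional linked list preserves insertion order
TreeMap             Sorted map backed by a red-black tree whose nodes are key-value pairs; can return sub-maps
WeakHashMap         Keys are held through weak references, so garbage collection treats its entries differently
ConcurrentHashMap   Thread-safe Map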

The rest of this article focuses on

HashMap

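Broadly, HashMap's main data structure is an array of buckets. The hash of the key (derived from key.hashCode()) selects the bucket index, and collisions are resolved by separate chaining: entries whose hashes land in the same bucket are linked into a list. Looking up a key therefore works as follows:
Compute the index from the key's hashCode() and check whether bucket[index] is empty. If it is, the key is absent. Otherwise compare the key against the entries in that bucket's list one by one with equals(); the key is present if some comparison returns true, and absent otherwise.
Two things matter in this process: first, the key's hashCode determines the bucket; second, equals() decides whether two keys are actually "equal".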
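The lookup steps above can be sketched as a tiny chained hash table. This is a hypothetical, simplified illustration (the class TinyChainedMap, its fixed 16-bucket array, and its method bodies are assumptions made for this example), not the JDK code, which is examined next.

```java
import java.util.Objects;

// Hypothetical, simplified chained hash table; not the JDK implementation.
class TinyChainedMap<K, V> {
    // One node of a bucket's singly linked list.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed number of buckets (a power of two); this sketch never resizes.
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];

    // Step 1: derive the bucket index from the key's hashCode().
    private int indexFor(Object key) {
        return (buckets.length - 1) & Objects.hashCode(key);
    }

    V get(Object key) {
        // Step 2: walk the bucket's list and compare keys with equals().
        for (Node<K, V> e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(key, e.key)) {
                return e.value;
            }
        }
        return null; // empty bucket, or no equal key in the list
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> e = buckets[i]; e != null; e = e.next) {
            if (Objects.equals(key, e.key)) {
                V old = e.value;
                e.value = value; // existing key: replace the value
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // new key: prepend a node
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> m = new TinyChainedMap<>();
        m.put("Aa", 1);
        m.put("BB", 2); // "Aa" and "BB" share a hashCode, hence a bucket
        System.out.println(m.get("Aa") + " " + m.get("BB") + " " + m.get("Cc")); // 1 2 null
    }
}
```

"Aa" and "BB" collide on hashCode(), so both end up in one list and only the equals() comparison tells them apart.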

Let us briefly read through the JDK source for HashMap:

The structure that stores a key-value pair:

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

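Besides the key and value fields, Node has a next pointer that links nodes whose hashes collide in the same bucket.
Note how the node's hashCode() is implemented: it calls the static utility method Objects.hashCode(Object), which in effect dispatches polymorphically to the hashCode methods of the key and the value. Why the two hashes are combined with bitwise XOR puzzled me at first; it is the formula that the Map.Entry.hashCode() contract specifies, so that equal entries coming from different Map implementations also have equal hash codes.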
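A small check of that contract, as a sketch: an entry obtained from a HashMap and a standalone AbstractMap.SimpleEntry with the same key and value should be equal and share a hash code. The class name EntryHashDemo and the helper contractHash are made up for this example; contractHash simply restates the formula from the Map.Entry documentation.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

class EntryHashDemo {
    // Hash code as written in the Map.Entry.hashCode() documentation.
    static int contractHash(Object key, Object value) {
        return (key == null ? 0 : key.hashCode())
             ^ (value == null ? 0 : value.hashCode());
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("answer", 42);

        Map.Entry<String, Integer> fromMap = map.entrySet().iterator().next();
        Map.Entry<String, Integer> standalone = new AbstractMap.SimpleEntry<>("answer", 42);

        System.out.println(fromMap.equals(standalone));                       // true
        System.out.println(fromMap.hashCode() == standalone.hashCode());      // true
        System.out.println(fromMap.hashCode() == contractHash("answer", 42)); // true
    }
}
```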

In addition, HashMap computes the hash of a key like this:

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

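That is, the key's hashCode is XORed with itself shifted right (unsigned) by 16 bits. Because the bucket index uses only the low bits of the hash, this folds the high 16 bits into the low 16 so that they also influence the index.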
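To see what the shift-and-XOR buys, here is a minimal sketch that computes bucket indexes with and without the spreading step. The class SpreadDemo, its helper names, the table length of 16, and the two sample hash values are all assumptions chosen for illustration.

```java
class SpreadDemo {
    // Same spreading step as in the hash(Object) method quoted above.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;      // assumed table length
        int a = 0x10000; // the two values differ only in their upper 16 bits
        int b = 0x20000;

        // Without spreading, both land in bucket 0.
        System.out.println(indexFor(a, n) + " " + indexFor(b, n)); // 0 0

        // With spreading, the upper bits reach the index.
        System.out.println(indexFor(spread(a), n) + " " + indexFor(spread(b), n)); // 1 2
    }
}
```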

Now consider a common operation, looking up a node with getNode:

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

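The hash parameter is the result of hash(key), and the bucket index is (n - 1) & hash. The comparison has two steps. The first, e.hash == hash, checks that the stored hash matches. The hashes can indeed differ at this point: the index keeps only the low bits of the hash, so entries with different hashes can share a bucket. Since e is a Node<K,V>, I first mistook e.hash for the value returned by the node's hashCode() method; that method also mixes in the value's hash, which would make no sense here, because an entry is located by its key alone. In fact the hash field of Node is final and is set at construction time to hash(key). The second step compares the keys themselves, by reference or with equals().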
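A quick sketch of why the e.hash == hash check is not redundant: two keys with different hashes can map to the same bucket index. The class SameBucketDemo, the table length of 16, and the keys 1 and 17 are assumptions picked for illustration; the hash method is copied from the snippet quoted earlier.

```java
class SameBucketDemo {
    // Copy of the hash(Object) method quoted earlier.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Index computation used by getNode: (n - 1) & hash.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // assumed table length
        Integer k1 = 1;
        Integer k2 = 17;

        // Different hashes...
        System.out.println(hash(k1) + " " + hash(k2)); // 1 17

        // ...but the same bucket, so the cheap int comparison can
        // reject a node before equals() is ever called.
        System.out.println(indexFor(hash(k1), n) + " " + indexFor(hash(k2), n)); // 1 1
    }
}
```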

Next is the final method putVal:

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

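This process closely mirrors the lookup: if the key has no node yet, a new one is added; otherwise the existing node's value is updated (unless onlyIfAbsent is set and the old value is non-null). The code is somewhat involved because it handles several cases, such as whether the table needs to be resized, plus some optimizations: for example, when the bucket has already been converted to a tree (its head is a TreeNode), insertion is delegated to putTreeVal.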
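From the caller's side, the return value and the onlyIfAbsent flag show up as the difference between put and putIfAbsent. The following sketch uses only the public Map API; the class name PutDemo and its demo method are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutDemo {
    // Returns the results of three insertions followed by a lookup.
    static List<Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);          // no previous mapping
        Integer second = map.put("k", 2);         // replaces 1, returns it
        Integer third = map.putIfAbsent("k", 3);  // key present: value is kept
        Integer current = map.get("k");
        return Arrays.asList(first, second, third, current);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [null, 1, 2, 2]
    }
}
```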

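In short, the code embodies the two ideas introduced at the start: the bucket array, and the two-step check with the hash first and equals() second.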
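The two-step check can be exercised with a deliberately bad key type. In this sketch the hypothetical class ConstKey returns the same hash code for every instance, so every key collides and only equals() separates them; the map still behaves correctly, just less efficiently.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: all instances share one hash code,
    // so only equals() can tell keys apart.
    static final class ConstKey {
        final int id;

        ConstKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ConstKey && ((ConstKey) o).id == id;
        }
    }

    static Map<ConstKey, String> build() {
        Map<ConstKey, String> map = new HashMap<>();
        map.put(new ConstKey(1), "one");
        map.put(new ConstKey(2), "two");
        return map;
    }

    public static void main(String[] args) {
        Map<ConstKey, String> map = build();
        System.out.println(map.size());               // 2
        System.out.println(map.get(new ConstKey(2))); // two
        System.out.println(map.get(new ConstKey(3))); // null
    }
}
```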

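Another method worth noting is entrySet(), which returns a view of all key-value pairs in the map rather than a copy: changes to the map's entries are reflected in the set, and vice versa. If the underlying map is modified while the view is being iterated (other than through the iterator itself), the result is undefined; in practice the usual outcome is a ConcurrentModificationException.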

    final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
        public final int size()                 { return size; }
        public final void clear()               { HashMap.this.clear(); }
        public final Iterator<Map.Entry<K,V>> iterator() {
            return new EntryIterator();
        }
        public final boolean contains(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            Object key = e.getKey();
            Node<K,V> candidate = getNode(hash(key), key);
            return candidate != null && candidate.equals(e);
        }
        public final boolean remove(Object o) {
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>) o;
                Object key = e.getKey();
                Object value = e.getValue();
                return removeNode(hash(key), key, value, true, true) != null;
            }
            return false;
        }
        public final Spliterator<Map.Entry<K,V>> spliterator() {
            return new EntrySpliterator<>(HashMap.this, 0, -1, 0, 0);
        }
        public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
            Node<K,V>[] tab;
            if (action == null)
                throw new NullPointerException();
            if (size > 0 && (tab = table) != null) {
                int mc = modCount;
                for (int i = 0; i < tab.length; ++i) {
                    for (Node<K,V> e = tab[i]; e != null; e = e.next)
                        action.accept(e);
                }
                if (modCount != mc)
                    throw new ConcurrentModificationException();
            }
        }
    }

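See, for example, the exception thrown at the end of forEach above.
How should we understand EntrySet being a view over the HashMap? Operations on the EntrySet, such as remove, are implemented by calls like removeNode(hash(key), key, value, true, true); that is, they call HashMap's own methods and operate directly on the underlying bucket array.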
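The view behavior is easy to observe through the public API. The sketch below (the class EntrySetViewDemo and its demo method are made up for this example) removes through the view, writes through an entry, and then checks that a later put on the map is visible through the same view object.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class EntrySetViewDemo {
    static Map<String, Integer> demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Set<Map.Entry<String, Integer>> view = map.entrySet();

        // Removing an entry through the view removes it from the map.
        view.remove(new AbstractMap.SimpleEntry<>("a", 1));

        // Writing through an entry updates the value stored in the map.
        for (Map.Entry<String, Integer> e : view) {
            e.setValue(e.getValue() * 10);
        }

        // A later put on the map is visible through the same view object.
        map.put("c", 3);
        if (view.size() != map.size()) {
            throw new IllegalStateException("view out of sync with map");
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = demo();
        System.out.println(map.containsKey("a") + " " + map.get("b") + " " + map.get("c")); // false 20 3
    }
}
```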
There is also the entry iterator, EntryIterator:

    final class EntryIterator extends HashIterator
        implements Iterator<Map.Entry<K,V>> {
        public final Map.Entry<K,V> next() { return nextNode(); }
    }

It extends HashIterator, which iterates over the hash table:

    abstract class HashIterator {
        Node<K,V> next;        // next entry to return
        Node<K,V> current;     // current entry
        int expectedModCount;  // for fast-fail
        int index;             // current slot

        HashIterator() {
            expectedModCount = modCount;
            Node<K,V>[] t = table;
            current = next = null;
            index = 0;
            if (t != null && size > 0) { // advance to first entry
                do {} while (index < t.length && (next = t[index++]) == null);
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Node<K,V> nextNode() {
            Node<K,V>[] t;
            Node<K,V> e = next;
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            if (e == null)
                throw new NoSuchElementException();
            if ((next = (current = e).next) == null && (t = table) != null) {
                do {} while (index < t.length && (next = t[index++]) == null);
            }
            return e;
        }

        public final void remove() {
            Node<K,V> p = current;
            if (p == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            current = null;
            K key = p.key;
            removeNode(hash(key), key, null, false, false);
            expectedModCount = modCount;
        }
    }

nextNode() holds the core iteration logic: it advances index through the table and, for each index, walks that bucket's list in order.
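One visible consequence is that iteration follows bucket order rather than insertion order. The sketch below assumes the default table length of 16 and relies on small Integer keys hashing to their own value; the resulting order is an implementation detail of the code shown above, not something the Map API promises. The class name IterationOrderDemo is made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {
    static List<Integer> order() {
        // With 16 buckets, a small Integer key k lands in bucket k & 15.
        Map<Integer, String> map = new HashMap<>();
        map.put(33, "x"); // bucket 1
        map.put(2, "x");  // bucket 2
        map.put(17, "x"); // bucket 1, appended after 33
        map.put(1, "x");  // bucket 1, appended after 17
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Bucket 1 is drained first (33, 17, 1), then bucket 2 (2).
        System.out.println(order()); // [33, 17, 1, 2]
    }
}
```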

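Besides EntrySet there are KeySet and Values, which provide views of the keys and of the values respectively. Their iterators follow the same logic as EntrySet's, because all of them are built on HashIterator: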

    final class KeyIterator extends HashIterator
        implements Iterator<K> {
        public final K next() { return nextNode().key; }
    }

    final class ValueIterator extends HashIterator
        implements Iterator<V> {
        public final V next() { return nextNode().value; }
    }

Note that the values are exposed as a Collection rather than a Set, because values may repeat while keys cannot.
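A short sketch of the difference, using only the public API (the class ValuesViewDemo and its sample data are made up for the example): two keys map to the same value, so values() reports three elements while a Set built from them has two.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

class ValuesViewDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 1); // same value under a different key
        map.put("c", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        Collection<Integer> values = map.values();
        System.out.println(map.keySet().size());          // 3 distinct keys
        System.out.println(values.size());                // 3 values, duplicates kept
        System.out.println(new HashSet<>(values).size()); // 2 distinct values
    }
}
```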

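Also, iterators returned by different calls, such as EntrySet.iterator() and KeySet.iterator(), are not coupled to each other: each call creates a new iterator, and hence a new HashIterator with its own cursor starting from the beginning.
Finally, HashMap records structural modifications in its modCount field. Iterators check whether that value has changed during iteration and throw ConcurrentModificationException if it has; this is the so-called fail-fast behavior.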
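The fail-fast check can be triggered on purpose. In this sketch (the class FailFastDemo and its method names are made up for the example) removing through the map during iteration makes the next call to next() throw, whereas removing through the iterator works because remove() resets expectedModCount, as seen in HashIterator above.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Removes through the map while iterating; returns true if the iterator fails fast.
    static boolean removeThroughMap() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove(key); // bumps modCount behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes through the iterator, which resynchronizes expectedModCount.
    static Map<String, Integer> removeThroughIterator() {
        Map<String, Integer> map = sample();
        for (Iterator<String> it = map.keySet().iterator(); it.hasNext(); ) {
            if (it.next().equals("b")) {
                it.remove();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(removeThroughMap()); // true
        Map<String, Integer> map = removeThroughIterator();
        System.out.println(map.containsKey("b") + " " + map.size()); // false 2
    }
}
```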
The source comes to 837 lines in total, which is fairly long.

0 0