HashMap Source Code Reading Notes

Source: Internet. Editor: 程序博客网. Time: 2024/05/18 18:16

4. HashMap

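Reference article: "HashMap源码解析" (HashMap source code analysis).
Under the hood, HashMap is built from an array and linked lists: it is an array whose slots (buckets) each hold a linked list, and collisions are resolved by separate chaining. HashMap is not thread-safe, since none of its methods are synchronized. A thread-safe view can be obtained with Collections.synchronizedMap(new HashMap()). One well-known problem in a multithreaded environment is that when several threads resize the map at the same time, a bucket's linked list can become circular, and a later lookup then loops forever. This failure is usually reported for the head-insertion transfer used before JDK 1.8; the JDK 1.8 resize() quoted later in these notes preserves node order, but concurrent modification is still unsafe.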
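The synchronized wrapper mentioned above can be tried out with a small sketch. The class name SynchronizedMapDemo and the method countWithThreads are hypothetical names made up for this illustration; the sketch also assumes JDK 8 or later, where the wrapper performs merge as a single call under its lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Hypothetical helper: several threads bump one counter through the wrapper.
    static int countWithThreads(int threads, int incrementsPerThread) {
        Map<String, Integer> hits = Collections.synchronizedMap(new HashMap<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // One call on the wrapper, so the read-modify-write is done under its lock.
                    hits.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return hits.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 1000));
    }
}
```

Note that the wrapper only makes individual calls atomic. A sequence such as get followed by put, or an iteration over the map, still needs an explicit synchronized block on the wrapper.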

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    transient Node<K,V>[] table; // length is always a power of two

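The default initial capacity is 16 and the default load factor is 0.75f. A null key is accepted: its hash is defined as 0, so it is stored in the first bucket of the array (index 0). threshold = capacity * loadFactor; once the number of stored entries exceeds threshold, the map resizes and doubles its capacity. To locate a key k, the map first uses the computed hash to find the bucket the key belongs to, that is, its index i in the Node[] array (table[i]; older JDKs call this array Entry[]), and then walks that singly linked list, matching keys with == or equals. For a put, if the key is already present and onlyIfAbsent is false, the old value is overwritten and returned.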
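As a quick check of the numbers above, here is a minimal sketch. ThresholdDemo and its two methods are hypothetical names for this illustration; only the formula threshold = capacity * loadFactor and the public Map API are taken as given.

```java
import java.util.HashMap;
import java.util.Map;

class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to int as in the quoted source.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // A null key is an ordinary key: put overwrites it and returns the old value.
    static String nullKeyRoundTrip() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        String old = map.put(null, "second");
        return old + "/" + map.get(null);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: the 13th entry triggers a resize
        System.out.println(threshold(32, 0.75f)); // 24
        System.out.println(nullKeyRoundTrip());   // first/second
    }
}
```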
The definition of hashCode() can be found in the Object class.
Object.hashCode():

public native int hashCode();

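The native keyword marks a method whose implementation is not in the current file but in a file written in another language (such as C or C++). The Java language itself cannot access or manipulate the underlying operating system directly, but it can call code written in other languages through JNI to do so. hashCode exists mainly to make lookups fast in structures such as Hashtable and HashMap: it determines where in a hash-based structure an object is stored. Two objects with the same hashCode are not necessarily the same object, so equals(java.lang.Object) does not have to return true for them; all it says is that in a hash-based structure such as Hashtable they are "placed in the same basket" (bucket). The converse does hold: objects that are equal according to equals must return the same hashCode. To understand how HashMap stores and retrieves entries, you need to understand hashing and the roles of hashCode() and equals(). hashCode decides an object's position in the hash structure; equals decides whether two objects are the same (the default Object.equals compares references, i.e. whether both refer to the same memory location). This leads on to the difference between equals() and "==": "==" compares two values, which for primitives means their actual values and for references means identity; primitive types can only be compared with "==".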
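The relationship between hashCode(), equals() and "==" can be illustrated with a short sketch. EqualsHashCodeDemo and its nested Point class are hypothetical names introduced only for this example; the colliding strings "Aa" and "BB" follow from the documented String.hashCode formula.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EqualsHashCodeDemo {
    // A key type that overrides equals and hashCode consistently.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Same hashCode does not imply equals: "Aa" and "BB" both hash to 2112.
    static boolean sameHashDifferentStrings() {
        return "Aa".hashCode() == "BB".hashCode() && !"Aa".equals("BB");
    }

    // A distinct but equal key finds the entry, because lookup uses hashCode and then equals.
    static String lookupByEqualKey() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        return map.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(sameHashDifferentStrings());        // true
        System.out.println(lookupByEqualKey());                 // found
        System.out.println(new Point(1, 2) == new Point(1, 2)); // false: different references
    }
}
```

If Point did not override hashCode, the second Point would almost always land in a different bucket and the lookup would return null even though equals says the keys match.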

The hash() method:

    static final int hash(Object key) {
        int h;
        // ^ is bitwise XOR; >>> is unsigned right shift (vacated high bits are filled with 0)
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
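To see why the high 16 bits are folded into the low 16 bits, consider keys whose hash codes differ only in the high bits. The sketch below copies the expression from hash() into a hypothetical helper named spread; HashSpreadDemo and index are likewise illustrative names, and the table length of 16 is an assumption chosen for the example.

```java
class HashSpreadDemo {
    // Same expression as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        Integer a = 0x10000; // Integer.hashCode() is the int value itself
        Integer b = 0x20000;
        // Without spreading, both keys fall into bucket 0 of a 16-slot table.
        System.out.println(index(a.hashCode(), 16) + " " + index(b.hashCode(), 16)); // 0 0
        // With spreading, the high bits influence the index.
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16));       // 1 2
    }
}
```

Since the index only looks at the low bits of the hash when the table is small, this one XOR is a cheap way to let the high bits take part as well.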

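The get method:

Returns the value mapped to key, or null if there is no mapping. It first computes the hash of key and passes it to getNode, which performs the lookup among the nodes (the array index is derived from the hash and the table length). The element lives in the bucket (a singly linked list of Node, called Entry in older JDKs) at tab[(length - 1) & hash], so the hashing function is hash & (length - 1). Because length is a power of two, this gives the same result as the division (modulo) method, hash mod length, but & is cheaper than %.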

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // Find the bucket for this hash, i.e. the head node of its list; the index is (n - 1) & hash
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
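The claim that (n - 1) & hash behaves like a modulo can be checked directly. In the sketch below, MaskIndexDemo and maskIndex are hypothetical names; Math.floorMod is used as the reference because Java's % operator can return a negative result for a negative hash.

```java
class MaskIndexDemo {
    // Index computation used by getNode and putVal: valid when n is a power of two.
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        // For a power-of-two length, the mask matches the non-negative remainder.
        System.out.println(maskIndex(37, 16) + " " + Math.floorMod(37, 16)); // 5 5
        System.out.println(maskIndex(-1, 16) + " " + Math.floorMod(-1, 16)); // 15 15
        // The plain % operator keeps the sign and would give an invalid index.
        System.out.println(-1 % 16);                                         // -1
        // For a length that is not a power of two, the mask is no longer a modulo.
        System.out.println(maskIndex(7, 10) + " " + Math.floorMod(7, 10));   // 1 7
    }
}
```

This is one reason the table length is kept at a power of two: the mask is then always a valid index in the range 0 to n - 1.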

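As the code shows, in the newer implementation a bucket that holds too many elements is switched to a TreeNode structure (a red-black tree in JDK 1.8), while buckets with few elements keep resolving collisions with the singly linked Node list.

The put method:

put computes the hash of the key and passes it to putVal(). If a mapping for the key already exists, the old value is overwritten and returned; otherwise a new node is inserted and null is returned.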

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i; // p will point to the head node of the target bucket
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the target bucket is empty, create a new node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            /* First check whether the head node matches:
               1. same hash value: (p.hash == hash),
               2. same key: (k = p.key) == key || (key != null && key.equals(k)) */
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If p is a tree node, delegate the put to putTreeVal()
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                /* Walk the singly linked list headed by p. If p.next is null, append a new
                   node as p.next; e stays null, which marks a fresh insertion. binCount counts
                   the nodes visited; when binCount >= TREEIFY_THRESHOLD - 1, treeifyBin is
                   called to turn this bucket's singly linked list into a tree. */
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    /* 1. Match found: the key already has a mapping, held by the node e points
                          to, so leave the loop.
                       2. Otherwise advance; at the tail, the check at the top of the loop
                          (p.next == null) appends the new node and leaves the loop. */
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // After searching the bucket's list or tree: if a mapping exists (e != null),
            // overwrite the value (unless onlyIfAbsent is true and the old value is non-null)
            // and return the old value. A newly inserted node falls through and returns null.
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Resize condition: ++size > threshold, i.e. size exceeds threshold after the insertion
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
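The return values of putVal can be observed through the public API. The sketch below uses put (which passes onlyIfAbsent = false, as shown above) and putIfAbsent, which in the JDK 8 source goes through the same putVal with onlyIfAbsent = true. PutReturnDemo and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutReturnDemo {
    // Collects what each call returns, followed by the final mapped value.
    static List<Integer> overwriteAndKeep() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);          // no previous mapping: null
        Integer second = map.put("a", 2);         // overwrites 1 and returns it
        Integer third = map.putIfAbsent("a", 3);  // keeps 2 and returns it
        return Arrays.asList(first, second, third, map.get("a"));
    }

    // A key mapped to null counts as "absent" for putIfAbsent (the oldValue == null branch).
    static Integer nullValueIsReplaced() {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", null);
        map.putIfAbsent("k", 5);
        return map.get("k");
    }

    public static void main(String[] args) {
        System.out.println(overwriteAndKeep());    // [null, 1, 2, 2]
        System.out.println(nullValueIsReplaced()); // 5
    }
}
```

Because a null return from put is ambiguous (no mapping, or a mapping to null), containsKey is the reliable way to tell the two cases apart.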

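resize(): the resizing method

As the put code above shows, put calls resize() when ++size > threshold. Its main job is to grow the table, and it also initializes an empty table. After a resize, the elements of one index (bucket) either stay at the same index or move by a power-of-two offset (exactly oldCap). Elements that collided before the resize therefore do not necessarily collide afterwards: a bucket is determined by hash & (length - 1), not by the full hash, so two colliding elements stay together only if their hashes also agree in the newly examined bit (elements with identical hash values always do). In a multithreaded environment, resize can corrupt the table; the circular linked list (infinite loop) is the classic symptom described for pre-1.8 JDKs.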

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // oldCap is 0 when oldTab is null; a non-empty old table is doubled
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // New capacity is twice the old one (newCap = oldCap << 1); threshold doubles as well
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        // Old table is empty and oldThr (= threshold) > 0: the new capacity is threshold (newCap = oldThr)
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // Old table is empty and oldThr == 0: build a new table from the defaults
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        // Key part: allocate a new Node<K,V> array of size newCap, move the elements of the old
        // table into it, and null out the old slots to release them. Elements must be rehashed
        // because the array length has changed.
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // Move the old table's elements to the new table
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // If the old bucket is not empty
                if ((e = oldTab[j]) != null) {
                    // Release the old slot so it can be garbage collected
                    oldTab[j] = null;
                    if (e.next == null)
                        // Single node: place it directly. The length in e.hash & (length - 1)
                        // is now newCap, which completes the rehash.
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // Otherwise the bucket is a singly linked list whose head is e
                    else { // preserve order; lo = low list, hi = high list
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // Recall: int oldCap = (oldTab == null) ? 0 : oldTab.length;
                            // oldCap is the old table length, so it is always a power of two.
                            /* Split the bucket into two lists according to the bit of e.hash
                               that corresponds to oldCap's single 1 bit:
                               bit is 0 -> loHead...loTail, bit is 1 -> hiHead...hiTail.
                               In the new table the lo list stays at the same index, while the
                               hi list moves by oldCap positions. */
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead; // j is the index in the old table; unchanged after resize
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead; // moved by oldCap positions after resize
                        }
                    }
                }
            }
        }
        return newTab;
    }

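Take an original capacity of 16, which is resized to 32. oldCap is 16, binary 10000. Before the resize, the index of element e is e.hash & (length - 1), i.e. e.hash & 01111 (e.hash being the value returned by hash(key)); after the resize the capacity is 32 and the index is e.hash & (newLength - 1), i.e. e.hash & 11111. During the resize, the collision list at this index is split into two lists according to that extra high bit, using the test (e.hash & oldCap) == 0, which in this example is (e.hash & 10000) == 0. The old and new indexes differ only in the single non-zero bit of the old capacity. If that bit of the hash is 0, e.g. ...0xxxx & 10000 == 0, the element keeps the same index in the new table. If that bit is 1, e.g. ...1xxxx & 10000 != 0 (the result can only be 10000), the element's index in the new table is shifted by 10000 (binary), that is, by the old capacity.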
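The 16-to-32 example can be replayed with plain integers standing in for e.hash. ResizeSplitDemo, index and newIndexBySplit are hypothetical names; the sketch only re-applies the (hash & oldCap) == 0 test from resize() and compares it with recomputing the index from scratch.

```java
class ResizeSplitDemo {
    // Bucket index for a power-of-two capacity.
    static int index(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    // New index derived the way resize() does it: stay at j or move to j + oldCap.
    static int newIndexBySplit(int hash, int oldCap) {
        int j = index(hash, oldCap);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all four share bucket 5 while the capacity is 16
        for (int h : hashes) {
            System.out.println(h
                + ": old=" + index(h, oldCap)
                + " new=" + index(h, oldCap << 1)
                + " split=" + newIndexBySplit(h, oldCap));
        }
    }
}
```

Hashes 5 and 37 have a 0 in the 10000 bit and stay in bucket 5, while 21 and 53 have a 1 there and move to bucket 21 (5 + 16). The split test gives the same answer as a full recomputation without touching the other bits.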

0 0