How Java 8 HashMap Works: Personal Notes

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 09:15

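I recently had some time to read through the Java HashMap source code and a few blog posts on the topic, so I am writing down my own understanding of HashMap here.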

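HashMap is a hash-table-based implementation of the Map interface that stores data as key-value pairs. Inside HashMap a key-value pair is always handled as a single unit: a hash algorithm determines where the pair is stored, so a value can always be stored and retrieved quickly through its key. Part of the principle is visible in the name itself, Hash + Map: a hash value is mapped to an address to make storage and lookup fast.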

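In everyday life, when we want to find something quickly, we usually look it up by an address or a name that stands for it. Ideally that name or address is unique, so it points straight at the thing. HashMap works the same way: a hash computation on the key maps it to a position (an address) in the hash table, and the entry is stored at that position. A lookup can then go directly to that position.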

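Following this principle, suppose we want to implement our own map. Instead of HashMap, call it IntMap, and let it store only int values. First, choose the underlying data structure: a plain array, array. If the largest number to be stored is max, the array length is length = max + 1. Next, define a simple algorithm for mapping data to an address: index = number % 7 + 1, that is, take the number modulo 7 and add 1. So the value 1 is stored in array[2] and the value 2 in array[3].

[Figure: the array, with each number stored at index number % 7 + 1]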

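The key is the data itself. To fetch the value 5, we use key = 5 and run the algorithm above in code, which gives index 6 for key = 5 (5 % 7 + 1 = 6); we then read array[6] and return it. Looks simple? Yes, the principle really is this simple, but a real implementation is considerably more complex.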
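The toy IntMap described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name IntMapSketch and its methods are hypothetical, only non-negative numbers are supported, and the array is sized to the 8 slots that index = number % 7 + 1 can actually reach rather than max + 1.

```java
// Toy illustration of the IntMap idea; all names here are hypothetical.
class IntMapSketch {
    // index = number % 7 + 1 only ever yields 1..7, so slot 0 stays unused.
    private final Integer[] array = new Integer[8];

    // The mapping from the text: modulo 7, then add 1 (non-negative numbers only).
    static int indexFor(int number) {
        return number % 7 + 1;
    }

    void put(int number) {
        array[indexFor(number)] = number;
    }

    boolean contains(int number) {
        Integer stored = array[indexFor(number)];
        return stored != null && stored == number;
    }

    public static void main(String[] args) {
        IntMapSketch map = new IntMapSketch();
        map.put(5);
        System.out.println(indexFor(5));     // 6
        System.out.println(map.contains(5)); // true
        System.out.println(map.contains(3)); // false
    }
}
```

A lookup costs one arithmetic step plus one array read, which is the whole point of hashing a key to an address.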

Let us write down the problems that come to mind first and then take them to the HashMap source code to see how they are solved:

1. What happens when two different keys map to the same index? 1 % 7 + 1 = 2 and 8 % 7 + 1 = 2, so how should both be stored?

2. Too much storage is wasted. If the maximum value in the figure above were 7000 instead of 7, a great deal of space would go unused.

3. An array has a fixed length. What happens when its capacity is exceeded?

4. How is thread safety ensured?

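I: Problem 1 has a technical name: hash collision. There are many write-ups online about ways to resolve hash collisions, and they are easy to find. HashMap uses separate chaining, so its actual storage structure looks like this:

[Figure: HashMap storage structure, an array whose slots each hold a linked list of entries]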

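Entries that map to the same address are stored in a linked list at that address. Java 8 refines this: a bin stays a linked list while it is short, and is converted to a red-black tree when an element is added to a bin that already holds at least 8 nodes (provided the table has at least MIN_TREEIFY_CAPACITY = 64 slots; otherwise the table is resized instead). The threshold is controlled by a constant: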

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;
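To make the chaining idea concrete, here is a minimal sketch of a separate-chaining set for int keys. It is not HashMap's implementation: the class ChainedIntSet and its methods are hypothetical, the chains are plain ArrayLists, and there is no tree conversion or resizing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy separate-chaining set for int keys. Names are hypothetical; this is not HashMap's code.
class ChainedIntSet {
    private final List<List<Integer>> buckets = new ArrayList<>();

    ChainedIntSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Colliding keys end up in the same chain instead of overwriting each other.
    private List<Integer> chainFor(int key) {
        // floorMod keeps the index non-negative even for negative keys.
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    boolean add(int key) {
        List<Integer> chain = chainFor(key);
        if (chain.contains(key)) {
            return false; // already present
        }
        chain.add(key);
        return true;
    }

    boolean contains(int key) {
        return chainFor(key).contains(key);
    }

    int chainLength(int key) {
        return chainFor(key).size();
    }

    public static void main(String[] args) {
        ChainedIntSet set = new ChainedIntSet(7);
        set.add(1);
        set.add(8);                             // 8 % 7 == 1 % 7, same bucket as 1
        System.out.println(set.chainLength(1)); // 2
        System.out.println(set.contains(8));    // true
        System.out.println(set.contains(15));   // false: same bucket, but never added
    }
}
```

The longer a chain grows, the more a lookup degrades toward a linear scan, which is the cost the tree conversion in Java 8 is meant to limit.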

For problems 2 and 3, look at put(K, V), the HashMap method that stores data:

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

As shown, put computes the hash with hash(Object key) and then calls putVal. First, the hash computation:

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

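The comment is long and thorough, so I will not paraphrase it. From the code: when key is null, the method returns 0; otherwise it returns the key's hash code XORed with the same hash code unsigned-right-shifted by 16 bits. The putVal() method contains the answers to problems 2 and 3.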
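The effect of the XOR can be seen on keys whose hash codes differ only in the high 16 bits. The sketch below copies the expression from hash(Object) into a hypothetical helper named spread and applies the index formula (n - 1) & hash for a table of length 16; the class name, the helper names, and the chosen keys are assumptions made for illustration.

```java
// Demonstrates why HashMap mixes the high bits of hashCode() into the low bits.
class HashSpreadDemo {
    // Same expression as the hash(Object) method quoted above.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n, where n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // Integer.hashCode() is the int value itself; these differ only above bit 15.
        Integer[] keys = {0x10000, 0x20000, 0x30000};
        for (Integer key : keys) {
            int raw = indexFor(key.hashCode(), n); // 0 for every key: all collide
            int mixed = indexFor(spread(key), n);  // 1, 2, 3: spread across buckets
            System.out.println(key + ": raw index " + raw + ", spread index " + mixed);
        }
    }
}
```

Without the shift, all three keys would land in bucket 0 of a 16-slot table, because the mask only looks at the lowest 4 bits.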

The putVal source is below, with some of my own comments added:

 
/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Check whether the array Node<K,V>[] table has been initialized.
    // This array is HashMap's underlying storage. Node<K,V> is a static nested class of
    // HashMap: a linked-list-style node that holds one key-value pair.
    if ((tab = table) == null || (n = tab.length) == 0) 
    // If the table has not been initialized yet, resize() creates it.
        n = (tab = resize()).length;
    // Compute the bucket index from the hash as (n - 1) & hash; if that bucket is empty,
    // store the entry there directly with newNode().
    if ((p = tab[i = (n - 1) & hash]) == null) 
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Same hash and an equal key: the head node already maps this key, so remember it
        // in e; its value is replaced further down.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Otherwise a different key occupies this bucket (a collision). Check whether the bin
        // is a tree or a linked list: Java 8 chains colliding entries in one of these two forms.
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
        // Walk the linked list
            for (int binCount = 0; ; ++binCount) {
            // Reached the tail (next is null): append the new entry as the next node
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
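                    // binCount >= TREEIFY_THRESHOLD - 1 (that is, 7) means the bin held at least
                    // 8 nodes before this insertion; treeifyBin converts it to a red-black tree
                    // (or resizes while the table is shorter than MIN_TREEIFY_CAPACITY).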
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // The next node already holds an equal key: stop here; its value is replaced below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null: the key was already present; update the value if allowed and return the old value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the number of entries now exceeds the threshold (capacity * load factor), grow the table
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
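The return value and the onlyIfAbsent flag of putVal can be observed through the public API. The sketch below relies on two facts about java.util.HashMap in Java 8: put passes onlyIfAbsent = false, and putIfAbsent passes onlyIfAbsent = true. The class PutReturnDemo and its observe method are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shows what put and putIfAbsent return in the "new key" and "existing key" cases.
class PutReturnDemo {
    static List<Integer> observe() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> results = new ArrayList<>();
        results.add(map.put("a", 1));         // null: no previous mapping
        results.add(map.put("a", 2));         // 1: old value returned, value replaced
        results.add(map.putIfAbsent("a", 3)); // 2: mapping exists, value left unchanged
        results.add(map.get("a"));            // 2
        return results;
    }

    public static void main(String[] args) {
        System.out.println(observe()); // [null, 1, 2, 2]
    }
}
```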

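As you can see, the logic of put() is fairly easy to follow; the hard part is the algorithms it relies on, such as the hash computation, the index computation, the handling of linked lists and red-black trees, and table resizing. These deserve further study.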
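As a first step toward the index computation, note that for a table length n that is a power of two, (n - 1) & hash gives the same result as a non-negative modulo. The sketch below checks this on a few sample values; the class and method names are hypothetical.

```java
// For power-of-two n, masking with (n - 1) equals the non-negative remainder modulo n.
class MaskVersusModulo {
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    static int modIndex(int hash, int n) {
        return Math.floorMod(hash, n); // floorMod never returns a negative index
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0, 5, 21, -1, Integer.MIN_VALUE, 123456789};
        for (int h : hashes) {
            System.out.println(h + " -> mask " + maskIndex(h, n) + ", mod " + modIndex(h, n));
        }
    }
}
```

The mask is a single bitwise AND and needs no special handling for negative hash values, which is one reason the table length is kept at a power of two.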

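II: As for problem 4, the HashMap source contains no thread-safety handling; in other words, HashMap is not thread-safe. Avoid sharing a HashMap between threads without external synchronization. If a map must be shared across threads, use ConcurrentHashMap instead.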
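A minimal sketch of the ConcurrentHashMap recommendation follows: several threads increment one counter through merge, which ConcurrentHashMap performs atomically. The class name, the method countWithThreads, and the key "hits" are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Several threads update one shared map entry; ConcurrentHashMap.merge is atomic.
class ConcurrentCountDemo {
    static int countWithThreads(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for every worker before reading the result
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // 40000
    }
}
```

With a plain HashMap in place of ConcurrentHashMap, the same code has no such guarantee: updates can be lost and the internal structure can be corrupted.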
