Java、Android工程师必备知识（三）：hashMap的实现原理

来源：互联网发布：办公软件合集编辑：程序博客网时间：2024/06/07 17:51

前言

通过前面的两篇我们已经充分了解了什么是哈西算法，以及哈西表对于Java的重要性，今天就实际了解一下HashMap的实现原理，作为最后的实战总结。

顺便强调一句，我这个是android7.0，JDK为1.8，不同版本源码可能不同。

正文

HashMap是我们最常用的类之一，通过键值对保存一些重要的信息，而且提取速度快，针对性强。例如下面的代码:

HashMap<StringD, Student> map = new HashMap<>();// 放入map.put("bj", student);map.put("hb", student1);map.put("sy", student2);// 提取map.get("bj");

我们先从这几个api入手，因为主要是想要看到hash算法的使用，所以可能会删减掉一些内容：

首先是构造方法：

/**     * Constructs an empty <tt>HashMap</tt> with the specified initial     * capacity and load factor.     *     * @param  initialCapacity the initial capacity     * @param  loadFactor      the load factor     * @throws IllegalArgumentException if the initial capacity is negative     *         or the load factor is nonpositive     */    public HashMap(int initialCapacity, float loadFactor) {        if (initialCapacity < 0)            throw new IllegalArgumentException("Illegal initial capacity: " +                                               initialCapacity);        if (initialCapacity > MAXIMUM_CAPACITY) {            initialCapacity = MAXIMUM_CAPACITY;        } else if (initialCapacity < DEFAULT_INITIAL_CAPACITY) {            initialCapacity = DEFAULT_INITIAL_CAPACITY;        }        if (loadFactor <= 0 || Float.isNaN(loadFactor))            throw new IllegalArgumentException("Illegal load factor: " +                                               loadFactor);        // Android-Note: We always use the default load factor of 0.75f.        // This might appear wrong but it's just awkward design. We always call        // inflateTable() when table == EMPTY_TABLE. That method will take "threshold"        // to mean "capacity" and then replace it with the real threshold (i.e, multiplied with        // the load factor).        threshold = initialCapacity;        // 这是一个空方法，如果你要继承hashmap，可以重写这个方法        init();    }

很简单，就是初始化了一些重要的值，initialCapacity 是容量，loadFactor是内部的HashMapEntry数组的长度与最大容量比例，他的长度是initialCapacity * loadFactor。

/**     * The table, resized as necessary. Length MUST Always be a power of two.     */    transient HashMapEntry<K,V>[] table = (HashMapEntry<K,V>[]) EMPTY_TABLE;/**     * Inflates the table.     */    private void inflateTable(int toSize) {        // Find a power of 2 >= toSize        int capacity = roundUpToPowerOf2(toSize);        // Android-changed: Replace usage of Math.min() here because this method is        // called from the <clinit> of runtime, at which point the native libraries        // needed by Float.* might not be loaded.        float thresholdFloat = capacity * loadFactor;        if (thresholdFloat > MAXIMUM_CAPACITY + 1) {            thresholdFloat = MAXIMUM_CAPACITY + 1;        }        threshold = (int) thresholdFloat;        // 这里初始化了HashMapEntry        table = new HashMapEntry[capacity];    }

这个HashMapEntry到底是个什么东西，能放入我们键值对呢？来看一下类的定义：

static class HashMapEntry<K,V> implements Map.Entry<K,V> {        final K key;        V value;        HashMapEntry<K,V> next;        int hash;        /**         * Creates new entry.         */        HashMapEntry(int h, K k, V v, HashMapEntry<K,V> n) {            value = v;            next = n;            key = k;            hash = h;        }        ...}

不重要的方法我都删掉了，从私有属性看有key，有value，有hash，还有一个next，哎呀呀，这不就是一个链表吗，这个链表到底有什么用呢？别着急，我们先看点别的。

ok，这样HashMap的基础我们就已经了解了，下面看put（key，value）方法：

// 注释已经被忽略，因为太长了public V put(K key, V value) {    // 先判断table是否为空，然后去创建，也就是我们刚才看过的初始化HashMapEntry[]的方法        if (table == EMPTY_TABLE) {            inflateTable(threshold);        }        // 如果key为空，会以为null为key设置value，        if (key == null)            return putForNullKey(value);            // 为这个key计算一个hash值        int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);        // 通过计算，算出一个坐标位置，这个计算就忽略了        int i = indexFor(hash, table.length);        // 找到指定位置的HashMapEntry，循环判断里面是否已经包含了相同的key        for (HashMapEntry<K,V> e = table[i]; e != null; e = e.next) {            Object k;            // 如果hash相等，且包含相同的key，会覆盖对应的value            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {                V oldValue = e.value;                e.value = value;                e.recordAccess(this);                return oldValue;            }        }    // 如果没有相同的key，数量+1        modCount++;        // 添加新的key和value        addEntry(hash, key, value, i);        return null;    }

接下来我们看一下addEntry的过程：

// 添加到entryvoid addEntry(int hash, K key, V value, int bucketIndex) {    // 先租容量的判断，如果超出了size，        if ((size >= threshold) && (null != table[bucketIndex])) {         // 把hashMapEntry的长度*2            resize(2 * table.length);            // 重新计算hash值            hash = (null != key) ? sun.misc.Hashing.singleWordWangJenkinsHash(key) : 0;            // 重新计算索引值            bucketIndex = indexFor(hash, table.length);        }    // 创建新的Entry        createEntry(hash, key, value, bucketIndex);    }// 这是重新设置entry长度的方法void resize(int newCapacity) {        HashMapEntry[] oldTable = table;        int oldCapacity = oldTable.length;        // 如果之前已经是最大值了，现在设置为int的最大值，之后就可以无限扩容        if (oldCapacity == MAXIMUM_CAPACITY) {            threshold = Integer.MAX_VALUE;            return;        }    // 下面就没什么看点了，就是创建新的长度HashMapEntry，然后把老的HashMapEntry复制进去        HashMapEntry[] newTable = new HashMapEntry[newCapacity];        transfer(newTable);        table = newTable;        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);    }// 创建一个HashMapEntryvoid createEntry(int hash, K key, V value, int bucketIndex) {    // 先取出目前table中指定位置的HashMapEntry        HashMapEntry<K,V> e = table[bucketIndex];        // 新创建HashMapEntry，保存我们的key，value，hash，并且指向刚才取出的HashMapEntry，这样就实现了链表        // 覆盖原有位置的HashMapEntry，成为第一个        table[bucketIndex] = new HashMapEntry<>(hash, key, value, e);        // hashmap的size+1        size++;    }

这里就有话要说了，之前我们就有疑问，如果我们不一小心把key和value的hashCode都变成了相同的怎么办？这里就得到了答案：

HashMapEntry指向之前的HashMapEntry，这样就形成了链表，虽然table中只保存了一个HashMapEntry，但是通过next方法，就可以找到所有的hashCode相同的HashMapEntry。

从效率上看，如果hashCode相同的HashMapEntry有太多，这样每一次put，都会增加运算成本（查找，循环，替换，指向等等操作），这就是hashCode不合理导致效率下降的重要环节。

这样，put（）的流程就结束了，我们看到hash值在中间起到了关键的作用：

通过hash值，去指定在table数组中的索引，方便快捷，我们可以预测，之后的get（）方法，一定会用hash值，直接找到对应索引的hashMapEntry。

接下来是get（key）方法：

// get方法public V get(Object key) {    // 先判断key是否为null，直接去除null对应的value        if (key == null)            return getForNullKey();        // 通过key找到对应hashMapEntry        Entry<K,V> entry = getEntry(key);    // 如果hashMapentry为null，说明没有这个键值对    // 如果hashMapentry不为null，返回的对应的value        return null == entry ? null : entry.getValue();    }// 通过key找到指定的hashMapentryfinal Entry<K,V> getEntry(Object key) {    // hashmap的size是0,value肯定是空的        if (size == 0) {            return null;        }    // 计算key的hash值        int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);        // 通过运算hash值和table的长度，就可以直接得到HashMapEntry的索引，跟我们预测的一样        // 这里取出的时候，又进行了for循环，同样会降低效率        for (HashMapEntry<K,V> e = table[indexFor(hash, table.length)];             e != null;             e = e.next) {            Object k;            // 找到相同的key，返回HashMapEntry            if (e.hash == hash &&                ((k = e.key) == key || (key != null && key.equals(k))))                return e;        }    // 没找到，返回null        return null;    }

get（key）比put（）方法要简单的多了，就好像买面包很简单，但是做面包的过程是很难的。

其他的api就不做介绍了，因为大体的原理都是一样的，我们把刚才的东西整理一下：

1、hash值，在这里最重要的作用，就是可以直接得到hashMapEntry的索引，直接提高了查找的效率。

2、如果key和value的hashCode全都相同，他们会保存到同一位置的hashMapEntry数组中，但是他们之间有链表关系，所以不会出现覆盖或丢失的情况。

3、链表的查询依赖于循环判断，如果数量过多，就会大大降低hashMap的效率，例如我们刚才的put和get方法。

总结

到这里hashMap的原理分析就结束了，作为Java/Android工程师，我们又掌握了一个提高app运行效率的重要技能，这也是为什么看源码是不断成长的重要途径的原因之一。

我们也可以自己去定义hash算法，以后需要定义自己的哈希表就不会手忙脚乱了。顺便说一下HashSet，他内部也是hashMap实现有兴趣的朋友可以自己去看看。

ok，今天就到这里了，拜拜~

阅读全文

0 0