HashMap Source Code Analysis in JDK 1.6, and the Main Changes in 1.7 and 1.8

Source: Internet | Published by: 软件测试就业培训 (software testing job training) | Editor: 程序博客网 | Time: 2024/05/22 14:23

HashMap source code analysis

Based on JDK 1.6.0_45

Map

A Map can return its keys as a Set (keySet()), its values as a Collection (values()), or its key-value pairs as a Set (entrySet()).
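Here is a quick illustration of the three views. It uses only the standard java.util API. The class name MapViewsDemo is just for the example. It also shows that the views are backed by the map rather than being copies.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class MapViewsDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Set<String> keys = map.keySet();                          // the keys, as a Set
        Collection<Integer> values = map.values();                // the values, as a Collection (duplicates allowed)
        Set<Map.Entry<String, Integer>> entries = map.entrySet(); // the key-value pairs, as a Set

        for (Map.Entry<String, Integer> e : entries) {
            System.out.println(e.getKey() + " -> " + e.getValue()); // iteration order is unspecified
        }

        // The views are backed by the map: removing a key through keySet() removes the mapping
        keys.remove("a");
        System.out.println(map.containsKey("a")); // false
        System.out.println(values.size());        // 1
    }
}
```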

The equals method


To use your own class as a key in a hash table, you must override both hashCode() and equals(). equals() must satisfy the following five requirements:

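Reflexivity: for any non-null reference x, x.equals(x) returns true.

Symmetry: for any non-null references x and y, x.equals(y) returns true if and only if y.equals(x) returns true.

Transitivity: for any non-null references x, y and z, if x.equals(y) and y.equals(z) both return true, then x.equals(z) returns true.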

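Consistency: for any non-null references x and y, repeated calls to x.equals(y) consistently return true or consistently return false, provided that none of the information used in the equals comparison is modified.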

Non-nullity: for any non-null reference x, x.equals(null) must return false.
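Symmetry is the property that is easiest to break by accident. The following sketch uses a hypothetical CaseInsensitiveString class, which is an illustration and not from the article. Its equals also accepts plain Strings. String.equals knows nothing about the new class, so the comparison only works in one direction.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical example class: equals compares case-insensitively and also tries to
// interoperate with plain Strings, which breaks symmetry.
final class CaseInsensitiveString {
    private final String s;

    CaseInsensitiveString(String s) {
        this.s = Objects.requireNonNull(s);
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof CaseInsensitiveString) {
            return s.equalsIgnoreCase(((CaseInsensitiveString) o).s);
        }
        if (o instanceof String) {   // one-way "interoperability": String.equals does not reciprocate
            return s.equalsIgnoreCase((String) o);
        }
        return false;                // also covers o == null (non-nullity)
    }

    @Override
    public int hashCode() {
        return s.toLowerCase(Locale.ROOT).hashCode(); // consistent with equalsIgnoreCase for ASCII input
    }
}

final class EqualsContractDemo {
    public static void main(String[] args) {
        CaseInsensitiveString cis = new CaseInsensitiveString("Polish");
        String str = "polish";
        System.out.println(cis.equals(cis));  // true  (reflexive)
        System.out.println(cis.equals(str));  // true
        System.out.println(str.equals(cis));  // false (symmetry violated)
        System.out.println(cis.equals(null)); // false (non-nullity holds)
    }
}
```

Dropping the `instanceof String` branch restores symmetry. The hashCode shown only matches equalsIgnoreCase for ASCII input, which is enough for this example.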

Here is the equals() method of the Object class. For any non-null references x and y, it returns true if and only if x and y refer to the same object:

public boolean equals(Object obj) {return (this == obj);}

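Whenever this method is overridden, hashCode generally has to be overridden as well. This maintains the general contract for hashCode, which states that equal objects must have equal hash codes.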

@Override
public boolean equals(Object o) {
    if (o == this)
        return true;
    if (!(o instanceof Complex))
        return false;
    Complex c = (Complex) o;
    // ....
}
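The equals fragment above leaves the field comparison elided. Assume, purely for illustration, that Complex is an immutable value class with two double fields re and im; these field names are hypothetical. A complete version with a matching hashCode might then look like this:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable complex number; the fields re/im are assumptions for illustration.
final class Complex {
    private final double re;
    private final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Complex))
            return false;
        Complex c = (Complex) o;
        // Double.compare treats NaN as equal to itself and distinguishes 0.0 from -0.0,
        // which keeps equals consistent with Double.hashCode below.
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        // Combine exactly the fields that equals compares
        int result = 17;
        result = 31 * result + Double.hashCode(re);
        result = 31 * result + Double.hashCode(im);
        return result;
    }
}

final class ComplexKeyDemo {
    public static void main(String[] args) {
        Map<Complex, String> names = new HashMap<>();
        names.put(new Complex(0, 1), "i");
        // A different but equal instance finds the same entry
        System.out.println(names.get(new Complex(0, 1))); // i
    }
}
```

Without the hashCode override, a lookup with a second, equal instance would normally miss, because Object.hashCode is identity-based.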

The hashCode method

public int hashCode()

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by java.util.Hashtable.

The general contract of hashCode is:

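During one execution of a Java application, repeated calls to hashCode on the same object must consistently return the same integer. This holds provided that none of the information used in equals comparisons on the object is modified. The integer need not stay the same from one execution of an application to another execution of the same application.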

If two objects are equal according to equals(Object), then calling hashCode on each of them must produce the same integer result.

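If two objects are unequal according to equals(java.lang.Object), calling hashCode on each of them is not required to produce distinct integer results. However, programmers should be aware that producing distinct integer results for unequal objects can improve the performance of hash tables.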

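As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. This is typically implemented by converting the internal address of the object into an integer, but the Java programming language does not require this implementation technique.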
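The third rule says unequal objects may share a hash code. This is easy to observe with String, whose hashCode is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]. The class name below is only for the example.

```java
final class HashCollisionDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // Unequal objects...
        System.out.println(a.equals(b));  // false
        // ...may still share a hash code
        System.out.println(a.hashCode()); // 2112  ('A' * 31 + 'a' = 65 * 31 + 97)
        System.out.println(b.hashCode()); // 2112  ('B' * 31 + 'B' = 66 * 31 + 66)

        // Equal objects must share a hash code, even when they are distinct instances
        String c = new String("Aa");
        System.out.println(a == c);                                      // false (different objects)
        System.out.println(a.equals(c) && a.hashCode() == c.hashCode()); // true
    }
}
```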

Fields

HashMap allows a null key (and null values).

transient int size;

This field holds the number of key-value mappings contained in the HashMap.

transient Entry[] table;
int threshold;
final float loadFactor;

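capacity is the length of the table array. threshold is the maximum number of key-value pairs the HashMap holds before it grows. It equals the HashMap's capacity multiplied by the load factor. When size++ >= threshold (the check in addEntry), the HashMap automatically calls resize to enlarge its capacity, and each resize doubles the capacity. A HashMap combines an array with linked lists: creating a HashMap initializes the array Entry[] table.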

transient volatile int modCount;

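This is the number of times this HashMap has been structurally modified. Structural modifications are those that change the number of key-value mappings or rehash the table (a resize doubles the capacity). The field is used to make iterators fail-fast, so that they throw ConcurrentModificationException.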
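The fail-fast behavior that modCount supports can be observed with the standard API. In this sketch (class name hypothetical), adding a key while iterating makes the iterator throw ConcurrentModificationException on its next step. Removing through Iterator.remove() is safe. The HashMap documentation describes fail-fast behavior as best-effort, so program correctness should not depend on it.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds a key behind the iterator's back; returns true if the iterator detected it.
    static boolean modifyWhileIterating() {
        Map<String, Integer> map = sample();
        Iterator<String> it = map.keySet().iterator();
        try {
            while (it.hasNext()) {
                it.next();
                map.put("d", 4); // structural modification: modCount no longer matches the iterator's copy
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removing through the iterator keeps its expected modCount in sync, so nothing is thrown.
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue() % 2 == 1)
                it.remove(); // removes "a" and "c"
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // true
        System.out.println(removeViaIterator());    // 1
    }
}
```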

The static constants are:

static final int DEFAULT_INITIAL_CAPACITY = 16;
static final int MAXIMUM_CAPACITY = 1 << 30;
static final float DEFAULT_LOAD_FACTOR = 0.75f;

The static nested class Entry is defined as follows:

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    final int hash;
    Entry<K,V> next;
    //…
}

Constructors

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    table = new Entry[DEFAULT_INITIAL_CAPACITY];
    init();
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    // (the smallest power of two that is >= initialCapacity)
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}

// Empty method: an initialization hook for subclasses
void init() {}

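capacity is the smallest power of two that is greater than or equal to initialCapacity (after initialCapacity has been capped at MAXIMUM_CAPACITY).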
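The constructor's loop can be checked against a branch-free alternative. The second method below uses the bit-smearing technique that JDK 8's HashMap.tableSizeFor applies to the same problem. Both method names here are hypothetical helpers for the comparison. MAXIMUM_CAPACITY is copied from the constants above.

```java
final class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // The JDK 1.6 constructor's approach: keep doubling until capacity >= initialCapacity.
    static int roundUpByLoop(int initialCapacity) {
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY; // also prevents the loop from overflowing
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    // Bit-smearing alternative (the technique of JDK 8's tableSizeFor): copy the highest
    // set bit of (cap - 1) into every lower bit position, then add 1.
    static int roundUpByBits(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 15, 16, 17, 100, 1000};
        for (int c : samples) {
            System.out.println(c + " -> " + roundUpByLoop(c) + " / " + roundUpByBits(c));
        }
    }
}
```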

The put method

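A newly added entry is placed at the head of its chain, so the entry added first ends up at the tail. To get an element from a HashMap, we first compute the key's hashcode to locate the corresponding Entry slot in the array. We then use the key's equals method to find the element in the linked list at that position. It follows that get is fastest when the list at each position holds only one element.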

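The natural first idea is to take the hashcode modulo the array length, which spreads the elements fairly evenly. However, the modulo operation is relatively expensive, and a bitwise AND is faster: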

static int indexFor(int h, int length) {
    return h & (length-1);
}

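When the HashMap's array size (capacity) is a power of two, the modulo operation can be done with a bitwise AND. The expression h & (length - 1) keeps the low bits of h, which is exactly the non-negative remainder of h divided by length.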
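The equivalence is easy to check. In this sketch (class name hypothetical), indexFor is copied from the source above and compared with Math.floorMod (Java 8+), which returns the non-negative remainder. The plain % operator would give a different result for negative hashes.

```java
final class IndexForDemo {
    // Copied from the JDK 1.6 source above
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // a power of two
        int[] hashes = {0, 5, 16, 17, 12345, -1, -17, Integer.MIN_VALUE};
        for (int h : hashes) {
            // Masking keeps the low 4 bits, which is the non-negative remainder mod 16
            System.out.println(h + ": " + indexFor(h, length) + " == " + Math.floorMod(h, length)
                    + " (h % 16 = " + (h % length) + ")");
        }
        // With a length that is not a power of two, the mask no longer computes a remainder
        System.out.println(indexFor(13, 12) + " vs " + Math.floorMod(13, 12)); // 9 vs 1
    }
}
```

With length 12 (0b1011), bit 2 of every index is forced to zero, so buckets 4 to 7 could never be used. This is one reason the capacity is kept at a power of two.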

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

// A null key is always stored in table[0]
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

// Empty method (in Entry): called whenever put overwrites the value of an existing entry
void recordAccess(HashMap<K,V> m) {}

// If the loop in put finds no matching key, a new entry is added to table[i]
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}
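The following sketch puts head insertion and null-key handling in one place. It is a stripped-down separate-chaining map modeled on the put and get code above, built under simplifying assumptions. The class MiniHashMap and its helper chainOf are hypothetical. The capacity is fixed at 16, and there is no resize, modCount or recordAccess. Unlike JDK 1.6, the sketch folds the null key into the general path by giving it hash 0, which still places it in table[0]. The keys "Aa" and "BB" are used because they have the same hashCode and therefore share a bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified chained map mirroring the JDK 1.6 put/get logic:
// fixed capacity of 16, no resize, no modCount, no recordAccess.
final class MiniHashMap<K, V> {
    static final class Entry<K, V> {
        final K key;
        V value;
        final int hash;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private int size;

    // Supplemental hash function, copied from the JDK 1.6 source
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // The null key gets hash 0, so it always lands in table[0]
    private static int hashOf(Object key) {
        return (key == null) ? 0 : hash(key.hashCode());
    }

    private static boolean sameKey(Entry<?, ?> e, int hash, Object key) {
        Object k;
        return e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)));
    }

    V put(K key, V value) {
        int hash = hashOf(key);
        int i = indexFor(hash, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (sameKey(e, hash, key)) { // existing key: overwrite and return the old value
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }
        // New key: link it in at the head of the chain, as addEntry does
        table[i] = new Entry<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    V get(Object key) {
        int hash = hashOf(key);
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (sameKey(e, hash, key))
                return e.value;
        }
        return null;
    }

    int size() {
        return size;
    }

    // Inspection helper for this sketch: keys in the bucket that `key` maps to, head first
    List<K> chainOf(Object key) {
        List<K> keys = new ArrayList<>();
        for (Entry<K, V> e = table[indexFor(hashOf(key), table.length)]; e != null; e = e.next)
            keys.add(e.key);
        return keys;
    }

    public static void main(String[] args) {
        MiniHashMap<String, Integer> m = new MiniHashMap<>();
        m.put("Aa", 1);
        m.put("BB", 2);                      // same hashCode as "Aa" (2112), so same bucket
        m.put(null, 0);                      // stored in table[0]
        System.out.println(m.chainOf("Aa")); // [BB, Aa]  (newest entry first)
        System.out.println(m.put("Aa", 10)); // 1  (old value returned)
        System.out.println(m.get(null));     // 0
    }
}
```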

Resizing (resize)

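When does a HashMap resize? Once the number of elements exceeds the current threshold, i.e. capacity * loadFactor, the array is expanded. If the current capacity is already MAXIMUM_CAPACITY, resize leaves the table size unchanged and only sets threshold to Integer.MAX_VALUE. Otherwise each resize doubles the capacity, and the threshold doubles with it. The default loadFactor is 0.75 and the default array size is 16. So once the map holds more than 16 * 0.75 = 12 elements, the array is expanded to 2 * 16 = 32, and every element's position in the array is recomputed. This is a very expensive operation. If we know the number of elements in advance, presizing the HashMap can noticeably improve its performance.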
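The numbers in this paragraph can be reproduced with a small simulation of the bookkeeping in addEntry and resize. It is only a sketch: no entries are stored, and only size, capacity and threshold are tracked. It assumes the initial capacity is already a power of two. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of JDK 1.6's capacity/threshold bookkeeping only:
// no entries are stored, and initialCapacity is assumed to be a power of two.
final class ResizeSimulation {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the capacities the table grows to while n distinct keys are inserted one by one.
    static List<Integer> growthSteps(int n, int initialCapacity, float loadFactor) {
        List<Integer> steps = new ArrayList<>();
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int k = 0; k < n; k++) {
            // addEntry links the new entry in, then tests `size++ >= threshold` (old size)
            if (size++ >= threshold) {
                if (capacity == MAXIMUM_CAPACITY) {
                    threshold = Integer.MAX_VALUE;     // resize stops growing the table
                } else {
                    capacity *= 2;                     // resize(2 * table.length)
                    threshold = (int) (capacity * loadFactor);
                    steps.add(capacity);
                }
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(growthSteps(12, 16, 0.75f));     // []   (threshold 12 is not exceeded)
        System.out.println(growthSteps(13, 16, 0.75f));     // [32]
        System.out.println(growthSteps(1000, 16, 0.75f));   // [32, 64, 128, 256, 512, 1024, 2048]
        System.out.println(growthSteps(1000, 2048, 0.75f)); // []   (presized: threshold 1536)
    }
}
```

The last call shows the effect of presizing. With an initial capacity of 2048, the threshold is 1536, so the 1000 insertions never trigger a rehash.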

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

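transfer walks every bucket of the old table and every entry within each bucket. Entries from one old bucket do not necessarily end up in the same bucket of the new table. As in addEntry, each entry moved later is placed at the head of its new bucket, so entries that land in the same new bucket end up in reverse order.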

The traversal copies no objects, only references.
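The capacity doubles and the index is h & (length - 1). So an entry from old bucket j can only move to bucket j or j + oldCapacity, depending on one extra bit of its hash. The sketch below applies the head-insertion loop of transfer to one old bucket. Its names are hypothetical, and plain hashes stand in for whole entries. It shows both the split and the order reversal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of JDK 1.6 transfer() for a single old bucket;
// each Integer stands for an entry's hash.
final class TransferDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Moves the entries of one old chain (head first) into a fresh table of newCapacity
    // buckets, inserting each at the head of its new chain, as transfer() does.
    static List<List<Integer>> transferBucket(List<Integer> oldChain, int newCapacity) {
        List<List<Integer>> newTable = new ArrayList<>();
        for (int i = 0; i < newCapacity; i++)
            newTable.add(new ArrayList<>());
        for (int h : oldChain)
            newTable.get(indexFor(h, newCapacity)).add(0, h); // head insertion
        return newTable;
    }

    public static void main(String[] args) {
        // Hashes 1, 17, 33 and 49 all sit in bucket 1 of a 16-slot table (h & 15 == 1)
        List<Integer> oldChain = Arrays.asList(1, 17, 33, 49);
        List<List<Integer>> newTable = transferBucket(oldChain, 32);
        System.out.println(newTable.get(1));  // [33, 1]   bit 4 of the hash is 0: same index
        System.out.println(newTable.get(17)); // [49, 17]  bit 4 of the hash is 1: index + 16
    }
}
```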

The get method

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

private V getForNullKey() {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

remove

public V remove(Object key) {
    Entry<K,V> e = removeEntryForKey(key);
    return (e == null ? null : e.value);
}

final Entry<K,V> removeEntryForKey(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    int i = indexFor(hash, table.length);
    Entry<K,V> prev = table[i];
    Entry<K,V> e = prev;

    while (e != null) {
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++;
            size--;
            if (prev == e)
                table[i] = next;
            else
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }

    return e;
}
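The prev/e walk in removeEntryForKey is the classic way to unlink a node from a singly linked list. Here it is in isolation, on a hypothetical Node type that holds only a String key. Instead of writing to table[i], it returns the new head of the chain.

```java
// Hypothetical minimal chain; mirrors the prev/e walk of removeEntryForKey.
final class ChainRemoval {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Removes the first node whose key equals `key`; returns the (possibly new) head.
    static Node remove(Node head, String key) {
        Node prev = head;
        Node e = prev;
        while (e != null) {
            Node next = e.next;
            if (key == null ? e.key == null : key.equals(e.key)) {
                if (prev == e)      // removing the head: the bucket now starts at next
                    return next;
                prev.next = next;   // otherwise bypass e
                return head;
            }
            prev = e;
            e = next;
        }
        return head;                // key not found: chain unchanged
    }

    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next)
            sb.append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node("c", new Node("b", new Node("a", null)));
        head = remove(head, "b");
        System.out.println(keys(head)); // ca
        head = remove(head, "c");
        System.out.println(keys(head)); // a
    }
}
```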

clear

public void clear() {
    modCount++;
    Entry[] tab = table;
    for (int i = 0; i < tab.length; i++)
        tab[i] = null;
    size = 0;
}

HashMap in JDK 7

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

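The threshold computation is completely different from JDK 1.6. It does not involve the load factor and simply uses the initial capacity as the threshold. This only holds before the first resize, however, because the resize function (which is called to grow the capacity) contains the following code: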

threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);

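In other words, once the table has been resized, threshold is again tied to the load factor and is computed almost the same way as in JDK 1.6. The case where the capacity reaches its maximum of 1,073,741,824 is not discussed here. One caveat: in JDK 7u40 and later, where the constructor looks like the one above, the table itself is allocated lazily. The first put calls inflateTable, which rounds threshold up to a power of two to get the capacity. It then resets threshold to (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1). So threshold = initialCapacity mainly serves to remember the requested capacity until the table exists.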

The addEntry function also differs from JDK 1.6. Its source is:

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

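In JDK 1.6, the decision to resize depends only on whether the current size is greater than or equal to the threshold. The JDK 1.7 code above adds a second requirement: the bucket the new entry will go into must not be empty. JDK 1.7 also performs the check before creating the new entry, so that entry goes straight into the resized table.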
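The practical effect of the extra condition can be modeled by feeding a sequence of target bucket indices to each rule. This is a hypothetical model, not JDK code. The table has 16 buckets and a threshold of 12, and every insertion is a new key. The methods only report the first insertion that would trigger a resize.

```java
import java.util.Arrays;

// Hypothetical model comparing the JDK 1.6 and 1.7 resize triggers (not JDK code).
final class ResizeConditionDemo {
    static final int CAPACITY = 16;
    static final int THRESHOLD = 12; // 16 * 0.75

    // 1-based number of the insertion that first triggers a resize under the 1.6 rule, or -1.
    // The 1.6 rule ignores which bucket the new key lands in.
    static int firstResize16(int[] buckets) {
        int size = 0;
        for (int k = 0; k < buckets.length; k++) {
            if (size++ >= THRESHOLD) // entry already linked in, then: if (size++ >= threshold)
                return k + 1;
        }
        return -1;
    }

    // Same, under the 1.7 rule, checked before inserting:
    // (size >= threshold) && (null != table[bucketIndex])
    static int firstResize17(int[] buckets) {
        boolean[] occupied = new boolean[CAPACITY];
        int size = 0;
        for (int k = 0; k < buckets.length; k++) {
            int b = buckets[k];
            if (size >= THRESHOLD && occupied[b])
                return k + 1;
            occupied[b] = true;
            size++;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] distinct = new int[16];
        for (int i = 0; i < 16; i++)
            distinct[i] = i;                         // every key lands in an empty bucket
        System.out.println(firstResize16(distinct)); // 13
        System.out.println(firstResize17(distinct)); // -1

        int[] oneMore = Arrays.copyOf(distinct, 17);
        oneMore[16] = 0;                             // the 17th key collides in bucket 0
        System.out.println(firstResize17(oneMore));  // 17
    }
}
```

Under the 1.7 rule, the modeled map holds 16 entries with a threshold of 12 and still does not resize, because every new key lands in an empty bucket. The resize only happens once a key collides.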

HashMap in JDK 8

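JDK 1.8 adds red-black trees to the HashMap implementation: a bucket whose chain grows too long is converted into a red-black tree. As a result, the underlying implementation works differently.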

JDK 1.6: the capacity is expanded once the number of entries exceeds capacity * load factor.

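JDK 1.7: the first expansion happens when the number of entries exceeds the capacity. The second and later expansions happen when it exceeds capacity * load factor.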

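JDK 1.8: the first expansion does not depend on the load factor, while the second and later ones do. The detailed calculation needs a closer look of its own.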

Note: none of the above considers the case where the maximum capacity has been reached.
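The summary above leaves the JDK 1.8 calculation open. As a sketch, here is how we read the capacity and threshold bookkeeping in JDK 8's HashMap.resize(). The class name Jdk8SizingSketch is hypothetical. Only the sizing arithmetic is modeled: argument validation, table allocation and node transfer are omitted, so it should be checked against the real JDK 8 source before relying on it. In this reading, the first allocation takes its capacity from the threshold field, which the constructor set from initialCapacity. Every threshold after that is derived from loadFactor.

```java
// Hypothetical sketch of the sizing logic in JDK 8's HashMap.resize(); no table is
// allocated and no nodes are moved, only capacity and threshold are tracked.
final class Jdk8SizingSketch {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    final float loadFactor;
    int threshold; // before the table exists, holds the requested capacity (or 0 for defaults)
    int capacity;  // 0 means the table has not been allocated yet

    Jdk8SizingSketch() {                                       // like new HashMap()
        this.loadFactor = DEFAULT_LOAD_FACTOR;
    }

    Jdk8SizingSketch(int initialCapacity, float loadFactor) {  // like new HashMap(cap, lf)
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    // Smallest power of two >= cap (bit-smearing, as in JDK 8)
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    void resize() {
        int oldCap = capacity;
        int oldThr = threshold;
        int newCap;
        int newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;  // cannot grow any further
                return;
            }
            newCap = oldCap << 1;
            if (newCap < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1;           // simply double the threshold
        } else if (oldThr > 0) {
            newCap = oldThr;                    // first allocation: capacity comes from threshold
        } else {
            newCap = DEFAULT_INITIAL_CAPACITY;  // no-arg constructor: defaults
            newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {                      // otherwise derive the threshold from loadFactor
            float ft = (float) newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY)
                    ? (int) ft : Integer.MAX_VALUE;
        }
        capacity = newCap;
        threshold = newThr;
    }

    public static void main(String[] args) {
        Jdk8SizingSketch m = new Jdk8SizingSketch(10, 0.75f);
        System.out.println(m.threshold);                    // 16  (tableSizeFor(10), a placeholder)
        m.resize();                                         // the first put allocates the table
        System.out.println(m.capacity + " " + m.threshold); // 16 12
        m.resize();
        System.out.println(m.capacity + " " + m.threshold); // 32 24
    }
}
```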
