A Brief Look at HashMap

Class structure
public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

Class description (from the Javadoc):
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key.
(The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.)
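
To make the null-handling difference concrete, here is a minimal sketch (the class and variable names are my own, not from the Javadoc):

import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "value for the null key");  // allowed: one null key
        hashMap.put("key", null);                     // allowed: null values
        System.out.println(hashMap.get(null));        // prints "value for the null key"

        Map<String, String> hashtable = new Hashtable<>();
        hashtable.put(null, "x");                     // throws NullPointerException
    }
}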

This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings).

Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor.

The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs.

Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put).

The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations.

If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
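
As a concrete sketch of this rule (the numbers are arbitrary and the statements are assumed to run inside a main method): to hold 1000 entries with the default load factor of 0.75 and never trigger a rehash, the initial capacity must be at least 1000 / 0.75 = 1334 (rounded up). HashMap itself rounds the capacity up to the next power of two internally.

// Sizing sketch: choose an initial capacity so that no rehash ever occurs.
int expectedEntries = 1000;            // assumed workload
float loadFactor = 0.75f;              // the default
int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor); // 1334
Map<String, Integer> map = new HashMap<>(initialCapacity, loadFactor);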

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table.

Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table.

To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
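
For instance, a key class like the following (a deliberately bad sketch of my own) sends every entry to the same bucket; because it implements Comparable, HashMap can at least use the comparison order to organize the overflowing bin:

// A deliberately bad key: every instance hashes to the same bucket.
class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; }    // constant: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey other) {
        return Integer.compare(id, other.id);         // lets HashMap break ties
    }
}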

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

(A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)

This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.
This is best done at creation time, to prevent accidental unsynchronized access to the map:

Example of wrapping the map for synchronization:
Map m = Collections.synchronizedMap(new HashMap(...));

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException.

Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification.

Fail-fast iterators throw ConcurrentModificationException on a best-effort basis.

Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
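
A minimal demonstration of the fail-fast behavior (a sketch; the map contents are arbitrary):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.remove("b");   // structural modification behind the iterator's back...
        it.next();         // ...so this throws ConcurrentModificationException
        // The safe alternative is it.remove(), which removes through the iterator.
    }
}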

Constructors:
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public HashMap(int initialCapacity) { ... }

public HashMap(int initialCapacity, float loadFactor) { ... }
// These are the three main HashMap constructors; both the capacity and the load factor can be customized.
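
A quick usage sketch of the three constructors (the numbers are arbitrary):

Map<String, Integer> a = new HashMap<>();          // capacity 16, load factor 0.75 by default
Map<String, Integer> b = new HashMap<>(64);        // custom initial capacity
Map<String, Integer> c = new HashMap<>(64, 0.5f);  // custom capacity and load factor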

Analysis of the main methods
HashMap's efficiency shows in how data is stored (put) and retrieved (get), so studying these two methods reveals most of the essence of HashMap's design.

First, the method that adds data:
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
// The original source is a single ternary expression; it is rewritten here
// so that comments can be added. Note that the null check must come first,
// otherwise calling hashCode() on a null key would throw a NullPointerException.
static final int hash(Object key) {
    if (key == null)
        return 0;
    int h = key.hashCode();
    // Shift the hashCode's high 16 bits down (see the note on >>> at the
    // end of this article)
    int m = h >>> 16;
    // XOR them into the low 16 bits so the high bits also affect the bucket index
    return h ^ m;
}
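
Why mix the high bits in at all? As putVal below shows, the bucket index is computed as (n - 1) & hash, which keeps only the low bits of the hash; without the XOR, hashCodes differing only in their high bits would always collide. A small illustration (not JDK code, values arbitrary, assumed inside a main method):

int h = "example".hashCode();    // a raw hashCode
int spread = h ^ (h >>> 16);     // mix the high 16 bits into the low 16 bits
int n = 16;                      // table length (always a power of two)
int index = (n - 1) & spread;    // masks off all but the low 4 bits
System.out.println(index);       // a bucket index in [0, 15]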

// putVal does the actual insertion work:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, initialize it; the key method here is resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the slot the hash maps to is empty, add a new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The slot already holds data
    else {
        Node<K,V> e; K k;
        // Two nodes represent the same entry only when both hash and key match;
        // so node identity depends on the key's hash and its equals() method
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode) // the node at this slot is a tree node
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // the node heads a linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) { // end of the list
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // handle the overlong bin: if the table is still small,
                        // treeifyBin resizes instead; otherwise it converts
                        // the list into a tree
                        treeifyBin(tab, hash);
                    break;
                }
                // the key already exists in the list
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Replace (or, with onlyIfAbsent, keep) the value of an existing mapping
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

// Resizing (growing) the table
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the current capacity has already reached the maximum, stop growing
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Below the maximum capacity and at least the default capacity:
        // double both the capacity and the threshold
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new threshold if it was not set above
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // a single node: place it directly in its new slot
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order: move the list while keeping its order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // (e.hash & oldCap) == 0 means the node stays at
                        // index j; otherwise it moves to index j + oldCap
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
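
Why does (e.hash & oldCap) decide where a node goes? Because the capacity is always a power of two, doubling it adds exactly one bit to the index mask, and e.hash & oldCap tests precisely that new bit. A worked illustration (values arbitrary, assumed inside a main method):

int oldCap = 16;                              // old index = hash & 0b1111
int hash1 = 0b0_0101;                         // new index bit is 0
int hash2 = 0b1_0101;                         // new index bit is 1
System.out.println(hash1 & (oldCap - 1));     // 5: old index
System.out.println(hash2 & (oldCap - 1));     // 5: same old index
System.out.println(hash1 & oldCap);           // 0:  stays at index 5
System.out.println(hash2 & oldCap);           // 16: moves to index 5 + 16 = 21
System.out.println(hash2 & (2 * oldCap - 1)); // 21: the new index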

// Convert the linked list in one bin into a red-black tree
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // If the table is still small, grow it instead of treeifying
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // First rebuild the bin as a doubly linked list of TreeNodes...
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // ...then turn that list into an actual tree
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
// Create a plain (non-tree) list node
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}


About >>> (unsigned right shift)

Unsigned right shift ignores the sign bit and fills the vacated positions with 0.
value >>> num — num specifies how many positions value is shifted.
The one rule to remember: there is no sign extension, and the highest bits are always filled with 0.
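
A short demo of the difference between >> and >>> on a negative number (assumed inside a main method):

int x = -8;                       // binary ...11111111111111111111111111111000
System.out.println(x >> 1);       // -4: the sign bit is copied into the top
System.out.println(x >>> 1);      // 2147483644: the top is filled with 0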

About ^ (XOR)
The rule, bit by bit: the result is 0 where the two bits are the same and 1 where they differ.
For example:
a:   10101110110111001011101000010
b:   00000000000000001010111011011
a^b: 10101110110111000001010011001
b's value is really just 1010111011011; its leading zeros would normally not be shown and are written out here only to line the bits up.
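
The same example can be checked directly in Java (the binary literals below are the a and b above, assumed inside a main method):

int a = 0b10101110110111001011101000010;
int b = 0b00000000000000001010111011011;
System.out.println(Integer.toBinaryString(a ^ b));
// prints 10101110110111000001010011001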



