Java Collection Classes (2)

Source: Internet | Published by: java服务器插件 | Editor: 程序博客网 | Time: 2024/06/03 20:12

Original article: http://blog.csdn.net/zhangerqing/article/details/8193118

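I. Internal storage structure of HashMap
At the lowest level, Java stores data in two kinds of structure: arrays and linked lists. An array occupies contiguous memory, so addressing by index is fast, but inserting or deleting an element means shifting many of the others; lookups are fast, insertions and deletions are slow. A linked list is the opposite: its nodes are not contiguous, so finding an element is slow, but inserting or deleting only requires updating a few pointers. Is there a data structure that combines the two and keeps the strengths of each? There is: the hash table. A hash table offers fast lookups (constant time on average) and reasonably fast insertions and deletions, which makes it a good fit for large data sets. The usual implementation technique is separate chaining, which can be thought of as an "array of linked lists", as shown in the figure below: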
[Figure: a hash table drawn as an array of 16 slots, each pointing to a linked list]

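From the figure we can see that a hash table is made of an array plus linked lists: in an array of length 16, each slot stores the head node of a linked list. By what rule are elements placed into the array? In general the slot is hash(key) % len, that is, the hash value of the element's key modulo the array length. In the hash table above, 12 % 16 = 12, 28 % 16 = 12, 108 % 16 = 12 and 140 % 16 = 12, so 12, 28, 108 and 140 are all stored at array index 12. Internally, HashMap is implemented with an array of Entry objects whose fields include key, value and next. The rest of this section walks through HashMap's internals in detail, starting with initialization.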
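The hash(key) % len rule can be checked with a few lines of Java. This is a minimal sketch: the class name BucketIndexDemo and the method bucketIndex are made up for illustration, and Math.floorMod is used in place of a plain % only so that negative hash codes would also land in a valid slot.

```java
// Hypothetical demo class, not part of the JDK.
final class BucketIndexDemo {

    // Maps a hash value to a slot: hash(key) % len, as described in the text.
    // Math.floorMod keeps the result non-negative even for negative hash values.
    static int bucketIndex(int hash, int len) {
        return Math.floorMod(hash, len);
    }

    public static void main(String[] args) {
        int len = 16;
        int[] keys = {12, 28, 108, 140};
        for (int key : keys) {
            // The hash code of an Integer is its int value.
            int hash = Integer.valueOf(key).hashCode();
            System.out.println(key + " -> slot " + bucketIndex(hash, len));
        }
    }
}
```

All four keys from the figure end up in slot 12, which is why they share one chain.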

1. Initialization
First, three constants:

static final int DEFAULT_INITIAL_CAPACITY = 16;
static final int MAXIMUM_CAPACITY = 1 << 30;
static final float DEFAULT_LOAD_FACTOR = 0.75f;

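Next, the no-argument constructor, which is the one used most often:

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    table = new Entry[DEFAULT_INITIAL_CAPACITY];
    init();
}

loadFactor and threshold do not matter yet (they only come into play when the table is resized). The line to focus on is table = new Entry[DEFAULT_INITIAL_CAPACITY], which shows that the default table has 16 slots. Another important constructor: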

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;
    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}

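The key part is

while (capacity < initialCapacity)
    capacity <<= 1;

This loop means that the table actually allocated is the smallest power of two that is greater than or equal to the first argument. For example:
new HashMap(7, 0.8f) passes loadFactor 0.8 and initialCapacity 7. capacity starts at 1 and is doubled on each pass (1 -> 2 -> 4 -> 8) until it is no longer less than 7, so the final capacity is 8. new Entry[capacity] then creates an array of that size, so the table length is ultimately determined by capacity, not by the value passed in.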
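To see the rounding for several inputs at once, the loop from the constructor can be lifted into a small helper. This is only a sketch: CapacityDemo and roundUpToPowerOfTwo are illustrative names, not JDK methods, and the 0.8f load factor is the one from the example above.

```java
// Hypothetical demo class; the loop itself is copied from the constructor quoted above.
final class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two that is >= initialCapacity (capped at 2^30).
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY;
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        float loadFactor = 0.8f;
        for (int requested : new int[] {0, 1, 7, 8, 9, 16, 17}) {
            int capacity = roundUpToPowerOfTwo(requested);
            int threshold = (int) (capacity * loadFactor);
            System.out.println(requested + " -> capacity " + capacity
                    + ", threshold " + threshold);
        }
    }
}
```

A request that is already a power of two (8, 16) is kept as is; anything else is rounded up to the next one.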

2. The put(K key, V value) operation
One more field needs introducing:
transient Entry[] table;
When put is called, it first checks whether key is null, at the line marked 1 below:

public V put(K key, V value) {
    if (key == null) // 1
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

If key is null, the following method is called:

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

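In other words, it takes the first slot of the Entry array, table[0], and walks the chain through each entry's next field until it finds an Entry whose key is null, then replaces that entry's value with the new value and returns the old one.
If no entry with a null key is found, it calls addEntry(0, null, value, 0), as in the code above, to add a new entry. The code is: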

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

The Entry constructor:

Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}

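addEntry first reads the current head of the bucket, table[bucketIndex], into e, then creates a new Entry (here with a null key and the value passed in) whose next is e, and stores it as the new head. If the number of entries has reached threshold, the table is resized to twice its length.
If key is not null, which is the usual case, look at the source again: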

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode()); // ----- 2 -----
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) { // ----- 3 -----
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    } // ----- 4 -----
    modCount++; // ----- 5 -----
    addEntry(hash, key, value, i); // ----- 6 -----
    return null;
}

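At the line marked 2, key.hashCode() is called first to obtain the key's hash code. hashCode() is declared on Object, where it is a native method with a fairly involved implementation (classes such as String and Integer override it); a separate article on native methods in Java will cover it later. The hash code is then passed to hash(), whose source is: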

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

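int i = indexFor(hash, table.length); plays the role of int i = hash % table.length. In this version of the JDK source, indexFor is h & (length - 1), which gives the same result as a non-negative modulo because the table length is always a power of two. The resulting i is the position in the Entry array.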
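The relationship between the bit mask and the modulo can be checked directly. The sketch below assumes indexFor is implemented as h & (length - 1), as in the JDK 6 era source this article quotes; IndexForDemo is an illustrative class name. It compares the mask with Math.floorMod and with the plain % operator, which differs for negative hashes.

```java
// Hypothetical demo class comparing the mask-based index with modulo arithmetic.
final class IndexForDemo {

    // Assumed body of HashMap.indexFor; only valid when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int h : new int[] {12, 28, 108, 140, -7, Integer.MIN_VALUE}) {
            int byMask = indexFor(h, length);
            int byFloorMod = Math.floorMod(h, length);
            int byRemainder = h % length; // may be negative for negative h
            System.out.println(h + ": mask=" + byMask
                    + ", floorMod=" + byFloorMod + ", %=" + byRemainder);
        }
    }
}
```

The mask always yields a valid array index, which is one reason the table length is kept at a power of two.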
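The code between markers 3 and 4 only checks whether the chain at index i already contains the same key. If it does, the value is replaced and the old value returned, and the code at markers 5 and 6 is never reached; if not, nothing happens there and execution falls through.
As mentioned above, the Entry class has a next field that points to the next Entry. Suppose the first key-value pair A arrives and its key hashes to i = 0, so table[0] = A. A little later a pair B arrives whose index is also 0. What now? HashMap does this: B.next = A, table[0] = B. If C then arrives with i = 0 as well, then C.next = B, table[0] = C. So slot 0 actually holds the three pairs A, B and C, linked through their next fields; in other words, the array slot always holds the most recently inserted element.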
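The A, B, C example can be replayed with a stripped-down node type. This is a sketch only: Node stands in for HashMap.Entry (keeping just key and next), and HeadInsertDemo and chainAfterInserting are names invented for the illustration.

```java
// Hypothetical demo of head insertion into a single bucket.
final class HeadInsertDemo {

    // Simplified stand-in for HashMap.Entry: a key and a link to the next node.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts the keys one by one at the head of a bucket and
    // returns the resulting chain, head first.
    static String chainAfterInserting(String... keys) {
        Node head = null; // plays the role of table[0]
        for (String key : keys) {
            head = new Node(key, head); // new.next = old head; table[0] = new
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chainAfterInserting("A", "B", "C")); // prints C -> B -> A
    }
}
```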
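At this point the general implementation of HashMap should be clear. HashMap also contains some optimizations worth mentioning. For example: once the length of Entry[] is fixed, the chains at each index grow longer as more data is added to the map. Does that hurt performance? HashMap addresses this with a load factor: once size reaches capacity * loadFactor (the threshold), the Entry[] is replaced by one twice as long and the entries are redistributed.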
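How the table grows can be simulated without a real map. The sketch below reuses the size++ >= threshold test from addEntry and assumes, as a simplification, that a resize doubles the capacity and recomputes the threshold as capacity * loadFactor. ResizeDemo and capacityAfter are illustrative names.

```java
// Hypothetical simulation of HashMap growth; no entries are actually stored.
final class ResizeDemo {

    // Returns the table length after inserting n distinct keys.
    static int capacityAfter(int n, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (size++ >= threshold) { // same test as in addEntry
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 24, 25, 100}) {
            System.out.println(n + " entries -> table length "
                    + capacityAfter(n, 16, 0.75f));
        }
    }
}
```

With the defaults (16 slots, load factor 0.75) the first resize happens on the 13th insertion, because the threshold is 12.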
3. The get(Object key) operation
get(Object key) looks up a value by its key. Once put is understood, get is easy to follow. Here is the source:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) // ----- 1 -----
            return e.value;
    }
    return null;
}

In short: 1. When key is null, getForNullKey() is called; its source is:

private V getForNullKey() {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

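2. When key is not null, the hash value is computed with hash(), indexFor() gives the index i, and the chain at that index is traversed; if an entry's key equals the requested key, its value is returned. This is the check at marker 1 in the get() code above.
To summarize HashMap's put and get operations: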

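// On put:
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length); // equivalent to hash % table.length
// walk the chain at table[i]: overwrite the value if the key is already there,
// otherwise link a new Entry in as the head of the chain

// On get:
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
// walk the chain at table[i] and return the value of the entry with an equal key, or null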
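Putting the two operations together, a toy chained map fits in a few dozen lines. This is a teaching sketch, not the JDK code: TinyChainedMap is an invented class, the table is fixed at 16 slots with no resizing, null keys are not supported, and the extra hash() bit-mixing step is left out.

```java
// Hypothetical, simplified chained hash map: fixed 16 slots, no resize, no null keys.
final class TinyChainedMap<K, V> {

    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        final Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    // Stores value under key; returns the previous value, or null if there was none.
    V put(K key, V value) {
        int hash = key.hashCode();
        int i = hash & (table.length - 1);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                V old = e.value;
                e.value = value; // existing key: overwrite in place
                return old;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]); // new key: insert at head
        return null;
    }

    // Returns the value stored under key, or null if the key is absent.
    V get(Object key) {
        int hash = key.hashCode();
        for (Entry<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> map = new TinyChainedMap<>();
        map.put(12, "a");
        map.put(28, "b");                     // lands in the same slot as 12
        System.out.println(map.put(12, "c")); // prints a
        System.out.println(map.get(12));      // prints c
        System.out.println(map.get(28));      // prints b
        System.out.println(map.get(44));      // prints null
    }
}
```

Key 44 maps to the same slot as 12 and 28, so the lookup walks the whole chain before returning null.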

Application:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Prints the elements that occur at least n/2 times in an array,
 * using a HashMap to store each element and its number of occurrences.
 * @author erqing
 */
public class HashMapTest {
    public static void main(String[] args) {
        int[] a = {2,3,2,2,1,4,2,2,2,7,9,6,2,2,3,1,0};
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i < a.length; i++) {
            if (map.containsKey(a[i])) {
                int tmp = map.get(a[i]);
                tmp += 1;
                map.put(a[i], tmp);
            } else {
                map.put(a[i], 1);
            }
        }
        Set<Integer> set = map.keySet();
        for (Integer s : set) {
            if (map.get(s) >= a.length / 2) {
                System.out.println(s);
            }
        }
    }
}

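II. Internal storage structure of Hashtable
Hashtable and HashMap use the same storage mechanism and their implementations are largely the same. The differences are:
1. HashMap is not thread-safe; Hashtable is thread-safe, with nearly all of its methods declared synchronized.
2. Hashtable allows neither null keys nor null values.
Calling put on a Hashtable with a null key or a null value throws NullPointerException. There are other minor differences, such as the initial size of the Entry array (11 by default in Hashtable, against 16 in HashMap), but the basic idea is the same as HashMap's.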
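The difference in null handling is easy to observe. In this small sketch, NullKeyDemo and its accepts helper are illustrative names; the helper simply reports whether put threw a NullPointerException.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo contrasting null handling in HashMap and Hashtable.
final class NullKeyDemo {

    // Returns true if map.put(key, value) succeeded,
    // false if it threw NullPointerException.
    static boolean accepts(Map<String, String> map, String key, String value) {
        try {
            map.put(key, value);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap, null key:     " + accepts(new HashMap<>(), null, "v"));
        System.out.println("HashMap, null value:   " + accepts(new HashMap<>(), "k", null));
        System.out.println("Hashtable, null key:   " + accepts(new Hashtable<>(), null, "v"));
        System.out.println("Hashtable, null value: " + accepts(new Hashtable<>(), "k", null));
    }
}
```

HashMap accepts both cases, which is why it needs the putForNullKey and getForNullKey paths shown earlier; Hashtable rejects both.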

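III. Hashtable compared with ConcurrentHashMap
ConcurrentHashMap is a thread-safe implementation of HashMap. Since both classes are thread-safe, how does its synchronization differ from Hashtable's?
As discussed earlier, the synchronized keyword works by locking an object: whether it is placed on a method or on a block, the lock is taken on a whole object. In Hashtable that object is the table itself, so only one thread at a time can be inside any of its synchronized methods. ConcurrentHashMap's synchronization is different: instead of the synchronized keyword it uses explicit lock operations, so that an update locks only part of the map rather than the whole object. As a result, up to concurrencyLevel threads can update the map at the same time without blocking one another, provided they touch different segments. concurrencyLevel is explained below.
1. Constructors
To make this easier to follow, start with the constructors. ConcurrentHashMap is built on an array of Segment objects, which plays a role similar to HashMap's Entry array: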

static final int DEFAULT_INITIAL_CAPACITY = 16;
static final float DEFAULT_LOAD_FACTOR = 0.75f;
static final int DEFAULT_CONCURRENCY_LEVEL = 16;

public ConcurrentHashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR, DEFAULT_CONCURRENCY_LEVEL);
}

The defaults are passed to the following constructor:

static final int MAX_SEGMENTS = 1 << 16;
static final int MAXIMUM_CAPACITY = 1 << 30;

public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;
    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    segmentShift = 32 - sshift;
    segmentMask = ssize - 1;
    this.segments = Segment.newArray(ssize);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = 1;
    while (cap < c)
        cap <<= 1;
    for (int i = 0; i < this.segments.length; ++i)
        this.segments[i] = new Segment<K,V>(cap, loadFactor);
}

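Notice that it takes one more parameter than HashMap's constructor. With the defaults, the three values are 16, 0.75 and 16. After the loop

while (ssize < concurrencyLevel) {
    ++sshift;
    ssize <<= 1;
}

ssize is the size of the array that will finally be allocated; when concurrencyLevel is 16, ssize works out to 16. From
this.segments = Segment.newArray(ssize) we can see that the Segment array created has 16 elements. Each Segment object is then created with: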

this.segments[i] = new Segment<K,V>(cap, loadFactor);
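The constructor's arithmetic is easier to follow with concrete numbers. The sketch below copies the sizing steps into a standalone helper; SegmentSizingDemo and sizing are illustrative names, and the MAX_SEGMENTS and MAXIMUM_CAPACITY clamps are left out to keep it short.

```java
import java.util.Arrays;

// Hypothetical helper that replays the sizing arithmetic of the constructor quoted above.
final class SegmentSizingDemo {

    // Returns {ssize, sshift, segmentShift, segmentMask, cap}.
    static int[] sizing(int initialCapacity, int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) { // round concurrencyLevel up to a power of two
            ++sshift;
            ssize <<= 1;
        }
        int segmentShift = 32 - sshift;    // the top sshift bits of the hash select a segment
        int segmentMask = ssize - 1;
        int c = initialCapacity / ssize;   // entries per segment, rounded up
        if (c * ssize < initialCapacity) {
            ++c;
        }
        int cap = 1;
        while (cap < c) {                  // per-segment table length, a power of two
            cap <<= 1;
        }
        return new int[] {ssize, sshift, segmentShift, segmentMask, cap};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizing(16, 16)));
        System.out.println(Arrays.toString(sizing(100, 10)));
    }
}
```

With the defaults (16, 16) there are 16 segments whose tables start with a single slot each; a request for capacity 100 at concurrency level 10 also gives 16 segments, each with 8 slots.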

2. The put operation

// Entry point
public V put(K key, V value) {
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key.hashCode());
    return segmentFor(hash).put(key, hash, value, false);
}

// A different hash function from HashMap's
private static int hash(int h) {
    // Spread bits to regularize both segment and index locations,
    // using variant of single-word Wang/Jenkins hash.
    h += (h <<  15) ^ 0xffffcd7d;
    h ^= (h >>> 10);
    h += (h <<   3);
    h ^= (h >>>  6);
    h += (h <<   2) + (h << 14);
    return h ^ (h >>> 16);
}

// Use the hash to pick the Segment at the corresponding position in segments (the Segment[])
final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}

// Segment.put, called above as put(key, hash, value, false)
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;
        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        }
        else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}
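The idea behind the per-segment lock() and unlock() calls, often called lock striping, can be sketched with ordinary library classes. This is an illustration under stated assumptions, not the JDK implementation: StripedMap, stripeFor and fillConcurrently are invented names, each stripe is a plain HashMap guarded by its own ReentrantLock, and reads take the lock too, purely to keep the sketch simple.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock striping; not how the JDK class is written.
final class StripedMap<K, V> {
    private final ReentrantLock[] locks;
    private final Map<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedMap(int stripeCount) { // stripeCount must be a power of two
        locks = new ReentrantLock[stripeCount];
        stripes = (Map<K, V>[]) new Map[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            locks[i] = new ReentrantLock();
            stripes[i] = new HashMap<>();
        }
    }

    // Picks a stripe from the key's hash code, folding the high bits in first.
    private int stripeFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return h & (locks.length - 1);
    }

    V put(K key, V value) {
        int i = stripeFor(key);
        locks[i].lock(); // only this stripe is locked; other stripes stay available
        try {
            return stripes[i].put(key, value);
        } finally {
            locks[i].unlock();
        }
    }

    V get(Object key) {
        int i = stripeFor(key);
        locks[i].lock();
        try {
            return stripes[i].get(key);
        } finally {
            locks[i].unlock();
        }
    }

    int size() {
        int total = 0;
        for (int i = 0; i < locks.length; i++) {
            locks[i].lock();
            try {
                total += stripes[i].size();
            } finally {
                locks[i].unlock();
            }
        }
        return total;
    }

    // Starts threadCount writers, each inserting keysPerThread distinct keys,
    // and returns the final size of the map.
    static int fillConcurrently(int threadCount, int keysPerThread) {
        StripedMap<Integer, Integer> map = new StripedMap<>(16);
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int base = t * keysPerThread;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < keysPerThread; k++) {
                    map.put(base + k, k);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000));
    }
}
```

Two writers block each other only when their keys fall into the same stripe, which is the property the Segment array gives ConcurrentHashMap.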

IV. Common HashMap questions
1. Values in a HashMap may repeat, but keys may not

import java.util.HashMap;

public class Test {
    public static void main(String[] args) {
        HashMap<String,Integer> map = new HashMap<String,Integer>();
        // Insert two entries with the same value: no problem
        map.put("egg", 1);
        map.put("niu", 1);
        // Insert an existing key and look at the return value
        int egg = (Integer) map.put("egg", 3);
        System.out.println(egg);              // prints 1
        System.out.println(map.get("egg"));   // prints 3, the old value 1 was overwritten
        System.out.println(map.get("niu"));   // prints 1
    }
}

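Putting an existing key overwrites its value and returns the old value.
2. Sorting a HashMap by value
Given an array, count how many times each value occurs and list the values in descending order of count. We use a HashMap whose keys are the array elements and whose values are the counts, then copy the entries into a list and sort it by value with Collections.sort. The code: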

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        int data[] = { 2, 5, 2, 3, 5, 2, 3, 5, 2, 3, 5, 2, 3, 5, 2,
                7, 8, 8, 7, 8, 7, 9, 0 };
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i : data) {
            if (map.containsKey(i)) {       // already in the HashMap?
                map.put(i, map.get(i) + 1); // yes: increment its count
            } else {
                map.put(i, 1);              // no: add it with count 1
            }
        }
        // Sort the map's entries by value, largest first
        List<Map.Entry<Integer, Integer>> list = new ArrayList<Map.Entry<Integer, Integer>>(
                map.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<Integer, Integer>>() {
            public int compare(Map.Entry<Integer, Integer> o1,
                    Map.Entry<Integer, Integer> o2) {
                // compareTo avoids the overflow risk of subtracting the two values
                return o2.getValue().compareTo(o1.getValue());
            }
        });
        for (Map.Entry<Integer, Integer> m : list) {
            System.out.println(m.getKey() + "-" + m.getValue());
        }
    }
}

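Output (entries with equal counts appear in HashMap iteration order, which can differ between JDK versions):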
2-6
5-5
3-4
8-3
7-3
9-1
0-1
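On Java 8 or later, the same count-and-sort can be written more compactly. The sketch below is an alternative, not the original author's code: CountAndSortDemo and countsDescending are illustrative names, Map.merge replaces the containsKey/put pair, and ties are broken by ascending key so that the result no longer depends on HashMap's iteration order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Java 8+ variant of the counting example above.
final class CountAndSortDemo {

    // Returns "key-count" strings, highest count first, ties ordered by ascending key.
    static List<String> countsDescending(int[] data) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : data) {
            counts.merge(value, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        List<Map.Entry<Integer, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending count
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : entries) {
            lines.add(e.getKey() + "-" + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] data = {2, 5, 2, 3, 5, 2, 3, 5, 2, 3, 5, 2, 3, 5, 2,
                7, 8, 8, 7, 8, 7, 9, 0};
        for (String line : countsDescending(data)) {
            System.out.println(line);
        }
    }
}
```

With the explicit tie-breaker, 7 is listed before 8 and 0 before 9 regardless of the JDK version.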
