dotNet源码解读--HashTable目录扩展的奥秘

来源：互联网发布：帝国cms会员系统编辑：程序博客网时间：2024/05/20 03:04

摘要：为了探索dotnet中hashtable的目录结构及与目录扩展相关的算法，本文通过对相关源码的阅读与分析，得出如下结论，hashtable的目录是由数组组织，目录元素代表一个数据节点，不是数据桶。目录扩展是扩展当前目录长度2倍往1遍历过程中遇到的第一个素数。目录扩展触发条件：装载因子式的触发，同时考虑到“杂乱程度”需要进行重新散列。目录扩展时需要遍历原有目录中所有的元素。查询过程与探测再散列类似。

关键词：dotnet，hashmap，目录扩展方法，目录扩展触发条件

一、目录结构

本文源自：http://blog.csdn.net/daliaojie/article/details/26366795

首先我们看一下该类主要的成员变量

        private bucket[] buckets;        private int count;        private const int InitialSize = 3;        private float loadFactor;        private int loadsize;        private int occupancy;

buckets为目录，使用数组维系的。

再看bucket是什么：

        private struct bucket        {            public object key;            public object val;            public int hash_coll;        }

原来他是一个结构体，值类型。也就是说，hashtable中的目录位置并不是一个数据桶，而是一个键值对。仅仅一个数据节点。

count就是里面已经装载了多少个元素，loadfactor和loadsize分别是装载因子与阀值。occupancy待会儿说。

二、插入操作

1、散列冲突解决方式

 public virtual void Add(object key, object value)        {            this.Insert(key, value, true);        }

我们追踪insert方法：

 private void Insert(object key, object nvalue, bool add)        {            uint num;            uint num2;            if (key == null)            {                throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));            }            if (this.count >= this.loadsize)            {                this.expand();            }            else if ((this.occupancy > this.loadsize) && (this.count > 100))            {                this.rehash();            }            uint num3 = this.InitHash(key, this.buckets.Length, out num, out num2);            int num4 = 0;            int index = -1;            int num6 = (int) (num % this.buckets.Length);        Label_0071:            if (((index == -1) && (this.buckets[num6].key == this.buckets)) && (this.buckets[num6].hash_coll < 0))            {                index = num6;            }            if ((this.buckets[num6].key == null) || ((this.buckets[num6].key == this.buckets) && ((this.buckets[num6].hash_coll & 0x80000000L) == 0L)))            {                if (index != -1)                {                    num6 = index;                }                Thread.BeginCriticalRegion();                this.isWriterInProgress = true;                this.buckets[num6].val = nvalue;                this.buckets[num6].key = key;                this.buckets[num6].hash_coll |= (int) num3;                this.count++;                this.UpdateVersion();                this.isWriterInProgress = false;                Thread.EndCriticalRegion();            }            else if (((this.buckets[num6].hash_coll & 0x7fffffff) == num3) && this.KeyEquals(this.buckets[num6].key, key))            {                if (add)                {                    throw new ArgumentException(Environment.GetResourceString("Argument_AddingDuplicate__", new object[] { this.buckets[num6].key, key }));                }                Thread.BeginCriticalRegion();                this.isWriterInProgress = true;                this.buckets[num6].val = nvalue;                this.UpdateVersion();                this.isWriterInProgress = false;                Thread.EndCriticalRegion();            }            else            {                if ((index == -1) && (this.buckets[num6].hash_coll >= 0))                {                    this.buckets[num6].hash_coll |= -2147483648;                    this.occupancy++;                }                num6 = (int) ((num6 + num2) % ((ulong) this.buckets.Length));                if (++num4 < this.buckets.Length)                {                    goto Label_0071;                }                if (index == -1)                {                    throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_HashInsertFailed"));                }                Thread.BeginCriticalRegion();                this.isWriterInProgress = true;                this.buckets[index].val = nvalue;                this.buckets[index].key = key;                this.buckets[index].hash_coll |= (int) num3;                this.count++;                this.UpdateVersion();                this.isWriterInProgress = false;                Thread.EndCriticalRegion();            }        }

插入操作的刚开始，就是判空与判断是否需要扩展和重新散列。扩展（expand）与重新散列（rehash）两个操作的触发条件及操作，我们稍后追踪。

也就是说我们假设目前不需要扩展与重新散列。

插入操作稍后计算了key所对应的目录位置index，如果该位置无数据即可以占用，如果已经被占用，并且key值相等，则默认操作是替换value的值。否则，该目录位置已经

被占用，并且key不相等，那么我们会再选另一个位置来检测是否合适，如果合适，就插入。再次选择位置的方式，不是简单的选取隔壁的位置，而是加上一个数。这样做是为了更快的找到空闲位置，很明显，hash解决冲突的方式的开放地址法。

这里的

occupancy

应该理解为，有多少个元素不在他该在的位置，及key计算出的index不是其所子啊的位置。

2、再散列与扩展方法

再散列触发条件

(this.occupancy > this.loadsize) && (this.count > 100)

也即是说，如果占据错了位置的元素达到这个阀值并且成员装载数目达到100时，才启动再散列

扩展操作的触发条件：

this.count >= this.loadsize

也就是说，负载元素数目达到这个阀值时触发扩展操作，其实还是和负载因子有关系。

我们看他们对应的源码

 private void rehash()        {            this.rehash(this.buckets.Length);        }

 private void expand()        {            int prime = HashHelpers.GetPrime(this.buckets.Length * 2);            this.rehash(prime);        }

我们发现他们都是调用的

 private void rehash(int newsize)        {            this.occupancy = 0;            Hashtable.bucket[] newBuckets = new Hashtable.bucket[newsize];            for (int i = 0; i < this.buckets.Length; i++)            {                Hashtable.bucket bucket = this.buckets[i];                if ((bucket.key != null) && (bucket.key != this.buckets))                {                    this.putEntry(newBuckets, bucket.key, bucket.val, bucket.hash_coll & 0x7fffffff);                }            }            Thread.BeginCriticalRegion();            this.isWriterInProgress = true;            this.buckets = newBuckets;            this.loadsize = (int) (this.loadFactor * newsize);            this.UpdateVersion();            this.isWriterInProgress = false;            Thread.EndCriticalRegion();        }

只是，在散列操作传入的是当前目录的长度，而扩展传入的是，从当前目录长度的2倍往2遍历遇到的第一个素数。他们认为素数散列后冲突概率小。

这里求素数时的策略，参考文章：

我们看这个方法：

将站错位置的元素数目置零。

new一个指定长度的bucket数组。

在老的bucket数组目录中遍历每一个存在的元素。

将他们放到新的目录中。

  private void putEntry(bucket[] newBuckets, object key, object nvalue, int hashcode)        {            uint num = (uint) hashcode;            uint num2 = (uint) (1 + (((num >> 5) + 1) % (newBuckets.Length - 1)));            int index = (int) (num % newBuckets.Length);        Label_0017:            if ((newBuckets[index].key == null) || (newBuckets[index].key == this.buckets))            {                newBuckets[index].val = nvalue;                newBuckets[index].key = key;                newBuckets[index].hash_coll |= hashcode;            }            else            {                if (newBuckets[index].hash_coll >= 0)                {                    newBuckets[index].hash_coll |= -2147483648;                    this.occupancy++;                }                index = (int) ((index + num2) % ((ulong) newBuckets.Length));                goto Label_0017;            }        }

这个操作和插入有些类似，都要做冲突解决的方法。

这里我们知道，目录扩展的方法，是扩展小于2倍当前目录长度的第一个素数的目录。

结尾：

通过本次对dotNet中hashmap源码的分析与解读，得出如下结论，hashtable的目录是由数组组织，目录元素代表一个数据节点，不是数据桶。目录扩展是扩展当前目录长度2倍往1遍历过程中遇到的第一个素数。目录扩展触发条件：装载因子式的触发，同时考虑到“杂乱程度”需要进行重新散列。目录扩展时需要遍历原有目录中所有的元素。查询过程与探测再散列类似。

1 0