Hash Tables in Depth (Part 2): Implementing a Hash Table with Open Addressing

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 05:20

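The previous article gave a brief introduction to hash tables. This one implements a hash table that uses open addressing.
Open addressing has two main probing schemes, linear probing and quadratic probing, and this article covers both.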

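Open addressing (closed hashing)

I. Linear probing

(1) What linear probing is

Suppose we are given a set of elements with keys 37, 25, 14, 36, 49, 68, 57, 11 and a hash table HT[12] of size m = 12. Take Hash(x) = x % p with p = 11, the largest prime not exceeding m. Then:

Hash(37) = 4, Hash(25) = 3, Hash(14) = 3, Hash(36) = 3, Hash(49) = 5, Hash(68) = 2, Hash(57) = 2, Hash(11) = 0

Collisions are resolved with linear probing.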


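To insert an element, compute its bucket number H0 with the hash function and inspect that bucket. If the bucket is empty, the element is stored there. If it holds the key being inserted, the element is already in the table and is not inserted again. Otherwise there is a collision: look at the bucket immediately after it and repeat. As soon as an empty bucket is reached, the search has failed and the new element is inserted there.

Because every collision is pushed into the neighbouring bucket, colliding keys pile up in one contiguous region, and the larger that region grows, the more collisions it attracts. This is the clustering problem.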
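The example above can be replayed in a few lines. This is only a sketch: the function name `buildLinearProbeTable` and the `kEmpty` sentinel are made up for illustration, and it assumes non-negative integer keys.

```cpp
#include <cstddef>
#include <vector>

// Sentinel for an unused bucket (assumption: real keys are non-negative).
const int kEmpty = -1;

// Inserts each key into a table of m buckets using Hash(x) = x % p and
// linear probing, then returns the finished table.
std::vector<int> buildLinearProbeTable(const std::vector<int>& keys,
                                       std::size_t m, int p) {
    std::vector<int> table(m, kEmpty);
    for (std::size_t k = 0; k < keys.size(); ++k) {
        int key = keys[k];
        std::size_t index = static_cast<std::size_t>(key % p);  // home bucket H0
        std::size_t probes = 0;
        // Walk forward until we meet an empty bucket or the key itself.
        while (table[index] != kEmpty && table[index] != key && probes < m) {
            index = (index + 1) % m;  // wrap around at the end of the table
            ++probes;
        }
        if (table[index] == kEmpty) {
            table[index] = key;  // search failed at an empty bucket: insert here
        }
    }
    return table;
}
```

For the keys 37, 25, 14, 36, 49, 68, 57, 11 with m = 12 and p = 11, buckets 0 and 2 through 8 end up holding 11, 68, 25, 37, 14, 36, 49, 57. Key 57 has home bucket 2 but lands in bucket 8, six probes away, which is the clustering effect described above.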

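(2) Implementing linear probing:

① Underlying storage

The table uses a vector as its container. Deletion is the awkward part: if a deleted slot were simply emptied, a lookup could no longer stop at the first empty slot, because the key it wants might lie beyond it, and every lookup would turn into a scan of the whole table. So each slot carries a state, one of empty, occupied or deleted. A probe continues past a deleted slot and may stop at an empty one.

A slot is defined as a template struct, HashNode, which holds the key, the value and a state taken from a three-value enum.
The vector's size is kept equal to its capacity, so every allocated slot is addressable.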

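② Growing the table

Growing a linear-probing table:
The load factor is the occupancy of the table: the number of stored elements divided by the number of slots. With open addressing it is always below 1 and must never reach 1, so it has to be kept under control. To stay in integer arithmetic, multiply the element count by 10 before dividing (casting to floating point also works); when the result is greater than 7, the table needs to grow. With integer division that means a load factor of 0.8 or more.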
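A minimal sketch of that check in isolation. The helper name `needsGrow` is hypothetical; the listing further down performs the same test inside the table class.

```cpp
#include <cstddef>

// Returns true when a table with `slots` buckets that currently stores
// `size` elements should grow. Multiplying by 10 keeps the test in integers.
bool needsGrow(std::size_t size, std::size_t slots) {
    if (slots == 0) {
        return true;  // nothing allocated yet; also avoids dividing by zero
    }
    return size * 10 / slots > 7;
}
```

With 53 slots, for instance, 42 elements give 420 / 53 = 7 and the table stays as it is, while 43 elements give 430 / 53 = 8 and trigger growth.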

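Growing a hash table is expensive, however. Unlike a vector, it cannot just allocate new storage, copy the elements across and release the old block: a key's position depends on the table size, so every element has to be rehashed and mapped into the new space.

Method: define a new HashTable, resize its storage to the new size (taken from a table of primes), loop over the old slots and call the insert function for each stored element (this cannot trigger another growth, because the new table has already been resized), and finally swap the _tables of the two objects.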
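The rehash step might look like the following, stripped down to integer keys. `Slot`, `placeKey` and `rehash` are illustrative names, not the article's class, and the sketch assumes non-negative keys and a target table with free room.

```cpp
#include <cstddef>
#include <vector>

enum SlotState { kEmptySlot, kUsedSlot, kDeletedSlot };

struct Slot {
    int key;
    SlotState state;
    Slot() : key(0), state(kEmptySlot) {}
};

// Stores key in the first slot of its probe sequence that is not in use.
// Assumes key >= 0 and that at least one slot is free.
void placeKey(std::vector<Slot>& slots, int key) {
    std::size_t index = static_cast<std::size_t>(key) % slots.size();
    while (slots[index].state == kUsedSlot) {
        index = (index + 1) % slots.size();
    }
    slots[index].key = key;
    slots[index].state = kUsedSlot;
}

// Rebuilds the table with newSize slots. Every live key is hashed again
// against the new size; deleted slots are simply not carried over.
void rehash(std::vector<Slot>& slots, std::size_t newSize) {
    std::vector<Slot> bigger(newSize);
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (slots[i].state == kUsedSlot) {
            placeKey(bigger, slots[i].key);
        }
    }
    slots.swap(bigger);  // the old storage is released when `bigger` goes out of scope
}
```

Swapping at the end, instead of assigning, keeps the operation cheap: only the two vectors' internal pointers change hands.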

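③ Insertion

First map the key to an index; this step is worth wrapping in a separate function.
A first attempt at that function is key % _tables.size(), but a string or a struct cannot be reduced with the modulo operator.

Our solutions are as follows:

Option 1: a string hash algorithm, which converts the string to an integer.
Option 2: write a functor that casts the key to an integer and returns size_t. Make it a defaulted template parameter; inside the function, create a functor object and pass it the key. string has to be handled separately: give it its own functor that returns the string hash, and pass that functor explicitly as the template argument when testing. For a struct, try to pick one of its members that can be reduced modulo the table size.
Option 3: use template specialization.
Then loop to find a slot: a deleted or empty slot can take the new element, an equal key means nothing is inserted, and an index that runs past the end wraps around to 0.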
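The three options can be combined as sketched below. The names `DefaultHash`, `Student`, `StudentHash` and `bucketIndex` are invented for this example; the string version is a BKDR-style hash with seed 131.

```cpp
#include <cstddef>
#include <string>

// Generic functor: for any key type that converts to an integer.
template <class K>
struct DefaultHash {
    std::size_t operator()(const K& key) const {
        return static_cast<std::size_t>(key);
    }
};

// Specialization for std::string: fold the characters into one integer.
template <>
struct DefaultHash<std::string> {
    std::size_t operator()(const std::string& key) const {
        unsigned int hash = 0;
        for (std::size_t i = 0; i < key.size(); ++i) {
            hash = hash * 131 + static_cast<unsigned char>(key[i]);
        }
        return hash & 0x7FFFFFFF;
    }
};

// Hypothetical struct key: hash it through the one member that identifies it.
struct Student {
    int id;
    std::string name;
};

struct StudentHash {
    std::size_t operator()(const Student& s) const {
        return static_cast<std::size_t>(s.id);
    }
};

// Maps a key to a bucket. The functor is a defaulted template parameter,
// so callers only name it when the default does not fit the key type.
template <class K, class HashFunc = DefaultHash<K> >
std::size_t bucketIndex(const K& key, std::size_t buckets) {
    return HashFunc()(key) % buckets;
}
```

An int or string key works with no extra arguments, as in `bucketIndex(37, 11)`, while a struct key names its functor explicitly, as in `bucketIndex<Student, StudentHash>(s, 11)`.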

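④ Lookup

Lookup works much like insertion.
Probing stops at an empty slot. At each slot, if the key matches and the state is occupied, the element has been found; if the state is not occupied, it does not count as a match. While there is no match, move on to the next slot, and when the index reaches the end of the table, reset it to 0.

⑤ Deletion

First call the lookup function. If it finds the key in an occupied slot, delete it and decrement the size (--size). Note that deleting here only sets the slot's state to deleted, which is known as pseudo-deletion (lazy deletion). If the key is not found, return failure.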
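To see why the deleted state matters, here is a reduced sketch with integer keys. `Slot`, `findSlot` and `removeKey` are illustrative names, and the code assumes a non-empty table and non-negative keys.

```cpp
#include <cstddef>
#include <vector>

enum State { EMPTY, EXIST, DELETED };

struct Slot {
    int key;
    State state;
    Slot() : key(0), state(EMPTY) {}
};

// Linear-probing lookup. Returns the slot index holding key, or -1.
// The probe stops at the first EMPTY slot but walks past DELETED ones.
int findSlot(const std::vector<Slot>& slots, int key) {
    std::size_t start = static_cast<std::size_t>(key) % slots.size();
    std::size_t index = start;
    while (slots[index].state != EMPTY) {
        if (slots[index].state == EXIST && slots[index].key == key) {
            return static_cast<int>(index);
        }
        index = (index + 1) % slots.size();
        if (index == start) {
            break;  // went all the way round without meeting an EMPTY slot
        }
    }
    return -1;
}

// Lazy deletion: only the state changes, so the slot stays in the probe
// chain and keys stored behind it remain reachable.
bool removeKey(std::vector<Slot>& slots, int key) {
    int index = findSlot(slots, key);
    if (index < 0) {
        return false;
    }
    slots[index].state = DELETED;
    return true;
}
```

Suppose keys 25 and 14 both hash to bucket 3 of an 11-slot table, so 14 sits in bucket 4. After removing 25, a lookup of 14 still starts at bucket 3, sees that it is deleted and not empty, and carries on to bucket 4. Had bucket 3 been reset to empty, the lookup would have stopped there and wrongly reported 14 as missing.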

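⑥ Implementing the hash function

The hash function is implemented as a functor. Keys that can be converted to an integer use the plain generic version; this article adds a specialized version for string, the type we use most often.

There is no need to write a destructor: the vector member releases its storage by itself.
Copy construction is likewise handled inside vector.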

#include <cstddef>
#include <string>
#include <utility>
#include <vector>
using namespace std;

namespace HASHTABLE
{
    // Hash table implemented with open addressing.
    // Each slot has one of three states: empty, occupied or deleted.
    enum State
    {
        EMPTY = 1,
        EXIST = 2,
        DELETE = 3
    };

    template <class K, class V>
    struct HashNode
    {
        K _key;
        V _value;
        State _state;

        HashNode()
            : _key()
            , _value()
            , _state(EMPTY)
        {}
    };

    // Generic version: for keys that convert to an integer.
    template <class K>
    struct _HashFunc
    {
        size_t operator()(const K& key) const
        {
            return (size_t)key;
        }
    };

    // Specialization of the functor for string.
    template <>
    struct _HashFunc<string>
    {
        static size_t BKDRHash(const char* str)
        {
            unsigned int seed = 131; // 31 131 1313 13131 131313
            unsigned int hash = 0;
            while (*str)
            {
                hash = hash * seed + (*str++);
            }
            return (hash & 0x7FFFFFFF);
        }

        size_t operator()(const string& key) const
        {
            return BKDRHash(key.c_str());
        }
    };

    template <class K, class V, class HashFunc = _HashFunc<K> >
    class HashTable
    {
        typedef HashNode<K, V> Node;
    public:
        HashTable()
            : _size(0)
        {
            _tables.resize(_GetPrime(0));
        }

        // vector copies every slot (key, value and state) by itself.
        HashTable(const HashTable<K, V, HashFunc>& Hash)
            : _tables(Hash._tables)
            , _size(Hash._size)
        {}

        HashTable<K, V, HashFunc>& operator=(const HashTable<K, V, HashFunc>& Hash)
        {
            if (this != &Hash)
            {
                HashTable<K, V, HashFunc> h(Hash);
                Swap(h);
            }
            return *this;
        }

        bool Insert(const K& key, const V& value)
        {
            _checkcapcity();
            if (Find(key) != NULL) // the key is already stored: do not insert again
                return false;
            int index = _HashFunc(key);
            if (index >= 0)
            {
                while (_tables[index]._state == EXIST) // advance to a slot that is not occupied
                {
                    index++;
                    if (index == (int)_tables.size())
                        index = 0;
                }
                // An empty or deleted slot has been found: store the element here.
                _tables[index]._key = key;
                _tables[index]._value = value;
                _tables[index]._state = EXIST;
                _size++;
                return true;
            }
            return false;
        }

        Node* Find(const K& key)
        {
            int index = _HashFunc(key);
            if (index >= 0)
            {
                int start = index;
                while (_tables[index]._state != EMPTY)
                {
                    // Only an occupied slot counts as a match; deleted slots are skipped.
                    if (_tables[index]._state == EXIST && _tables[index]._key == key)
                        return &_tables[index];
                    ++index;
                    if (index == (int)_tables.size())
                        index = 0;
                    // If no slot is EMPTY (only occupied and deleted ones remain),
                    // stop after one full pass: the key is not there.
                    if (index == start)
                        break;
                }
            }
            return NULL;
        }

        bool Remove(const K& key)
        {
            Node* cur = Find(key);
            if (cur == NULL)
                return false;
            cur->_state = DELETE; // pseudo-deletion: only the state changes
            --_size;
            return true;
        }

    protected:
        // Maps a key to a slot index, or -1 if the table has no slots.
        int _HashFunc(const K& key)
        {
            if (_tables.size())
            {
                return (int)(HashFunc()(key) % _tables.size());
            }
            return -1;
        }

        void _checkcapcity()
        {
            // Load-factor check: multiplying by 10 keeps the test in integer
            // arithmetic; the table grows when the scaled ratio exceeds 7.
            if (_tables.size() == 0 || _size * 10 / _tables.size() > 7)
            {
                HashTable<K, V, HashFunc> hash;
                size_t NewSize = _GetPrime(_tables.size());
                hash._tables.resize(NewSize);
                for (size_t i = 0; i < _tables.size(); i++)
                {
                    if (_tables[i]._state == EXIST)
                    {
                        hash.Insert(_tables[i]._key, _tables[i]._value);
                    }
                }
                Swap(hash);
            }
        }

        static unsigned long _GetPrime(const unsigned long size)
        {
            const int _PrimeSize = 28;
            static const unsigned long _PrimeList[_PrimeSize] =
            {
                53ul, 97ul, 193ul, 389ul, 769ul,
                1543ul, 3079ul, 6151ul, 12289ul, 24593ul,
                49157ul, 98317ul, 196613ul, 393241ul,
                786433ul,
                1572869ul, 3145739ul, 6291469ul, 12582917ul,
                25165843ul,
                50331653ul, 100663319ul, 201326611ul, 402653189ul,
                805306457ul,
                1610612741ul, 3221225473ul, 4294967291ul
            };
            // Return the first prime in the list that is larger than size.
            for (int i = 0; i < _PrimeSize; i++)
            {
                if (size < _PrimeList[i])
                    return _PrimeList[i];
            }
            // None found: return the last prime in the list.
            return _PrimeList[_PrimeSize - 1];
        }

        void Swap(HashTable<K, V, HashFunc>& table)
        {
            if (this != &table)
            {
                _tables.swap(table._tables);
                swap(_size, table._size);
            }
        }

    private:
        vector<Node> _tables;
        size_t _size;
    };
}

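II. Quadratic probing

(1) What quadratic probing is:

Linear probing produces too many colliding elements, so quadratic probing is used to ease the collisions. Instead of stepping one slot at a time, the i-th probe adds i squared to the home index.
Collisions still occur, since keys with the same home bucket follow the same probe sequence. Other schemes have been proposed for this, such as double hashing, where a second hash function determines the step, and a quadratic variant that alternates to the left and right of the home bucket. They are not covered in this article.

(2) Implementing quadratic probing:

The implementation is almost the same as for linear probing. The only difference is in insertion: save the home index as first, probe (first + i*i) % size, and then ++i. That is quadratic probing.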
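A short sketch of that insertion loop, again with integer keys. The name `quadraticInsert` and the `kEmpty` sentinel are assumptions made for the example, and the table is assumed to be non-empty.

```cpp
#include <cstddef>
#include <vector>

// Sentinel for an unused bucket (assumption: real keys are non-negative).
const int kEmpty = -1;

// Inserts key by probing buckets (first + i*i) % size for i = 0, 1, 2, ...
// Returns the bucket used, or -1 if the key is already present or no free
// bucket turned up within `size` probes (a quadratic probe sequence does
// not necessarily visit every bucket).
int quadraticInsert(std::vector<int>& table, int key) {
    const std::size_t size = table.size();
    const std::size_t first = static_cast<std::size_t>(key) % size;  // home bucket
    for (std::size_t i = 0; i < size; ++i) {
        std::size_t index = (first + i * i) % size;
        if (table[index] == key) {
            return -1;  // already stored
        }
        if (table[index] == kEmpty) {
            table[index] = key;
            return static_cast<int>(index);
        }
    }
    return -1;
}
```

In an 11-bucket table, the keys 25, 14, 36 and 47 all have home bucket 3. They land in buckets 3, 4, 7 and 1 respectively, so the probes spread out quickly instead of filling a run of neighbouring buckets.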
