STLport::hash_map实现分析

来源：互联网发布：淘宝免费视频代码编辑：程序博客网时间：2024/05/21 10:21

前言：

最近研究了下STLport::hash_map的底层实现。由于STLport是基于SGI实现的，所以也可以通过STLport大致一窥SGI的内部逻辑。

STLport下载地址如下：

http://nchc.dl.sourceforge.net/project/stlport/STLport/STLport-5.2.1/STLport-5.2.1.tar.bz2

叙个大概，几个层次：

以最简单的插入数据来讲，从顶向下，主要stlp_std::hash_map，stlp_std::hashtable和stlp_std::slist三层：

stlp_std::hash_map包装了底层实现，向上层提供服务，stlp_std::hashtable提供对任何有名项的存取操作和删除操作，stlp_std::slist负责对最底层的数据的保存和链表的操作。首先准备一个简单的stlp_std::hash_map应用：

#include <iostream>#include <hash_map>typedef stlp_std::hash_map<char , std::string> HASHMAP;typedef HASHMAP::iterator HASHMAP_ITER;int main(int argc ,char *argv[]){HASHMAP hsmp;hsmp['a'] = "yoshino";hsmp['b'] = "akaki";hsmp['b'] = "namiwa";hsmp['c'] = "akitsushima";hsmp['d'] = "iwashima";hsmp['e'] = "hashitachi";hsmp['f'] = "matsushima";return 0;}

这Code没什么特别的，就是向hsmp添加键值。hsmp也是简单至极，只有<key , Value>

stlp_std::hash_map：

我们在声明hsmp时没有指定hash函数和比较函数，所以stlp_std::hash_map的构造中会默认指定（注意，这里我的键是普通类型char，如果是复合类型如结构或是类的话，就必须自定义指定hash函数和比较函数）：

hash_map() : _M_ht(0, hasher(), key_equal(), allocator_type()) {}

这里指定的hash函数hasher()和键比较函数key_equal()，都是仿函数，分别定义在：

//_hash_fun.h_STLP_TEMPLATE_NULL struct hash<char> {size_t operator()(char __x) const { return __x; }//返回char的ASCII};

//_function_base.hstruct equal_to : public binary_function<_Tp, _Tp, bool> {bool operator()(const _Tp& __x, const _Tp& __y) const { return __x == __y; }};

在_hash_fun.h中，定义了很多普通类型的hash函数，这里的struct hash<char>就是其中之一，它仅仅是将char值以ASCII值返回。如果你要使用复合类型，就要自己定义hash函数拉，这也是hash_map麻烦的地方。

程序执行添加键值是调用了stlp_std::hash_map的[]重载：

_STLP_TEMPLATE_FOR_CONT_EXT  _Tp& operator[](const _KT& __key) {    iterator __it = _M_ht.find(__key);  //搜索键    return (__it == _M_ht.end() ?  //如果没有搜索到键就执行插入，如果键已存在则返回值      _M_ht._M_insert(value_type(__key, _STLP_DEFAULT_CONSTRUCTED(_Tp))).second : //hashtable _M_ht;      (*__it).second );  }

stlp_std::hash_map中定义了stlp_std::hashtable的_M_ht，由它实际处理。在stlp_std::hash_map的[]重载中，首先在_M_ht中搜索键是否存在，如果不存在，则由_M_ht新建插入，如果存在，则返回值。我们新插入键值对<'a',"yoshino">。所以执行hashtable<...>::_M_insert(const value_type& __obj)。程序由此进入stlp_std::hashtable层。

stlp_std::hashtable：

hash table表格内的元素称之为桶（bucket），侯捷先森《STL源码剖析》中介绍了开链法实现的hashtable，如图

而在STLport中，hashtable的实现如下：

其中_BucketVector中的每个元素保存有链表_Slist_node相应的节点的指针。在stlp_std::hashtable定义了bucket：

//typedef slist<value_type, _All> _ElemsCont;_ElemsCont            _M_elems; //bucket list//typedef _STLP_PRIV _Slist_node_base _BucketType;//typedef vector<_BucketType*, _BucketAllocType> _BucketVector;  _BucketVector         _M_buckets;  //存储桶size_type             _M_num_elements; //标示hashtable元素个数，size()函数返回的即是此

其中_M_buckets就是存储桶，他是个vector数组，每个元素又是_Slist_node_base链表(图中的bucket list)。其中_Slist_node，_Slist_node_base等数据结构将在后面的stlp_std::slist介绍。之前在stlp_std::hash_map程序走到了_M_insert(const value_type& __obj)，其实现如下：

template <class _Val, class _Key, class _HF,          class _Traits, class _ExK, class _EqK, class _All>__reference__hashtable<_Val,_Key,_HF,_Traits,_ExK,_EqK,_All>  ::_M_insert(const value_type& __obj) {  _M_enlarge(_M_num_elements + 1);  //检查最大负载因子时的桶容量，如果这个容量大于当前桶容量，则需要扩展桶容量  return *insert_unique_noresize(__obj).first; //在不需要重建表格的情况下插入新节点。键值不允许重复}

这里涉及到hashmap中的一个概念——负载因子。负载因子=当前数据量/桶的总容量，实际就是占用空间的百分比。这里的_M_enlarge负责检查是否需要扩展桶容量，其实现如下：

template <class _Val, class _Key, class _HF,          class _Traits, class _ExK, class _EqK, class _All>void hashtable<_Val,_Key,_HF,_Traits,_ExK,_EqK,_All>::_M_enlarge(size_type __to_size){ // __to_size = _M_num_elements + 1  size_type __num_buckets = bucket_count(); //桶个数  size_type __num_buckets_hint = (size_type)((float)__to_size / max_load_factor()); //散列表中节点的数目除以基本区域能容纳的节点数  //max_load_factor()获取最大负载因子 , __num_buckets_hint代表获取最大负载因子时的HashMap的总容量，此时__num_buckets_hint处在最小状态  if (__num_buckets_hint <= __num_buckets) {    return;   //如果最大负载因子的HashMap的总容量小于当前容量，则什么也不做  }  __num_buckets = _STLP_PRIV _Stl_prime_type::_S_next_size(__num_buckets_hint);//返回桶大小下至要求尺寸  _M_rehash(__num_buckets);  //无条件地改变桶容量为__num_buckets

随后，程序执行hashtable<...>::insert_unique_noresize(const value_type& __obj):

template <class _Val, class _Key, class _HF,          class _Traits, class _ExK, class _EqK, class _All>pair<__iterator__, bool>hashtable<_Val,_Key,_HF,_Traits,_ExK,_EqK,_All>  ::insert_unique_noresize(const value_type& __obj) {  const size_type __n = _M_bkt_num(__obj); //获取捅号  _ElemsIte __cur(_M_buckets[__n]);  _ElemsIte __last(_M_buckets[__n + 1]);  if (__cur != __last) {    for (; __cur != __last; ++__cur) {      if (_M_equals(_M_get_key(*__cur), _M_get_key(__obj))) {        //We check that equivalent keys have equals hash code as otherwise, on resize,        //equivalent value might not be in the same bucket        _STLP_ASSERT(_M_hash(_M_get_key(*__cur)) == _M_hash(_M_get_key(__obj)))        return pair<iterator, bool>(iterator(__cur), false);      }    }    /* Here we do not rely on the _M_insert_noresize method as we know     * that we cannot break element orders, elements are unique, and     * insertion after the first bucket element is faster than what is     * done in _M_insert_noresize.     */    __cur = _M_elems.insert_after(_ElemsIte(_M_buckets[__n]), __obj);    ++_M_num_elements;//累加vector线性表元素个数    return pair<iterator, bool>(iterator(__cur), true);  }  return pair<iterator, bool>(_M_insert_noresize(__n, __obj), true);}

在insert_unique_noresize里，我们首先要计算出一个桶号，作为桶中存储的下标。方法是执行_M_bkt_num , 参数是键值对__obj,_M_bkt_num最终调用到_M_bkt_num_key。执行_M_hash(__key) % __n，求出桶号，这里的_M_hash就是之前在stlp_std::hash_map中默认指定的struct hash<char>。所以，这里的算出的桶号就是我们制定的char键的ASC码和桶容量做求模运算。

size_type _M_bkt_num_key(const _KT& __key) const{ return _M_bkt_num_key(__key, bucket_count()); }size_type _M_bkt_num(const value_type& __obj) const{ return _M_bkt_num_key(_M_get_key(__obj)); }_STLP_TEMPLATE_FOR_CONT_EXTsize_type _M_bkt_num_key(const _KT& __key, size_type __n) const{     return return _M_hash(__key) % __n;   //_M_hash(__key) % bucket_count()求出桶号}

获取桶号后，程序检查是否有相同hash码的等效key，防止它们处在同一个桶中。随后执行hashtable<...>::_M_insert_noresize(size_type __n, const value_type& __obj)：

template <class _Val, class _Key, class _HF,          class _Traits, class _ExK, class _EqK, class _All>__iterator__hashtable<_Val,_Key,_HF,_Traits,_ExK,_EqK,_All>  ::_M_insert_noresize(size_type __n, const value_type& __obj) {   //__n代表算出的捅号 ， __obj是键值对  //We always insert this element as 1st in the bucket to not break  //the elements order as equal elements must be kept next to each other.  size_type __prev = __n; //桶号  _ElemsIte __pos = _M_before_begin(__prev)._M_ite;                  //_pos根据桶号__prev获取插入点iter  //填充数据节点到_M_buckets.begin() + __prev  fill(_M_buckets.begin() + __prev, _M_buckets.begin() + __n + 1,      _M_elems.insert_after(__pos, __obj)._M_node);   //_M_elems是slist，在__pos的下一个位置插入新节点，值为__obj,这里的最终是新节点，数据最终存储到slist::_M_head _Slist_node_base _M_node  ++_M_num_elements; //累加捅数目  return iterator(_ElemsIte(_M_buckets[__n]));                              //返回新捅号数据的迭代}

这里传给_M_insert_noresize两个参数，__n代表算出的捅号， __obj是键值对。然后由__pos根据桶号获取下一个插入位置，_M_elems.insert_after将键值对__obj插入到位置__pos之后。fill填充将链表头填充到_M_buckets.begin() + __prev，也就是桶号位置。我们在程序中首先插入['a',"..."]，获取桶号6，则在_BucketVector::_M_start接上['a',"..."]（插入到链表_Slist_node，具体介绍后面介绍），当插入['b',"..."]时，获取桶号0，则将['b',"..."]插入['a',"..."]之前，作为_BucketVector::_M_start中的第一个元素。随后再插入['c',"..."]，获取桶号2，则将['c',"..."]插入['b',"..."]之后。以此类推，如下图：

stlp_std::slist：

hashtable将数据存储存储在_M_elems，它是个slist类型。在这层，主要根据上层的指令，代理完成链表的相关操作，下面介绍几个相关的数据类型：

//单向链表的节点基本结构struct _Slist_node_base {_Slist_node_base* _M_next;};//单向链表的节点结构template <class _Tp>class _Slist_node : public _Slist_node_base {public:        _Tp _M_data;              //键值对最后存储在这里        __TRIVIAL_STUFF(_Slist_node);};template <class _Tp, class _Alloc>class _Slist_base {        //...        typedef _STLP_alloc_proxy<_Slist_node_base, _Node, _M_node_allocator_type> _AllocProxy;        //...        _AllocProxy _M_head;};template <class _Tp, class _Alloc>class slist : protected _STLP_PRIV _Slist_base<_Tp,_Alloc>{        //...};

首先可见_Slist_node_base和继承它的_Slist_node，键值对数据就最终存储在链表结构_Slist_node::_M_data，各个键值对之间通过_Slist_node_base::_M_next关联。

由_M_elems.insert_after(__pos, __obj)进入，它最终执行：

__STATIC_CAST(_Node*, _STLP_PRIV __slist_make_link(__pos, _M_create_node(__x)));

这里我们调用_M_create_node(__x)建立一个新的_Slist_node实例（_Slist_node_base派生），然后执行__slist_make_link将这个新实例插入到__pos之后，_M_create_node实现如下：

_Node* _M_create_node(const value_type& __x) {   //typedef _STLP_PRIV _Slist_node<_Tp> _Node;    _Node* __node = this->_M_head.allocate(1);      //新建一个node，并分配空间    _STLP_TRY {      _Copy_Construct(&__node->_M_data, __x);        //配置新node的数据      __node->_M_next = 0;                                         //单向链表，end()->0;    }    _STLP_UNWIND(this->_M_head.deallocate(__node, 1))    return __node;                                                        //返回新加的node  }

这样，新的数据就被建立在_node中,然后_node被插入到__pos之后，这样链表的操作就完成了。回到stlp_std::hashtable的hashtable<...>::_M_insert_noresize

fill(_M_buckets.begin() + __prev, _M_buckets.begin() + __n + 1,      _M_elems.insert_after(__pos, __obj)._M_node);   //_M_elems是slist，在__pos的下一个位置插入新节点，值为__obj,这里的最终是新节点，数据最终存储到slist::_M_head _Slist_node_base _M_node  ++_M_num_elements; //累加捅数目  return iterator(_ElemsIte(_M_buckets[__n]));

_M_elems.insert_after完成了底层的链表操作，_M_node返回_Slist_node_base，由fill将它插入到_M_buckets.begin() + __prev（__prev桶号位置），然后层层返回指向新加数据的迭代器。这样就完成了stlp_std::hash_map的一次数据添加。

4 0