A Brief Guide to Using hash_table

程如亮

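Background:

When I started working with real-time databases, I found that most lookup algorithms rely on a hash_table. How good a hash_table is depends on the hit rate of its hash function, that is, how evenly it spreads keys across the slots. With a well-designed hash function, lookup time is close to O(1). I spent some time studying the topic, and this article is the result.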

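Analysis:

The defining trait of a hash_table, put simply: it trades space for time.

First, a few common terms: hash function, collision, linked list.

Start with the simplest one, the linked list. Put simply, a linked list stores a collection of data in dynamically allocated nodes, which saves memory and makes it easy to add and remove elements, but access generally requires traversal. Textbooks usually define it like this: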

struct node /* node definition */
{
    Data data;          /* node payload (Data stands for any payload type) */
    struct node * next; /* pointer to the next node */
};

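As an aside: in C++, a struct and a class are essentially equivalent (they differ only in default access), so explanations of hash_table based on C++ usually add a constructor. A common definition: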

struct node /* node definition */
{
    node():_next(NULL){} /* constructor: a new node has no successor */
    string _value;       /* node payload */
    node* _next;         /* pointer to the next node */
};

Personally, I find this a little more elegant. I will not walk through the linked-list operations themselves here.
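
For readers who would like to see them anyway, here is a minimal sketch of the three list operations a chained hash table relies on. The helper names push_front, contains, and destroy are my own assumptions for illustration; the node layout is the one defined above.

```cpp
#include <cstddef>
#include <string>

// Node layout mirrors the C++ definition in the article.
struct node {
    node() : _next(NULL) {}
    std::string _value;
    node* _next;
};

// Hypothetical helper: prepend a value and return the new head of the list.
node* push_front(node* head, const std::string& value) {
    node* n = new node();
    n->_value = value;
    n->_next = head;
    return n;
}

// Hypothetical helper: linear scan; true if the value is somewhere in the list.
bool contains(const node* head, const std::string& value) {
    for (const node* p = head; p != NULL; p = p->_next) {
        if (p->_value == value) return true;
    }
    return false;
}

// Hypothetical helper: release every node in the list.
void destroy(node* head) {
    while (head != NULL) {
        node* next = head->_next;
        delete head;
        head = next;
    }
}
```

Prepending is O(1), while contains has to walk the list, which is why a hash table wants each chain to stay short.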

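Next is the hash function. The term is best known from cryptography, where a hash function turns its input into a fixed-length digest and is designed to be one-way, that is, impractical to invert. A hash table does not need those properties. Here the hash function simply maps an input value to an "index" into the target space, and that index is then used to access the corresponding data directly. This is the basic principle of a hash_table. It does, of course, raise the problem of collisions.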
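
To make the "value in, index out" idea concrete, here is a minimal sketch of a multiplicative string hash. The free function bucket_index is a name I made up for illustration; the constants 31 and 10000 are the ones this article uses later.

```cpp
#include <string>

const unsigned int MULT = 31;      // multiplier, same value the article uses
const unsigned int TABLE = 10000;  // number of slots, same value the article uses

// Hypothetical free function: map a string to a slot index in [0, TABLE).
unsigned int bucket_index(const std::string& word) {
    unsigned int h = 0;
    for (std::string::size_type i = 0; i < word.size(); ++i) {
        // Fold in one character at a time and reduce, so h never leaves the table.
        h = (h * MULT + static_cast<unsigned char>(word[i])) % TABLE;
    }
    return h;
}
```

Under these assumptions the string "ab" lands in slot 3105, because 97 * 31 + 98 = 3105.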

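Collision: two different inputs produce the same "index" from the hash function. Since there are far more possible inputs than slots, a one-to-one hash function cannot be designed in general. There are many ways to resolve collisions. Here we use the common one: a linked list per slot.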
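
A small sketch of a collision and of chaining, under the same assumptions as this article's constants (multiplier 31, 10000 slots). The names bucket_index and chain_keys are hypothetical, and std::vector stands in for the hand-written linked list to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const unsigned int MULT = 31;
const unsigned int TABLE = 10000;

// Hypothetical free function: map a string to a slot index in [0, TABLE).
unsigned int bucket_index(const std::string& word) {
    unsigned int h = 0;
    for (std::string::size_type i = 0; i < word.size(); ++i) {
        h = (h * MULT + static_cast<unsigned char>(word[i])) % TABLE;
    }
    return h;
}

// Hypothetical helper: put every key into the chain of its slot.
std::vector<std::vector<std::string> > chain_keys(const std::vector<std::string>& keys) {
    std::vector<std::vector<std::string> > buckets(TABLE);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        buckets[bucket_index(keys[i])].push_back(keys[i]);  // colliding keys share a chain
    }
    return buckets;
}
```

The keys "Aa" and "BB" collide here: 65 * 31 + 97 and 66 * 31 + 66 are both 2112, so both end up in the chain of slot 2112.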

Program walkthrough:

/***************************** hash_table.h **********************************/
#ifndef HASH_TABLE_H
#define HASH_TABLE_H

#include <cstddef> /* NULL */
#include <string>

using namespace std;

struct node /* node definition */
{
    node():_next(NULL){}
    string _value;
    node* _next;
};

typedef node* hash_node; /* a slot holds a pointer to the head node of its chain */

const int MULT = 31;     /* multiplier used by the hash function */
const int TABLE = 10000; /* size of the slot array */

class hash_table
{
public:
    /* Constructor: table must hold TABLE slots, all NULL, allocated with new[];
       the destructor releases it with delete[] */
    hash_table(hash_node* table);

    /* Destructor */
    ~hash_table();

    /* Insert an element into the hash_table */
    void Insert(const string& word);

    /* Look up an element in the hash_table: 1 if found, -1 otherwise */
    int Search(const string& word);

private:
    /* Hash function */
    unsigned int hash(const string& word);

private:
    hash_node* _table;
};

#endif
/************************************end*************************************/

/*****************************hash_table.cpp***********************************/

#include "hash_table.h"

#include <iostream>

using namespace std;

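/* Constructor: stores the slot array allocated by the caller */
hash_table::hash_table(hash_node* table)
{
    _table = table;
}

/* Destructor: frees every chain first, then the slot array itself */
hash_table::~hash_table()
{
    for (int i = 0; _table != NULL && i < TABLE; i++)
    {
        while (_table[i] != NULL)
        {
            hash_node next = _table[i]->_next;
            delete _table[i];
            _table[i] = next;
        }
    }
    delete[] _table;
}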

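/* Hash function */
unsigned int hash_table::hash(const string& word)
{
    const char* p = word.c_str();
    unsigned int h = 0;
    for (; *p; p++) /* the heart of the hash_table; stops at the terminating '\0' */
    {
        /* reduce after the addition so the result always lies in [0, TABLE) */
        h = (h * MULT + (unsigned char)*p) % TABLE;
    }
    return h;
}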

/* Insert */
void hash_table::Insert(const string& word)
{
    /* compute the hash value (slot index) */
    unsigned int h = hash(word);

    /* slot is empty: the new node becomes its head */
    if (_table[h] == NULL)
    {
        hash_node n = new node();
        n->_value = word;
        n->_next = NULL;
        _table[h] = n;
        return ;
    }

    /* slot is not empty: walk the chain that starts at its head node */
    for (hash_node p = _table[h];p != NULL;p = p->_next)
    {
        /* the value is already stored: nothing to do */
        if (p->_value == word) return ;
    }

    /* collision: resolve it by prepending a new node to the chain */
    hash_node n = new node();
    n->_value = word;
    n->_next = _table[h];
    _table[h] = n;
}

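/* Search */
int hash_table::Search(const string& word)
{
    /* compute the hash value (slot index) */
    unsigned int h = hash(word);

    /* slot is empty: not found */
    if (_table[h] == NULL)
    {
        return -1;
    }

    /* walk the chain in this slot and compare each value */
    for (hash_node p = _table[h];p != NULL;p = p->_next)
    {
        if (p->_value == word)
        {
            return 1;
        }
    }

    /* reached the end of the chain without a match */
    return -1;
}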

/************************************end***********************************/
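
The listing does not show how the class is meant to be driven, so here is a condensed single-file sketch of it together with a small driver. It rests on assumptions of mine: the caller allocates the slot array with new hash_node[TABLE]() and hands ownership to the table, the hash loop stops at the end of the string, and the index is kept inside [0, TABLE). The function demo is hypothetical.

```cpp
#include <cstddef>
#include <string>

struct node {
    node() : _next(NULL) {}
    std::string _value;
    node* _next;
};
typedef node* hash_node;

const unsigned int MULT = 31;
const unsigned int TABLE = 10000;

class hash_table {
public:
    // Assumption: takes ownership of TABLE slots allocated with new[].
    explicit hash_table(hash_node* table) : _table(table) {}
    ~hash_table() {
        for (unsigned int i = 0; i < TABLE; ++i) {
            while (_table[i] != NULL) {          // free the chain in slot i
                hash_node next = _table[i]->_next;
                delete _table[i];
                _table[i] = next;
            }
        }
        delete[] _table;
    }
    void Insert(const std::string& word) {
        if (Search(word) == 1) return;           // already stored
        unsigned int h = hash(word);
        hash_node n = new node();
        n->_value = word;
        n->_next = _table[h];                    // prepend to the chain
        _table[h] = n;
    }
    int Search(const std::string& word) {
        for (hash_node p = _table[hash(word)]; p != NULL; p = p->_next) {
            if (p->_value == word) return 1;
        }
        return -1;
    }
private:
    hash_table(const hash_table&);               // copying would double-free
    hash_table& operator=(const hash_table&);
    unsigned int hash(const std::string& word) {
        unsigned int h = 0;
        for (std::string::size_type i = 0; i < word.size(); ++i) {
            h = (h * MULT + static_cast<unsigned char>(word[i])) % TABLE;
        }
        return h;
    }
    hash_node* _table;
};

// Hypothetical driver: true if insert and lookup behave as described.
bool demo() {
    hash_table t(new hash_node[TABLE]());        // () sets every slot to NULL
    t.Insert("Aa");
    t.Insert("BB");                              // same slot as "Aa", chained
    t.Insert("Aa");                              // duplicate, ignored
    return t.Search("Aa") == 1 && t.Search("BB") == 1 && t.Search("cc") == -1;
}
```

The trailing () in the allocation matters: without it the slots are uninitialized and the NULL checks in Insert and Search are meaningless.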

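I have not measured the hit rate of this hash function, so I have no data-based sense of how well it performs. Others have already walked this road, and I saw no need to repeat their work.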
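
For anyone who does want numbers, one rough way to judge a hash function is to hash a sample of keys and look at how many slots are used and how long the longest chain is. The sketch below assumes the article's constants; bucket_index, bucket_stats, and measure are hypothetical names, and the result only means something for a key sample that resembles the real data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const unsigned int MULT = 31;
const unsigned int TABLE = 10000;

// Hypothetical free function: map a string to a slot index in [0, TABLE).
unsigned int bucket_index(const std::string& word) {
    unsigned int h = 0;
    for (std::string::size_type i = 0; i < word.size(); ++i) {
        h = (h * MULT + static_cast<unsigned char>(word[i])) % TABLE;
    }
    return h;
}

// Hypothetical summary of how a set of keys spreads over the slots.
struct bucket_stats {
    std::size_t used;     // number of slots holding at least one key
    std::size_t longest;  // length of the longest chain
};

bucket_stats measure(const std::vector<std::string>& keys) {
    std::vector<std::size_t> counts(TABLE, 0);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        ++counts[bucket_index(keys[i])];
    }
    bucket_stats s = {0, 0};
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (counts[i] > 0) ++s.used;
        if (counts[i] > s.longest) s.longest = counts[i];
    }
    return s;
}
```

If the number of used slots is close to the number of keys and the longest chain stays small, the function is spreading that sample well.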

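Having discussed hash_table, I should also mention hash_map. Simply put, hash_map is a wrapper around hash_table.

However, hash_map is a vendor extension (found in the SGI STL, among others), not part of standard C++.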
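
Since C++11 the standard library offers the same idea as std::unordered_map. The sketch below assumes a C++11 compiler and uses it in place of hash_map; count_words and has_word are hypothetical helpers for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: count how often each word occurs.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const std::string& w : words) {
        ++counts[w];  // operator[] inserts 0 for a new key, then it is incremented
    }
    return counts;
}

// Hypothetical helper: membership test through find(), O(1) on average.
bool has_word(const std::unordered_map<std::string, int>& counts, const std::string& w) {
    return counts.find(w) != counts.end();
}
```

The container does its own hashing and collision handling, so none of the bookkeeping from the hand-written class is needed.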

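A quick note on how hash_map differs from map:

Overall, hash_map lookups are faster than map lookups, and the lookup time is largely independent of the amount of data, that is, constant on average, whereas a map lookup is O(log n). The reason is that hash_map is built on a hash table and map on a red-black tree. Constant time is not always smaller than log(n) in practice, though, because computing the hash has a cost of its own. If efficiency matters, especially once the number of elements reaches a certain order of magnitude, consider hash_map. But if memory is tight and the program must use as little of it as possible, be careful: hash_map can put you in an awkward position, particularly when you have many hash_map objects and lose control over their total footprint. A hash_map is also slower to construct.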
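
The two containers share almost the same interface, so the trade-off can be explored side by side. This sketch assumes C++11 and uses std::unordered_map as a stand-in for hash_map; sorted_keys and build_index are hypothetical helpers. It shows two practical differences: map iterates its keys in sorted order, and a hash container can reserve its buckets up front to reduce rehashing while it is being filled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: std::map keeps keys ordered, so iteration yields them sorted.
std::vector<std::string> sorted_keys(const std::vector<std::string>& keys) {
    std::map<std::string, int> m;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        m[keys[i]] = static_cast<int>(i);
    }
    std::vector<std::string> out;
    for (const auto& entry : m) {
        out.push_back(entry.first);
    }
    return out;
}

// Hypothetical helper: map each key to its position, with buckets reserved up front.
std::unordered_map<std::string, int> build_index(const std::vector<std::string>& keys) {
    std::unordered_map<std::string, int> index;
    index.reserve(keys.size());  // sizes the bucket array once instead of growing it
    for (std::size_t i = 0; i < keys.size(); ++i) {
        index[keys[i]] = static_cast<int>(i);
    }
    return index;
}
```

The reserved bucket array is also where much of the extra memory goes, which is the space-for-time trade described at the start of this article.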

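Do you know how to choose now? Weigh three factors: lookup speed, data volume, and memory usage.

Note: this material was gathered from the internet and is shared back with the internet.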

Cheng_rliang

2007-4-17