redis学习系列（五）--redis基础字典的构造

来源：互联网发布：java软件公司编辑：程序博客网时间：2024/06/06 07:41

最近真是有点疯啊，更新的有点勤快，主要还是最近项目处于一个真空期，迭代版本已经准备好，等着上线，下个版本需求已经完成大半，并且没什么复杂的，因此读书时间比较多，项目中其实用过redis，并且redis是集群的，用来保存分布式session和用作缓存使用的，配置文件我不知道，但是等这边的一些知识看完回去研究一下公司内部的redis代码，都是大神写的，看看还是不错的。

这一篇来看看字典的内部构造，其实看这些东西，感觉很枯燥，但是又必须要过一遍，不然你不知道用起来的时候它里面什么样子，只知道使用可不是个好现象。

字典的底层实现就是：字典-->哈希表-->哈希节点-->键值对

哈希表

typedef struct dictht {    // 哈希表数组    dictEntry **table;    // 哈希表大小    unsigned long size;    // 哈希表大小掩码，用于计算索引值    // 总是等于 size - 1    unsigned long sizemask;    // 该哈希表已有节点的数量    unsigned long used;} dictht;

table是一个数组，数组中每一个元素都是执行一个entry的指针，每一个entry都保存着一个键值对。

size保存了哈希表的大小，就是table数组的大小

used记录了哈希表目前已有节点的数量

sizemask属性的值总是等于size-1，这个属性和哈希值一起决定一个键应该被放到table数组的哪个索引上面

一个空的哈希表

$digraph { label = "\n 图 4-1 一个空的哈希表"; rankdir = LR; // node [shape = record]; dictht [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 0"]; table [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; // node [shape = plaintext, label = "NULL"]; null0; null1; null2; null3; // dictht:table -> table:head; table:0 -> null0; table:1 -> null1; table:2 -> null2; table:3 -> null3;}$

哈希表节点

哈表表内的节点用entry表示，每个entry保存着一个键值对

typedef struct dictEntry {    // 键    void *key;    // 值    union {        void *val;        uint64_t u64;        int64_t s64;    } v;    // 指向下个哈希表节点，形成链表    struct dictEntry *next;} dictEntry;

key保存着键值对中的键，而v属性保存着值，其内部保存的数据暂时不去追究，可以是指针，可以是uint64_t整数，可以是int64_t整数

next属性执行另一个哈希表节点的指针，注意这个指针可以将多个entry链接起来形成链表，其实就hashmap一样，是解决hash冲突的。

$digraph { label = "\n 图 4-2 连接在一起的键 k1 和键 k0"; rankdir = LR; // node [shape = record]; dictht [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 2"]; table [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; dictEntry0 [label = " <head> dictEntry | { k0 | v0 }"]; dictEntry1 [label = " <head> dictEntry | { k1 | v1 }"]; // node [shape = plaintext, label = "NULL"]; null0; null1; null2; null3; // dictht:table -> table:head; table:0 -> null0; table:1 -> null1; table:2 -> dictEntry1; dictEntry1 -> dictEntry0 -> null2 [label = "next"]; table:3 -> null3;}$

字典

回到最外层的构造，

typedef struct dict {    // 类型特定函数    dictType *type;    // 私有数据    void *privdata;    // 哈希表    dictht ht[2];    // rehash 索引    // 当 rehash 不在进行时，值为 -1    int rehashidx; /* rehashing not in progress if rehashidx == -1 */} dict;

type和privdata是针对不同类型的键值对而设置的，了解一下：

type是执行dictType结构的指针，每个type指针保存了一簇用于操作特定类型键值对的函数

privdata则是保存了需要传给那些特定函数的可选参数。

typedef struct dictType {    // 计算哈希值的函数    unsigned int (*hashFunction)(const void *key);    // 复制键的函数    void *(*keyDup)(void *privdata, const void *key);    // 复制值的函数    void *(*valDup)(void *privdata, const void *obj);    // 对比键的函数    int (*keyCompare)(void *privdata, const void *key1, const void *key2);    // 销毁键的函数    void (*keyDestructor)(void *privdata, void *key);    // 销毁值的函数    void (*valDestructor)(void *privdata, void *obj);} dictType;

ht属性则是我们上面说过的哈希表存放的地方，内部有两个数组组成，一般情况下字典只使用ht[0]哈希表，而ht[1]则是对ht[0]进行rehash时使用。

而要进行rehash，另一个值就是rehashidx。

图为普通状态下的字典：

$digraph { label = "\n 图 4-3 普通状态下的字典"; rankdir = LR; // node [shape = record]; dict [label = " <head> dict | type | privdata | <ht> ht | rehashidx \n -1 "]; dictht0 [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 2"]; dictht1 [label = " <head> dictht | <table> table | <size> size \n 0 | <sizemask> sizemask \n 0 | <used> used \n 0"]; table0 [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; table1 [label = "NULL", shape = plaintext]; dictEntry0 [label = " <head> dictEntry | { k0 | v0 }"]; dictEntry1 [label = " <head> dictEntry | { k1 | v1 }"]; // node [shape = plaintext, label = "NULL"]; null0; null1; null2; null3; // dict:ht -> dictht0:head [label = "ht[0]"]; dict:ht -> dictht1:head [label = "ht[1]"]; dictht0:table -> table0:head; dictht1:table -> table1; table0:0 -> null0; table0:1 -> dictEntry0:head -> null1; table0:2 -> null2; table0:3 -> dictEntry1:head -> null3;}$

hash算法

与hashmap类似，redis里面使用的hash算法也是需要先计算hash值，然后再计算索引值，最后再将元素放到指定位置上去。

# 使用字典设置的哈希函数，计算键 key 的哈希值hash = dict->type->hashFunction(key);# 使用哈希表的 sizemask 属性和哈希值，计算出索引值# 根据情况不同， ht[x] 可以是 ht[0] 或者 ht[1]index = hash & dict->ht[x].sizemask;

数据被存放进字典

$digraph { label = "\n 图 4-5 添加键值对 k0 和 v0 之后的字典"; rankdir = LR; // node [shape = record]; dict [label = " <head> dict | type | privdata | <ht> ht | rehashidx \n -1 "]; dictht0 [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 1"]; dictht1 [label = "...", shape = plaintext]; table0 [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; //table1 [label = "NULL", shape = plaintext]; dictEntry [label = " <head> dictEntry | { k0 | v0 } "]; // node [shape = plaintext, label = "NULL"]; null0; null1; null2; null3; // dict:ht -> dictht0:head [label = "ht[0]"]; dict:ht -> dictht1:head [label = "ht[1]"]; dictht0:table -> table0:head; //dictht1:table -> table1; table0:0 -> dictEntry:head -> null0; table0:1 -> null1; table0:2 -> null2; table0:3 -> null3;}$

hash冲突

使用链表法解决冲突。同理，最新插入的也是在最前面。

$digraph { label = "\n 图 4-6 一个包含两个键值对的哈希表"; rankdir = LR; // node [shape = record]; dictht0 [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 2"]; table0 [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; //table1 [label = "NULL", shape = plaintext]; kv0 [label = " <head> dictEntry | { k0 | v0 } "]; kv1 [label = " <head> dictEntry | { k1 | v1 } "]; // node [shape = plaintext, label = "NULL"]; null0; null1; null2; null3; // dictht0:table -> table0:head; table0:0 -> kv0:head; kv0:head -> null0 [label = "next"]; table0:1 -> null1; table0:2 -> kv1:head; kv1:head -> null2 [label = "next"]; table0:3 -> null3;}$

$digraph { label = "\n 图 4-7 使用链表解决 k2 和 k1 的冲突"; rankdir = LR; // node [shape = record]; dictht0 [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 3"]; table0 [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; //table1 [label = "NULL", shape = plaintext]; kv0 [label = " <head> dictEntry | { k0 | v0 } "]; kv1 [label = " <head> dictEntry | { k1 | v1 } "]; kv2 [label = " <head> dictEntry | { k2 | v2 } "]; // node [shape = plaintext, label = "NULL"]; null0; null1; null2; null3; // dictht0:table -> table0:head; table0:0 -> kv0:head; kv0:head -> null0 [label = "next"]; table0:1 -> null1; table0:2 -> kv2:head; kv2:head -> kv1:head [label = "next"]; kv1:head -> null2 [label = "next"]; table0:3 -> null3;}$

rehash

redis会对哈希表进行扩展或者收缩，以维持负载因子在一个合理的范围之内

执行步骤分成3个阶段：

1.为字典ht[1]哈希表分配空间，这个空间大小取决于执行的操作，以及当前ht[0]的所保存的键值对数量，也就是ht[0].used的值

1.1 如果是扩展操作，那么ht[1]的大小为第一个大于等于ht[0].used * 2的 2^n的值

1.2 如果执行收缩操作，那么ht[1]的大小为第一个大于等于ht[0].used的2^n的值

2.将保存在ht[0]中的所有键值对rehash到ht[1]上面

3.当ht[0]包含的所有键值对都迁移到ht[1]上面后，释放ht[0]，将ht[1]设置成ht[0],并在ht[1]处新建空白哈希表

原始字典值：

$digraph { label = "\n 图 4-8 执行 rehash 之前的字典"; rankdir = LR; node [shape = record]; // 字典 dict [label = " <head> dict | type | privdata | <ht> ht | rehashidx \n -1 "]; // 哈希表 dictht0 [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 4"]; dictht1 [label = " <head> dictht | <table> table | <size> size \n 0 | <sizemask> sizemask \n 0 | <used> used \n 0"]; table0 [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; table1 [label = "NULL", shape = plaintext]; // 哈希表节点 kv0 [label = " <head> dictEntry | { k0 | v0 } "]; kv1 [label = " <head> dictEntry | { k1 | v1 } "]; kv2 [label = " <head> dictEntry | { k2 | v2 } "]; kv3 [label = " <head> dictEntry | { k3 | v3 } "]; // node [shape = plaintext, label = "NULL"]; null0; null1; null2; null3; // dict:ht -> dictht0:head [label = "ht[0]"]; dict:ht -> dictht1:head [label = "ht[1]"]; dictht0:table -> table0:head; dictht1:table -> table1; table0:0 -> kv2:head -> null0; table0:1 -> kv0:head -> null1; table0:2 -> kv3:head -> null2; table0:3 -> kv1:head -> null3;}$

分配空间

$digraph { label = "\n 图 4-9 为字典的 ht[1] 哈希表分配空间"; rankdir = LR; node [shape = record]; // 字典 dict [label = " <head> dict | type | privdata | <ht> ht | rehashidx \n -1 "]; // 哈希表 dictht0 [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 4"]; dictht1 [label = " <head> dictht | <table> table | <size> size \n 8 | <sizemask> sizemask \n 7 | <used> used \n 0"]; table0 [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; table1 [label = " <head> dictEntry*[8] | <0> 0 | <1> 1 | <2> 2 | ... | <7> 7 "]; // 哈希表节点 kv0 [label = " <head> dictEntry | { k0 | v0 } "]; kv1 [label = " <head> dictEntry | { k1 | v1 } "]; kv2 [label = " <head> dictEntry | { k2 | v2 } "]; kv3 [label = " <head> dictEntry | { k3 | v3 } "]; // node [shape = plaintext, label = "NULL"]; // dict:ht -> dictht0:head [label = "ht[0]"]; dict:ht -> dictht1:head [label = "ht[1]"]; dictht0:table -> table0:head; dictht1:table -> table1:head; table0:0 -> kv2:head -> null0; table0:1 -> kv0:head -> null1; table0:2 -> kv3:head -> null2; table0:3 -> kv1:head -> null3; table1:0 -> null10; table1:1 -> null11; table1:2 -> null12; table1:7 -> null17;}$

移动到ht[1]处

$digraph { label = "\n 图 4-10 ht[0] 的所有键值对都已经被迁移到 ht[1]"; rankdir = LR; node [shape = record]; // 字典 dict [label = " <head> dict | type | privdata | <ht> ht | rehashidx \n -1 "]; // 哈希表 dictht0 [label = " <head> dictht | <table> table | <size> size \n 4 | <sizemask> sizemask \n 3 | <used> used \n 0"]; dictht1 [label = " <head> dictht | <table> table | <size> size \n 8 | <sizemask> sizemask \n 7 | <used> used \n 4"]; table0 [label = " <head> dictEntry*[4] | <0> 0 | <1> 1 | <2> 2 | <3> 3 "]; table1 [label = " <head> dictEntry*[8] | ... | <1> 1 | ... | <4> 4 | <5> 5 | ... | <7> 7 "]; // 哈希表节点 kv0 [label = " <head> dictEntry | { k0 | v0 } "]; kv1 [label = " <head> dictEntry | { k1 | v1 } "]; kv2 [label = " <head> dictEntry | { k2 | v2 } "]; kv3 [label = " <head> dictEntry | { k3 | v3 } "]; // node [shape = plaintext, label = "NULL"]; // dict:ht -> dictht0:head [label = "ht[0]"]; dict:ht -> dictht1:head [label = "ht[1]"]; dictht0:table -> table0:head; dictht1:table -> table1:head; table0:0 -> null0; table0:1 -> null1; table0:2 -> null2; table0:3 -> null3; table1:1 -> kv3:head -> null11; table1:4 -> kv2:head -> null14; table1:5 -> kv0:head -> null15; table1:7 -> kv1:head -> null17;}$

清除ht[0],并更名及重新生成空白ht[1]。

$digraph { label = "\n 图 4-11 完成 rehash 之后的字典"; rankdir = LR; node [shape = record]; // 字典 dict [label = " <head> dict | type | privdata | <ht> ht | rehashidx \n -1 "]; // 哈希表 dictht0 [label = " <head> dictht | <table> table | <size> size \n 8 | <sizemask> sizemask \n 7 | <used> used \n 4"]; dictht1 [label = " <head> dictht | <table> table | <size> size \n 0 | <sizemask> sizemask \n 0 | <used> used \n 0"]; table0 [label = " <head> dictEntry*[8] | ... | <1> 1 | ... | <4> 4 | <5> 5 | ... | <7> 7 "]; table1 [label = "NULL", shape = plaintext]; // 哈希表节点 kv0 [label = " <head> dictEntry | { k0 | v0 } "]; kv1 [label = " <head> dictEntry | { k1 | v1 } "]; kv2 [label = " <head> dictEntry | { k2 | v2 } "]; kv3 [label = " <head> dictEntry | { k3 | v3 } "]; // node [shape = plaintext, label = "NULL"]; // dict:ht -> dictht0:head [label = "ht[0]"]; dict:ht -> dictht1:head [label = "ht[1]"]; dictht0:table -> table0:head; dictht1:table -> table1; table0:1 -> kv3:head -> null11; table0:4 -> kv2:head -> null14; table0:5 -> kv0:head -> null15; table0:7 -> kv1:head -> null17;}$

hash表的扩展与收缩

满足以下条件之一，程序会自动开始对哈希表进行扩展操作

1.服务器没有执行BGSAVE或者BGREWRITEAOF命令，并且hash表的负载因子大于等于1。

2服务器正在执行BGSAVE或者BGREWRITEAOF命令，并且hash表的负载因子大于等于5.

负载因子 = ht[0].used / ht[0].size

说明：因为在BGSAVE或者BGREWRITEAOF期间，redis需要创建当前服务器进程的子进程，因此在子进程期间会扩大执行进展所需要的负载因子，尽可能避免在子进程存在期间进行哈希表扩展操作，节约内存。

当负载因子小于0.1时，会自动进行收缩操作。

阅读全文

0 0