【Redis Source Code】An Analysis of the rehash Implementation
If you are not yet familiar with rehash, first see the companion article: 图解rehash (rehash, illustrated).
1. The two kinds of rehash
To balance performance, Redis performs rehashing in two complementary modes, lazy and active, which run side by side until the rehash completes:
- lazy rehashing: every operation on the dict performs one step of rehashing (one bucket).
- active rehashing: out of every 100 ms cycle, spend roughly 1 ms on rehashing (driven by the serverCron function).
2. Lazy rehashing
Call chain: _dictRehashStep -> dictRehash
(1) Source of dict.c/_dictRehashStep:
```c
/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used.
 *
 * T = O(1) */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}
```
(2) Source of dict.c/dictRehash:
```c
/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table.
 *
 * T = O(N) */
int dictRehash(dict *d, int n) {
    /* Only run while a rehash is actually in progress. */
    if (!dictIsRehashing(d)) return 0;

    /* Perform N steps of migration. T = O(N) */
    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {
            /* Free table 0 and install the old table 1 as the new table 0. */
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            /* Reset the old table 1. */
            _dictReset(&d->ht[1]);
            /* Clear the rehash flag and report completion to the caller. */
            d->rehashidx = -1;
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned)d->rehashidx);

        /* Skip empty slots until the next non-empty bucket. */
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        /* Point at the head of that bucket's chain. */
        de = d->ht[0].table[d->rehashidx];

        /* Move all the keys in this bucket from the old to the new hash HT.
         * T = O(1) */
        while(de) {
            unsigned int h;

            /* Save the next node before relinking. */
            nextde = de->next;
            /* Get the index in the new hash table. */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            /* Head-insert the node into the new table. */
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            /* Update the entry counters. */
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }

        /* Mark the migrated bucket as empty and advance the rehash index. */
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}
```
(3) Source analysis
_dictRehashStep calls dictRehash with n = 1, so each call migrates just one bucket from ht[0] to ht[1]. Because _dictRehashStep is invoked by dictGetRandomKey, dictFind, dictGenericDelete and dictAdd, every lookup, insertion or deletion on the dict pushes the rehash forward, which speeds up the overall migration.
dictRehash migrates n buckets per call. Since ht[1] was already sized during the automatic resize, the core of rehashing is walking ht[0], taking each key, re-indexing it against ht[1]'s sizemask, and, once everything has moved, installing ht[1] as the new ht[0] and resetting ht[1]. Throughout this process rehashidx is crucial: it records the index of the next ht[0] bucket to migrate, so work can resume exactly where it stopped.
3. Active rehashing
The call chain is as follows:
serverCron -> databasesCron -> incrementallyRehash -> dictRehashMilliseconds -> dictRehash
(1) Source of redis.c/serverCron:
```c
/* This is our timer interrupt, called server.hz times per second.
 *
 * Here is where we do a number of things that need to be done asynchronously.
 * For instance:
 *
 * - Active expired keys collection (it is also performed in a lazy way on
 *   lookup).
 * - Software watchdog.
 * - Update some statistic.
 * - Incremental rehashing of the DBs hash tables.
 * - Triggering BGSAVE / AOF rewrite, and handling of terminated children.
 * - Clients timeout of different kinds.
 * - Replication reconnection.
 * - Many more...
 *
 * Everything directly called here will be called server.hz times per second,
 * so in order to throttle execution of things we want to do less frequently
 * a macro is used: run_with_period(milliseconds) { .... } */
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    int j;
    REDIS_NOTUSED(eventLoop);
    REDIS_NOTUSED(id);
    REDIS_NOTUSED(clientData);

    /* Software watchdog: deliver the SIGALRM that will reach the signal
     * handler if we don't return here fast enough. */
    if (server.watchdog_period) watchdogScheduleSignal(server.watchdog_period);

    /* Update the time cache. */
    updateCachedTime();

    /* Track how many commands the server executes. */
    run_with_period(100) trackOperationsPerSecond();

    /* We have just REDIS_LRU_BITS bits per object for LRU information.
     * So we use an (eventually wrapping) LRU clock.
     *
     * Note that even if the counter wraps it's not a big problem,
     * everything will still work but some object will appear younger
     * to Redis. However for this to happen a given object should never be
     * touched for all the time needed to the counter to wrap, which is
     * not likely.
     *
     * Note that you can change the resolution altering the
     * REDIS_LRU_CLOCK_RESOLUTION define. */
    server.lruclock = getLRUClock();

    /* Record the max memory used since the server was started. */
    if (zmalloc_used_memory() > server.stat_peak_memory)
        server.stat_peak_memory = zmalloc_used_memory();

    /* Sample the RSS here since this is a relatively slow call. */
    server.resident_set_size = zmalloc_get_rss();

    /* We received a SIGTERM, shutting down here in a safe way, as it is
     * not ok doing so inside the signal handler. */
    if (server.shutdown_asap) {
        /* Try to shut down the server; on failure log and clear the flag. */
        if (prepareForShutdown(0) == REDIS_OK) exit(0);
        redisLog(REDIS_WARNING,"SIGTERM received but errors trying to shut down the server, check the logs for more information");
        server.shutdown_asap = 0;
    }

    /* Show some info about non-empty databases */
    run_with_period(5000) {
        for (j = 0; j < server.dbnum; j++) {
            long long size, used, vkeys;

            size = dictSlots(server.db[j].dict);      /* available slots */
            used = dictSize(server.db[j].dict);       /* keys in use */
            vkeys = dictSize(server.db[j].expires);   /* keys with a TTL */
            if (used || vkeys) {
                redisLog(REDIS_VERBOSE,"DB %d: %lld keys (%lld volatile) in %lld slots HT.",j,used,vkeys,size);
                /* dictPrintStats(server.dict); */
            }
        }
    }

    /* Show information about connected clients (unless running as Sentinel). */
    if (!server.sentinel_mode) {
        run_with_period(5000) {
            redisLog(REDIS_VERBOSE,
                "%lu clients connected (%lu slaves), %zu bytes in use",
                listLength(server.clients)-listLength(server.slaves),
                listLength(server.slaves),
                zmalloc_used_memory());
        }
    }

    /* We need to do a few operations on clients asynchronously:
     * close timed-out clients and reclaim oversized client buffers. */
    clientsCron();

    /* Handle background operations on Redis databases. */
    databasesCron();

    /* Start a scheduled AOF rewrite if this was requested by the user while
     * a BGSAVE was in progress. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1 &&
        server.aof_rewrite_scheduled)
    {
        rewriteAppendOnlyFileBackground();
    }

    /* Check if a background saving or AOF rewrite in progress terminated. */
    if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
        int statloc;
        pid_t pid;

        /* Reap child-process status without blocking. */
        if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
            int exitcode = WEXITSTATUS(statloc);
            int bysignal = 0;

            if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);

            if (pid == server.rdb_child_pid) {
                /* BGSAVE finished. */
                backgroundSaveDoneHandler(exitcode,bysignal);
            } else if (pid == server.aof_child_pid) {
                /* BGREWRITEAOF finished. */
                backgroundRewriteDoneHandler(exitcode,bysignal);
            } else {
                redisLog(REDIS_WARNING,
                    "Warning, detected child with unmatched pid: %ld",
                    (long)pid);
            }
            updateDictResizePolicy();
        }
    } else {
        /* If there is not a background saving/rewrite in progress check if
         * we have to save/rewrite now: walk the save points to see whether
         * a BGSAVE is due. */
        for (j = 0; j < server.saveparamslen; j++) {
            struct saveparam *sp = server.saveparams+j;

            /* Save if we reached the given amount of changes,
             * the given amount of seconds, and if the latest bgsave was
             * successful or if, in case of an error, at least
             * REDIS_BGSAVE_RETRY_DELAY seconds already elapsed. */
            if (server.dirty >= sp->changes &&
                server.unixtime-server.lastsave > sp->seconds &&
                (server.unixtime-server.lastbgsave_try >
                 REDIS_BGSAVE_RETRY_DELAY ||
                 server.lastbgsave_status == REDIS_OK))
            {
                redisLog(REDIS_NOTICE,"%d changes in %d seconds. Saving...",
                    sp->changes, (int)sp->seconds);
                rdbSaveBackground(server.rdb_filename);
                break;
            }
        }

        /* Trigger an AOF rewrite if needed: the current AOF file must be
         * larger than the minimum size required to run BGREWRITEAOF. */
        if (server.rdb_child_pid == -1 &&
            server.aof_child_pid == -1 &&
            server.aof_rewrite_perc &&
            server.aof_current_size > server.aof_rewrite_min_size)
        {
            /* AOF file size right after the last rewrite completed. */
            long long base = server.aof_rewrite_base_size ?
                            server.aof_rewrite_base_size : 1;
            /* Growth of the AOF file relative to base, as a percentage. */
            long long growth = (server.aof_current_size*100/base) - 100;
            if (growth >= server.aof_rewrite_perc) {
                redisLog(REDIS_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
                rewriteAppendOnlyFileBackground();
            }
        }
    }

    /* AOF postponed flush: Try at every cron cycle if the slow fsync
     * completed. */
    if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);

    /* AOF write errors: in this case we have a buffer to flush as well and
     * clear the AOF error in case of success to make the DB writable again,
     * however to try every second is enough in case of 'hz' is set to
     * an higher frequency. */
    run_with_period(1000) {
        if (server.aof_last_write_status == REDIS_ERR)
            flushAppendOnlyFile(0);
    }

    /* Close clients that need to be closed asynchronous */
    freeClientsInAsyncFreeQueue();

    /* Clear the paused clients flag if needed. */
    clientsArePaused(); /* Don't check return value, just use the side effect. */

    /* Replication cron function -- used to reconnect to the master, send
     * ACKs, detect transfer failures, and drop timed-out slaves. */
    run_with_period(1000) replicationCron();

    /* Run the Redis Cluster cron if running in cluster mode. */
    run_with_period(100) {
        if (server.cluster_enabled) clusterCron();
    }

    /* Run the Sentinel timer if we are in sentinel mode. */
    run_with_period(100) {
        if (server.sentinel_mode) sentinelTimer();
    }

    /* Cleanup expired MIGRATE cached sockets. */
    run_with_period(1000) {
        migrateCloseTimedoutSockets();
    }

    /* Increment the cron loop counter. */
    server.cronloops++;

    return 1000/server.hz;
}
```
(2) Source of redis.c/databasesCron:
```c
/* This function handles 'background' operations we are required to do
 * incrementally in Redis databases, such as active key expiring, resizing,
 * rehashing. */
void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled && server.masterhost == NULL)
        /* The SLOW cycle tries to expire as many keys as possible. */
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);

    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        unsigned int dbs_per_call = REDIS_DBCRON_DBS_PER_CALL;
        unsigned int j;

        /* Don't test more DBs than we have. */
        if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;

        /* Resize the dictionaries if needed. */
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }

        /* Incrementally rehash the dictionaries. */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db % server.dbnum);
                rehash_db++;
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                }
            }
        }
    }
}
```
(3) Source of redis.c/incrementallyRehash:
```c
/* Our hash table implementation performs rehashing incrementally while
 * we write/read from the hash table. Still if the server is idle, the hash
 * table will use two tables for a long time. So we try to use 1 millisecond
 * of CPU time at every call of this function to perform some rehashing.
 *
 * The function returns 1 if some rehashing was performed, otherwise 0
 * is returned. */
int incrementallyRehash(int dbid) {
    /* Keys dictionary */
    if (dictIsRehashing(server.db[dbid].dict)) {
        dictRehashMilliseconds(server.db[dbid].dict,1);
        return 1; /* already used our millisecond for this loop... */
    }
    /* Expires */
    if (dictIsRehashing(server.db[dbid].expires)) {
        dictRehashMilliseconds(server.db[dbid].expires,1);
        return 1; /* already used our millisecond for this loop... */
    }
    return 0;
}
```
(4) Source of dict.c/dictRehashMilliseconds:
```c
/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
```
(5) The source of dictRehash is listed above.
(6) Analysis
In serverCron, when no background child process (BGSAVE or BGREWRITEAOF) is running, databasesCron calls incrementallyRehash, which calls dictRehashMilliseconds, which in turn calls dictRehash.
Compared with lazy rehashing, incrementallyRehash runs longer and migrates more buckets per invocation: each call spends roughly 1 millisecond on rehashing, in batches of 100 buckets. If the rehash is still unfinished when the budget runs out, it resumes in the next cron loop.
If you spot any mistakes, please point them out. Suggestions and discussion are always welcome, and thank you for patiently reading this post to the end!
Reference book: Redis Design and Implementation (2nd edition), by 黄健宏 (Huang Jianhong)
Reference link: Redis——resize源码解析