[Redis Source Code] A Dissection of rehash

Source: Internet · Editor: 程序博客网 · Date: 2024/06/08 00:00

If you are not yet familiar with rehashing, see the companion post: 图解rehash (rehash, illustrated).

1. The two kinds of rehashing

To balance performance, Redis splits rehashing into two kinds of operations, lazy and active, which run side by side until rehashing is complete:

  • lazy rehashing: perform one step of rehashing whenever the dict is operated on
  • active rehashing: spend 1 ms on rehashing out of every 100 ms (the serverCron function)

2. Lazy rehashing

Call chain: _dictRehashStep -> dictRehash

(1) Source of dict.c/_dictRehashStep:

```c
/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used.
 *
 * T = O(1) */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}
```

(2) Source of dict.c/dictRehash:

```c
/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table.
 *
 * T = O(N) */
int dictRehash(dict *d, int n) {
    /* Only run while a rehash is in progress. */
    if (!dictIsRehashing(d)) return 0;

    /* Perform N migration steps. */
    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {
            /* Free the old table, make ht[1] the new ht[0], reset ht[1],
             * and clear the rehash flag. */
            zfree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            /* Return 0 to tell the caller rehashing is finished. */
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned)d->rehashidx);

        /* Skip empty buckets until we find the next non-empty index. */
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        /* Head of the chain at this index. */
        de = d->ht[0].table[d->rehashidx];

        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            /* Save the pointer to the next node. */
            nextde = de->next;

            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;

            /* Insert the node at the head of the new bucket. */
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;

            /* Update the counters. */
            d->ht[0].used--;
            d->ht[1].used++;

            de = nextde;
        }
        /* Clear the migrated bucket and advance the rehash index. */
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}
```

(3) Source analysis

  • _dictRehashStep calls dictRehash with n = 1, so each call migrates just a single bucket from ht[0] to ht[1]. But because _dictRehashStep is invoked from dictGetRandomKey, dictFind, dictGenericDelete and dictAdd, it runs on every lookup, insert, or delete against the dict, which steadily pushes the rehash forward.

  • dictRehash migrates up to n buckets per call. Since ht[1] was already sized during the automatic resize, the main work is to walk ht[0], take each key, re-hash it against ht[1]'s sizemask, and insert it into the corresponding new bucket; once every key has moved, ht[0] is replaced by ht[1] and ht[1] is reset. Throughout this process rehashidx is crucial: it records the index in ht[0] where the next rehash step will resume.

3. Active rehashing

The call chain is as follows:
serverCron -> databasesCron -> incrementallyRehash -> dictRehashMilliseconds -> dictRehash

(1) Source of redis.c/serverCron:

```c
/* This is our timer interrupt, called server.hz times per second.
 *
 * Here is where we do a number of things that need to be done asynchronously.
 * For instance:
 *
 * - Active expired keys collection (it is also performed in a lazy way on
 *   lookup).
 * - Software watchdog.
 * - Update some statistic.
 * - Incremental rehashing of the DBs hash tables.
 * - Triggering BGSAVE / AOF rewrite, and handling of terminated children.
 * - Clients timeout of different kinds.
 * - Replication reconnection.
 * - Many more...
 *
 * Everything directly called here will be called server.hz times per second,
 * so in order to throttle execution of things we want to do less frequently
 * a macro is used: run_with_period(milliseconds) { .... } */
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    int j;
    REDIS_NOTUSED(eventLoop);
    REDIS_NOTUSED(id);
    REDIS_NOTUSED(clientData);

    /* Software watchdog: deliver the SIGALRM that will reach the signal
     * handler if we don't return here fast enough. */
    if (server.watchdog_period) watchdogScheduleSignal(server.watchdog_period);

    /* Update the time cache. */
    updateCachedTime();

    /* Track how many commands the server executes per second. */
    run_with_period(100) trackOperationsPerSecond();

    /* We have just REDIS_LRU_BITS bits per object for LRU information.
     * So we use an (eventually wrapping) LRU clock.
     *
     * Note that even if the counter wraps it's not a big problem,
     * everything will still work but some object will appear younger
     * to Redis. However for this to happen a given object should never be
     * touched for all the time needed to the counter to wrap, which is
     * not likely.
     *
     * Note that you can change the resolution altering the
     * REDIS_LRU_CLOCK_RESOLUTION define. */
    server.lruclock = getLRUClock();

    /* Record the max memory used since the server was started. */
    if (zmalloc_used_memory() > server.stat_peak_memory)
        server.stat_peak_memory = zmalloc_used_memory();

    /* Sample the RSS here since this is a relatively slow call. */
    server.resident_set_size = zmalloc_get_rss();

    /* We received a SIGTERM, shutting down here in a safe way, as it is
     * not ok doing so inside the signal handler. */
    if (server.shutdown_asap) {
        /* Try to shut down; on failure, log and clear the shutdown flag. */
        if (prepareForShutdown(0) == REDIS_OK) exit(0);
        redisLog(REDIS_WARNING,"SIGTERM received but errors trying to shut down the server, check the logs for more information");
        server.shutdown_asap = 0;
    }

    /* Show some info about non-empty databases */
    run_with_period(5000) {
        for (j = 0; j < server.dbnum; j++) {
            long long size, used, vkeys;

            size = dictSlots(server.db[j].dict);    /* available slots */
            used = dictSize(server.db[j].dict);     /* keys in use */
            vkeys = dictSize(server.db[j].expires); /* keys with a TTL */
            if (used || vkeys) {
                redisLog(REDIS_VERBOSE,"DB %d: %lld keys (%lld volatile) in %lld slots HT.",j,used,vkeys,size);
                /* dictPrintStats(server.dict); */
            }
        }
    }

    /* Show information about connected clients (unless in Sentinel mode). */
    if (!server.sentinel_mode) {
        run_with_period(5000) {
            redisLog(REDIS_VERBOSE,
                "%lu clients connected (%lu slaves), %zu bytes in use",
                listLength(server.clients)-listLength(server.slaves),
                listLength(server.slaves),
                zmalloc_used_memory());
        }
    }

    /* We need to do a few operations on clients asynchronously:
     * close timed-out clients, shrink oversized client buffers. */
    clientsCron();

    /* Handle background operations on Redis databases. */
    databasesCron();

    /* Start a scheduled AOF rewrite if this was requested by the user while
     * a BGSAVE was in progress. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1 &&
        server.aof_rewrite_scheduled)
    {
        rewriteAppendOnlyFileBackground();
    }

    /* Check if a background saving or AOF rewrite in progress terminated. */
    if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
        int statloc;
        pid_t pid;

        /* Reap the child's exit status without blocking. */
        if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
            int exitcode = WEXITSTATUS(statloc);
            int bysignal = 0;

            if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);

            if (pid == server.rdb_child_pid) {
                /* BGSAVE finished. */
                backgroundSaveDoneHandler(exitcode,bysignal);
            } else if (pid == server.aof_child_pid) {
                /* BGREWRITEAOF finished. */
                backgroundRewriteDoneHandler(exitcode,bysignal);
            } else {
                redisLog(REDIS_WARNING,
                    "Warning, detected child with unmatched pid: %ld",
                    (long)pid);
            }
            updateDictResizePolicy();
        }
    } else {
        /* If there is not a background saving/rewrite in progress check if
         * we have to save/rewrite now: walk the save conditions looking for
         * one that is satisfied. */
        for (j = 0; j < server.saveparamslen; j++) {
            struct saveparam *sp = server.saveparams+j;

            /* Save if we reached the given amount of changes,
             * the given amount of seconds, and if the latest bgsave was
             * successful or if, in case of an error, at least
             * REDIS_BGSAVE_RETRY_DELAY seconds already elapsed. */
            if (server.dirty >= sp->changes &&
                server.unixtime-server.lastsave > sp->seconds &&
                (server.unixtime-server.lastbgsave_try >
                 REDIS_BGSAVE_RETRY_DELAY ||
                 server.lastbgsave_status == REDIS_OK))
            {
                redisLog(REDIS_NOTICE,"%d changes in %d seconds. Saving...",
                    sp->changes, (int)sp->seconds);
                rdbSaveBackground(server.rdb_filename);
                break;
            }
        }

        /* Trigger an AOF rewrite if needed */
        if (server.rdb_child_pid == -1 &&
            server.aof_child_pid == -1 &&
            server.aof_rewrite_perc &&
            /* Current AOF size must exceed the rewrite minimum. */
            server.aof_current_size > server.aof_rewrite_min_size)
        {
            /* AOF size right after the last rewrite completed. */
            long long base = server.aof_rewrite_base_size ?
                            server.aof_rewrite_base_size : 1;
            /* Growth of the current AOF relative to base, in percent. */
            long long growth = (server.aof_current_size*100/base) - 100;
            if (growth >= server.aof_rewrite_perc) {
                redisLog(REDIS_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
                rewriteAppendOnlyFileBackground();
            }
        }
    }

    /* AOF postponed flush: Try at every cron cycle if the slow fsync
     * completed. */
    if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);

    /* AOF write errors: in this case we have a buffer to flush as well and
     * clear the AOF error in case of success to make the DB writable again,
     * however to try every second is enough in case of 'hz' is set to
     * an higher frequency. */
    run_with_period(1000) {
        if (server.aof_last_write_status == REDIS_ERR)
            flushAppendOnlyFile(0);
    }

    /* Close clients that need to be closed asynchronous */
    freeClientsInAsyncFreeQueue();

    /* Clear the paused clients flag if needed. */
    clientsArePaused(); /* Don't check return value, just use the side effect. */

    /* Replication cron function -- used to reconnect to master, send ACKs,
     * detect transfer failures, and drop timed-out slaves. */
    run_with_period(1000) replicationCron();

    /* Run the Redis Cluster cron if running in cluster mode. */
    run_with_period(100) {
        if (server.cluster_enabled) clusterCron();
    }

    /* Run the Sentinel timer if we are in sentinel mode. */
    run_with_period(100) {
        if (server.sentinel_mode) sentinelTimer();
    }

    /* Cleanup expired MIGRATE cached sockets. */
    run_with_period(1000) {
        migrateCloseTimedoutSockets();
    }

    /* Increment the loop counter. */
    server.cronloops++;
    return 1000/server.hz;
}
```

(2) Source of redis.c/databasesCron:

```c
/* This function handles 'background' operations we are required to do
 * incrementally in Redis databases, such as active key expiring, resizing,
 * rehashing. */
void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled && server.masterhost == NULL)
        /* CYCLE_SLOW mode tries to expire as many keys as possible. */
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);

    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        unsigned int dbs_per_call = REDIS_DBCRON_DBS_PER_CALL;
        unsigned int j;

        /* Don't test more DBs than we have. */
        if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;

        /* Resize */
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }

        /* Rehash */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db % server.dbnum);
                rehash_db++;
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                }
            }
        }
    }
}
```

(3) Source of redis.c/incrementallyRehash:

```c
/* Our hash table implementation performs rehashing incrementally while
 * we write/read from the hash table. Still if the server is idle, the hash
 * table will use two tables for a long time. So we try to use 1 millisecond
 * of CPU time at every call of this function to perform some rehashing.
 *
 * The function returns 1 if some rehashing was performed, otherwise 0
 * is returned. */
int incrementallyRehash(int dbid) {
    /* Keys dictionary */
    if (dictIsRehashing(server.db[dbid].dict)) {
        dictRehashMilliseconds(server.db[dbid].dict,1);
        return 1; /* already used our millisecond for this loop... */
    }
    /* Expires */
    if (dictIsRehashing(server.db[dbid].expires)) {
        dictRehashMilliseconds(server.db[dbid].expires,1);
        return 1; /* already used our millisecond for this loop... */
    }
    return 0;
}
```

(4) Source of dict.c/dictRehashMilliseconds:

```c
/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
```

(5) The source of dictRehash was already shown above.

(6) Analysis

  • serverCron calls databasesCron; when no BGSAVE or BGREWRITEAOF child process is running, databasesCron calls incrementallyRehash, which calls dictRehashMilliseconds, which finally calls dictRehash.

  • Compared with a lazy step, incrementallyRehash runs for longer and migrates far more buckets: each call spends roughly 1 millisecond on rehashing, and if the rehash is still unfinished, it resumes in the next cron loop.



I am far from an expert; if anything here is wrong, please point it out, thank you!
If you have better suggestions, leave a comment and let's discuss and improve together!
My sincere thanks for patiently reading this whole post!

Reference book: *Redis Design and Implementation*, 2nd edition (《Redis设计与实现(第二版)》), by 黄健宏 (Huang Jianhong)
Reference link: Redis — resize source code analysis (Redis——resize源码解析)