Redis AOF File Persistence

Source: Internet  Editor: 程序博客网  Time: 2024/06/03 22:47

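1. Introduction to AOF Persistence

Besides RDB persistence, Redis also provides AOF (Append Only File) persistence. It differs from RDB persistence as follows:

(1) RDB persistence records the database state by saving the key-value pairs in the database, whereas AOF persistence records it by logging the write commands the server executes.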

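(2) The AOF file is usually updated more frequently than the RDB file, so if AOF persistence is enabled, the server prefers the AOF file when restoring the database state. It uses the RDB file for recovery only when AOF persistence is disabled.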

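2. How AOF Persistence Works in Outline

The implementation of AOF persistence can be divided into four steps:

(1) build the storage format according to the protocol; (2) append the command; (3) write to the file; (4) sync the file.

2.1 Building the Command Format According to the Protocol

Commands written to the AOF file are stored as plain text in the Redis protocol format. Compared with the RDB file format, the AOF format is much simpler. A single command is stored as follows, with every line terminated by \r\n:

*<count>      // <count> is the number of arguments in the command, counting the command name itself
$<len>        // <len> is the length in bytes of the 1st argument
<content>     // <content> is the content of the 1st argument
$<len>        // <len> is the length in bytes of the 2nd argument
<content>     // <content> is the content of the 2nd argument
...

The code that builds this format is: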

/* Wrap the given arguments in the protocol format and append them to dst. */
sds catAppendOnlyGenericCommand(sds dst, int argc, robj **argv) {
    char buf[32];
    int len, j;
    robj *o;

    buf[0] = '*';
    // Write the argument count; ll2string converts a long long to a string and returns its length
    len = 1+ll2string(buf+1,sizeof(buf)-1,argc);
    buf[len++] = '\r';
    buf[len++] = '\n';
    // Append the "*<count>\r\n" header to the end of dst
    dst = sdscatlen(dst,buf,len);

    for (j = 0; j < argc; j++) {
        // Get the decoded (string-encoded) robj
        o = getDecodedObject(argv[j]);
        buf[0] = '$';
        len = 1+ll2string(buf+1,sizeof(buf)-1,sdslen(o->ptr));
        buf[len++] = '\r';
        buf[len++] = '\n';
        dst = sdscatlen(dst,buf,len);
        dst = sdscatlen(dst,o->ptr,sdslen(o->ptr));
        dst = sdscatlen(dst,"\r\n",2);
        // Decrement the robj reference count; the object is freed when it reaches 0
        decrRefCount(o);
    }
    return dst;
}
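To make the format concrete, here is a minimal standalone sketch of the same encoding that does not depend on Redis internals. The function name encode_aof_command and its buffer-based interface are assumptions made for illustration; unlike the sds-based original, it works on NUL-terminated C strings and is therefore not binary-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Redis): encode argc arguments in the
 * "*<count>\r\n$<len>\r\n<content>\r\n..." format into out.
 * Returns the number of bytes written (excluding the trailing NUL),
 * or -1 if out is too small. */
int encode_aof_command(char *out, size_t cap, int argc, const char **argv) {
    assert(out != NULL && argv != NULL);
    size_t used = 0;

    /* Header line: the number of arguments. */
    int n = snprintf(out, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    used = (size_t)n;

    /* One "$<len>\r\n<content>\r\n" pair per argument. */
    for (int j = 0; j < argc; j++) {
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[j]), argv[j]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Calling it with the three arguments SET, msg and hello yields the text *3\r\n$3\r\nSET\r\n$3\r\nmsg\r\n$5\r\nhello\r\n, which is the form in which such a command would appear in the AOF file.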

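2.2 Appending the Command

When AOF persistence is enabled, after executing a write command the server appends the command, in the format described above, to the end of the aof_buf buffer:

struct redisServer {
    // ...
    sds aof_buf;    // AOF buffer
    // ...
};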

/* Translate the command according to its type and append it to the AOF buffer. */
void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) {
    // Create an empty string of length 0
    sds buf = sdsempty();
    robj *tmpargv[3];

    /* The DB this command was targeting is not the same as the last command
     * we appended. To issue a SELECT command is needed. */
    // If the command targets a database other than server.aof_selected_db,
    // emit an explicit SELECT command first
    if (dictid != server.aof_selected_db) {
        char seldb[64];

        snprintf(seldb,sizeof(seldb),"%d",dictid);
        buf = sdscatprintf(buf,"*2\r\n$6\r\nSELECT\r\n$%lu\r\n%s\r\n",
            (unsigned long)strlen(seldb),seldb);
        server.aof_selected_db = dictid;
    }

    // Expire-style commands are all translated to PEXPIREAT, i.e. to an absolute time
    if (cmd->proc == expireCommand || cmd->proc == pexpireCommand ||
        cmd->proc == expireatCommand) {
        /* Translate EXPIRE/PEXPIRE/EXPIREAT into PEXPIREAT */
        buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
    } else if (cmd->proc == setexCommand || cmd->proc == psetexCommand) {
        /* Translate SETEX/PSETEX to SET and PEXPIREAT */
        tmpargv[0] = createStringObject("SET",3);
        tmpargv[1] = argv[1];
        tmpargv[2] = argv[3];
        // Wrap the arguments in the protocol format and append them
        buf = catAppendOnlyGenericCommand(buf,3,tmpargv);
        decrRefCount(tmpargv[0]);
        buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
    } else {
        /* All the other commands don't need translation or need the
         * same translation already operated in the command vector
         * for the replication itself. */
        buf = catAppendOnlyGenericCommand(buf,argc,argv);
    }

    /* Append to the AOF buffer. This will be flushed on disk just before
     * of re-entering the event loop, so before the client will get a
     * positive reply about the operation performed. */
    // Append the rebuilt command string to the AOF buffer. The buffer is written
    // to disk before the event loop is re-entered, that is, before the client
    // receives a positive reply for this operation.
    if (server.aof_state == REDIS_AOF_ON)
        server.aof_buf = sdscatlen(server.aof_buf,buf,sdslen(buf));

    /* If a background append only file rewriting is in progress we want to
     * accumulate the differences between the child DB and the current one
     * in a buffer, so that when the child process will do its work we
     * can append the differences to the new append only file. */
    // If an AOF rewrite (the BGREWRITEAOF command) is running in the background,
    // the rebuilt command is also appended to the AOF rewrite buffer, so that the
    // differences between the file being rewritten and the current database are recorded.
    if (server.aof_child_pid != -1) // AOF rewriting is done by a forked child process, covered later
        aofRewriteBufferAppend((unsigned char*)buf,sdslen(buf));

    // Free the temporary buffer
    sdsfree(buf);
}
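The reason for rewriting expire-style commands is that a relative timeout replayed later from the AOF file would restart the countdown. A minimal sketch of the conversion follows; the enum expire_kind and the function to_pexpireat_ms are hypothetical names used only here, and the real work in Redis is done inside catAppendOnlyExpireAtCommand, whose body is not shown in this article.

```c
#include <assert.h>

/* Hypothetical command kinds; Redis itself compares cmd->proc pointers. */
typedef enum {
    CMD_EXPIRE, CMD_PEXPIRE, CMD_EXPIREAT, CMD_SETEX, CMD_PSETEX
} expire_kind;

/* Convert the time argument of an expire-style command into the absolute
 * millisecond timestamp that a PEXPIREAT command would carry.
 * now_ms is the current time in milliseconds. */
long long to_pexpireat_ms(expire_kind kind, long long when, long long now_ms) {
    assert(now_ms >= 0);

    /* EXPIRE, SETEX and EXPIREAT take seconds: convert to milliseconds. */
    if (kind == CMD_EXPIRE || kind == CMD_SETEX || kind == CMD_EXPIREAT)
        when *= 1000;

    /* Everything except EXPIREAT is relative: make it absolute. */
    if (kind != CMD_EXPIREAT)
        when += now_ms;

    return when;
}
```

With now_ms equal to 1000000, an EXPIRE of 10 seconds becomes the absolute time 1010000, so replaying the log at any later moment gives the key the same deadline.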

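2.3 Writing and Syncing the File

When a program calls write to write data to a file, the operating system usually keeps the data in an in-memory buffer first. Only when the buffer fills up, a time limit expires, or the kernel needs to reuse the buffer for other disk blocks is the buffered data actually written to disk. This is called delayed write.

Delayed write improves efficiency, but it puts the written data at risk: if the machine goes down, the data still sitting in the buffer is lost. To make sure that data written to a file really reaches the file on disk instead of staying in the memory buffer, the file has to be synced explicitly, for example with fsync or fdatasync.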
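The difference between writing and syncing can be sketched with plain POSIX calls. The function name append_durably is an assumption for illustration only; Redis keeps its AOF file descriptor open rather than reopening the file on every write.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append data to the file at path and force it to
 * stable storage. Returns 0 on success, -1 on error. */
int append_durably(const char *path, const char *data) {
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) return -1;

    size_t len = strlen(data);
    /* write() only hands the bytes to the kernel; they may still be
     * sitting in a memory buffer when it returns. */
    ssize_t n = write(fd, data, len);
    if (n != (ssize_t)len) { close(fd); return -1; }

    /* fsync() blocks until the kernel has pushed the file's data to disk. */
    if (fsync(fd) == -1) { close(fd); return -1; }

    return close(fd);
}
```

Skipping the fsync call makes the function faster but reintroduces the delayed-write risk described above, which is exactly the trade-off the AOF sync policies expose.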

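The Redis server process is itself an event loop. The file events in this loop receive command requests from clients and send replies back, so write commands may be executed in any iteration. For this reason, before the loop goes back to waiting for events, Redis calls flushAppendOnlyFile to write the contents of the aof_buf buffer to the AOF file.
Each time Redis is about to enter the event-processing step of the loop, it calls the beforeSleep function: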

void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS);
    }
}

/* This function gets called every time Redis is entering the
 * main loop of the event driven library, that is, before to sleep
 * for ready file descriptors. */
void beforeSleep(struct aeEventLoop *eventLoop) {
    ......
    /* Write the AOF buffer on disk */
    flushAppendOnlyFile(0);
    ......
}
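The ordering of hook and event processing can be tried out with a stripped-down loop. Everything in this sketch (mini_loop, process_events, count_flush) is a hypothetical stand-in, not Redis code: process_events plays the role of aeProcessEvents, and count_flush stands where flushAppendOnlyFile(0) would be called.

```c
#include <assert.h>
#include <stddef.h>

typedef struct mini_loop {
    int stop;       /* set to 1 to leave the loop */
    int pending;    /* number of iterations still to run */
    int flushed;    /* how many times the beforesleep hook ran */
    void (*beforesleep)(struct mini_loop *);
} mini_loop;

/* Stand-in for aeProcessEvents: handle one batch of events and stop
 * the loop once no work is left. */
static void process_events(mini_loop *el) {
    if (--el->pending <= 0) el->stop = 1;
}

/* Same shape as aeMain: run the hook, then process events. */
void mini_main(mini_loop *el) {
    assert(el != NULL);
    el->stop = 0;
    while (!el->stop) {
        if (el->beforesleep != NULL)
            el->beforesleep(el);
        process_events(el);
    }
}

/* Stand-in for the flush performed inside beforeSleep. */
void count_flush(mini_loop *el) { el->flushed++; }
```

Because the hook runs at the top of each iteration, whatever one batch of events appended to the buffer is flushed at the start of the next iteration, before the loop waits for new events.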

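The AOF sync policy is selected with the appendfsync option in redis.conf. Three policies are supported:
aof_fsync == AOF_FSYNC_EVERYSEC (appendfsync everysec): sync once per second

aof_fsync == AOF_FSYNC_ALWAYS (appendfsync always): sync after the write in every event loop iteration

aof_fsync == AOF_FSYNC_NO (appendfsync no): never sync explicitly; let the operating system decide when to sync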
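For reference, a redis.conf fragment that enables AOF and picks one of these policies could look like the following. The values are only an example and should be adapted to the deployment at hand.

```
# Turn AOF persistence on
appendonly yes

# Sync policy: always | everysec | no
appendfsync everysec
```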

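AOF_FSYNC_ALWAYS
After every event loop iteration the contents of aof_buf are written to the file and the file is synced as well. This is the slowest of the three policies but the safest for the data: even if the machine fails, at most the data produced in one event loop iteration is lost.

AOF_FSYNC_EVERYSEC
After every event loop iteration the contents of aof_buf are written to the file, and the file is synced about once per second. The sync is carried out in a background thread. If the machine fails, roughly the last second of data can be lost, and somewhat more when a write was postponed behind a slow fsync (the code below waits up to two seconds).

AOF_FSYNC_NO
After every event loop iteration the contents of aof_buf are written to the file, but when the data is flushed from the kernel buffer to disk is left to the operating system. This is the least safe policy.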
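The three policies differ only in what happens after the write. A minimal sketch of that decision as a pure function is shown here; the type and function names are invented for this example, and the conditions follow the tail of the flushAppendOnlyFile code quoted in this article.

```c
#include <assert.h>

/* Hypothetical stand-ins for the AOF_FSYNC_* constants. */
typedef enum { POLICY_NO, POLICY_ALWAYS, POLICY_EVERYSEC } fsync_policy;

typedef enum {
    ACTION_NONE,            /* leave syncing to the operating system */
    ACTION_SYNC_NOW,        /* sync in the main thread right away */
    ACTION_SYNC_BACKGROUND  /* hand the sync to a background thread */
} sync_action;

/* Decide what to do after aof_buf has been written to the file.
 * now and last_fsync are timestamps in seconds. */
sync_action action_after_write(fsync_policy policy, long now,
                               long last_fsync, int sync_in_progress) {
    assert(policy == POLICY_NO || policy == POLICY_ALWAYS ||
           policy == POLICY_EVERYSEC);

    if (policy == POLICY_ALWAYS) return ACTION_SYNC_NOW;

    /* everysec: at most one background sync per second, and never a
     * second one while the previous one is still running. */
    if (policy == POLICY_EVERYSEC && now > last_fsync && !sync_in_progress)
        return ACTION_SYNC_BACKGROUND;

    return ACTION_NONE;
}
```

For instance, under POLICY_EVERYSEC a second write within the same second returns ACTION_NONE, which is why that policy costs far fewer syncs than POLICY_ALWAYS.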

/* Flush the contents of the AOF buffer to disk. */
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;

    // Nothing to write or sync if the buffer is empty
    if (sdslen(server.aof_buf) == 0) return;

    // With AOF_FSYNC_EVERYSEC (sync once per second),
    // check whether a background fsync job is still pending
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC) != 0;

    // AOF_FSYNC_EVERYSEC and the flush is not forced
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            // aof_flush_postponed_start == 0 means no write has been postponed yet:
            // the previous fsync is still running, so record the time and postpone this write
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            // Postponed for less than 2 seconds so far: keep postponing
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                return;
            }
            /* Otherwise fall through, and go write since we can't wait
             * over two seconds. */
            // Neither case applies: go ahead with the write and
            // increment the delayed-fsync counter
            server.aof_delayed_fsync++;
            redisLog(REDIS_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }

    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike */
    // The write is wrapped in latency monitoring. Redis samples the latency of
    // many simple operations, such as disk I/O and the execution of some commands.
    latencyStartMonitor(latency);
    nwritten = write(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    latencyEndMonitor(latency);

    /* We want to capture different events for delayed writes:
     * when the delay happens with a pending fsync, or with a saving child
     * active, and when the above two conditions are missing.
     * We also use an additional event name to save all samples which is
     * useful for graphing / monitoring purposes. */
    // Record the latency samples
    if (sync_in_progress) {
        latencyAddSampleIfNeeded("aof-write-pending-fsync",latency);
    } else if (server.aof_child_pid != -1 || server.rdb_child_pid != -1) {
        latencyAddSampleIfNeeded("aof-write-active-child",latency);
    } else {
        latencyAddSampleIfNeeded("aof-write-alone",latency);
    }
    latencyAddSampleIfNeeded("aof-write",latency);

    /* We performed the write so reset the postponed flush sentinel to zero. */
    server.aof_flush_postponed_start = 0;

    // If fewer bytes were written than expected, an error occurred.
    // The code below logs the error and tries to recover.
    if (nwritten != (signed)sdslen(server.aof_buf)) {
        static time_t last_write_error_log = 0;
        int can_log = 0;

        /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
        if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) {
            can_log = 1;
            last_write_error_log = server.unixtime;
        }

        /* Log the AOF write error and record the error code. */
        if (nwritten == -1) {
            if (can_log) {
                redisLog(REDIS_WARNING,"Error writing to the AOF file: %s",
                    strerror(errno));
                server.aof_last_write_errno = errno;
            }
        } else {
            if (can_log) {
                redisLog(REDIS_WARNING,"Short write while writing to "
                                       "the AOF file: (nwritten=%lld, "
                                       "expected=%lld)",
                                       (long long)nwritten,
                                       (long long)sdslen(server.aof_buf));
            }

            if (ftruncate(server.aof_fd, server.aof_current_size) == -1) {
                if (can_log) {
                    redisLog(REDIS_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            } else {
                /* If the ftruncate() succeeded we can set nwritten to
                 * -1 since there is no longer partial data into the AOF. */
                nwritten = -1;
            }
            server.aof_last_write_errno = ENOSPC;
        }

        /* Handle the AOF write error. */
        // With AOF_FSYNC_ALWAYS a write error is fatal: exit and leave it to the user
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the
             * reply for the client is already in the output buffers, and we
             * have the contract with the user that on acknowledged write data
             * is synched on disk. */
            redisLog(REDIS_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            /* Recover from failed write leaving data into the buffer. However
             * set an error to stop accepting writes as long as the error
             * condition is not cleared. */
            server.aof_last_write_status = REDIS_ERR;

            /* Trim the sds buffer if there was a partial write, and there
             * was no way to undo it with ftruncate(2). */
            // Keep only the part of aof_buf starting at offset nwritten,
            // i.e. the bytes that have not been written yet
            if (nwritten > 0) {
                server.aof_current_size += nwritten;
                sdsrange(server.aof_buf,nwritten,-1);
            }
            return; /* We'll try again on the next call... */
        }
    } else {
        // No error: restore the status and log if we were in an error state
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        if (server.aof_last_write_status == REDIS_ERR) {
            redisLog(REDIS_WARNING,
                "AOF write error looks solved, Redis can write again.");
            server.aof_last_write_status = REDIS_OK;
        }
    }
    // The whole buffer has been written; update the current AOF file size
    server.aof_current_size += nwritten;

    /* Re-use AOF buffer when it is small enough. The maximum comes from the
     * arena size of 4k minus some overhead (but is otherwise arbitrary). */
    if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
        sdsclear(server.aof_buf);
    } else {
        sdsfree(server.aof_buf);
        server.aof_buf = sdsempty();
    }

    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background. */
    if (server.aof_no_fsync_on_rewrite &&
        (server.aof_child_pid != -1 || server.rdb_child_pid != -1))
            return;

    /* Perform the fsync if needed. */
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* aof_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        latencyStartMonitor(latency);
        // With AOF_FSYNC_ALWAYS, sync right here in the main thread
        // (aof_fsync maps to fdatasync on Linux)
        aof_fsync(server.aof_fd); /* Let's try to get this data on the disk */
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
        server.aof_last_fsync = server.unixtime;
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        // With AOF_FSYNC_EVERYSEC, a background thread performs the sync,
        // at most once per second
        if (!sync_in_progress) aof_background_fsync(server.aof_fd);
        server.aof_last_fsync = server.unixtime;
    }
}
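The postponement logic at the top of flushAppendOnlyFile is the least obvious part, so here it is isolated as a small testable function. This is a sketch under stated assumptions: decide_flush and its parameters are hypothetical names, the server fields are passed in explicitly, and the reset of the postponement marker (which the original performs after the write) is folded into the function.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FLUSH_WRITE_NOW, FLUSH_POSTPONE } flush_decision;

/* Decide whether the buffer should be written now or the write postponed.
 * everysec:         non-zero if the policy is "sync once per second"
 * force:            non-zero if the caller insists on writing
 * sync_in_progress: non-zero if a background fsync is still running
 * now:              current time in seconds
 * postponed_start:  in/out, 0 when no postponement is active
 * delayed_count:    in/out, number of writes made despite a pending fsync */
flush_decision decide_flush(int everysec, int force, int sync_in_progress,
                            long now, long *postponed_start,
                            long *delayed_count) {
    assert(postponed_start != NULL && delayed_count != NULL);

    if (everysec && !force && sync_in_progress) {
        if (*postponed_start == 0) {
            /* First postponement: remember when it started. */
            *postponed_start = now;
            return FLUSH_POSTPONE;
        }
        if (now - *postponed_start < 2) {
            /* Still within the two-second grace period. */
            return FLUSH_POSTPONE;
        }
        /* Waited two seconds or more: write anyway and count it. */
        (*delayed_count)++;
    }

    *postponed_start = 0;
    return FLUSH_WRITE_NOW;
}
```

Called at times 100, 101 and 102 with a background fsync still pending, the function postpones twice and then writes, incrementing the delayed counter once. A forced call writes immediately regardless of the pending fsync.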

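3. Walking Through the AOF Persistence Process

After the analysis above we have a rough understanding of how AOF persistence works. Let us summarize and walk through the whole process once more.
First, at the end of Redis's main function the event loop function is called: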

/* Main function of the ae event loop */
void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    // Keep looping until the stop flag is set
    while (!eventLoop->stop) {
        // Called at the start of every iteration, before waiting for events again
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        // Process all kinds of events registered on the event loop
        aeProcessEvents(eventLoop, AE_ALL_EVENTS);
    }
}

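The call chain starting from aeProcessEvents:

aeProcessEvents -> readQueryFromClient -> processInputBuffer -> processCommand -> call -> propagate -> feedAppendOnlyFile(cmd,dbid,argv,argc), which appends the command to the aof_buf buffer

After aeProcessEvents has finished handling events, the loop starts its next iteration and runs the beforesleep function.

What beforeSleep does: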

void beforeSleep(struct aeEventLoop *eventLoop) {
    .......
    /* Write the AOF buffer on disk */
    flushAppendOnlyFile(0);
}

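The flushAppendOnlyFile function is the aof_buf write and sync implementation analyzed in section 2.

This concludes the introduction to the AOF persistence mechanism.

Question: if AOF keeps going like this, logging every write command from every client, the AOF file will keep growing and become very large, and it may contain many redundant commands. How is this solved? Redis addresses this limitation with the AOF rewrite mechanism, which the next post covers.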
