Redis: AOF File Persistence
Source: the Internet. Editor: 程序博客网. Time: 2024/06/03 22:47
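1. Introduction to AOF Persistence
Besides persisting the database with RDB files, Redis also provides AOF (Append Only File) persistence. It differs from RDB persistence as follows:
(1) RDB persistence records the database state by saving the key-value pairs in the database. AOF persistence records the database state by saving the write commands the server has executed.
(2) An AOF file is usually updated more often than an RDB file. So if AOF persistence is enabled, the server prefers the AOF file when restoring the database state; it uses the RDB file only when AOF persistence is turned off.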
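2. How AOF Persistence Is Implemented, in Outline
The implementation of AOF persistence breaks down into four steps:
(1) build the stored format according to the protocol; (2) append the command; (3) write the file; (4) sync the file.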
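2.1 Building the stored command format according to the protocol
Commands written to the AOF file are stored as plain text in the Redis protocol format, with every line terminated by \r\n. Compared with the RDB file format, the AOF format is much simpler. A single command in the AOF file is saved as follows: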
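*<count>     // <count> is the number of arguments in the command, including the command name
$<len>       // <len> is the length of the 1st argument
<content>    // <content> is the content of the 1st argument
$<len>       // <len> is the length of the 2nd argument
<content>    // <content> is the content of the 2nd argument
...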
The code that builds this format is:
/* Wrap the given arguments in the protocol format and append them to dst */
sds catAppendOnlyGenericCommand(sds dst, int argc, robj **argv) {
    char buf[32];
    int len, j;
    robj *o;

    buf[0] = '*';
    // Write the argument count; ll2string converts a long long to a string and returns its length
    len = 1+ll2string(buf+1,sizeof(buf)-1,argc);
    buf[len++] = '\r';
    buf[len++] = '\n';
    // Append the bytes to the end of dst
    dst = sdscatlen(dst,buf,len);

    for (j = 0; j < argc; j++) {
        // Get the decoded robj
        o = getDecodedObject(argv[j]);
        buf[0] = '$';
        len = 1+ll2string(buf+1,sizeof(buf)-1,sdslen(o->ptr));
        buf[len++] = '\r';
        buf[len++] = '\n';
        dst = sdscatlen(dst,buf,len);
        dst = sdscatlen(dst,o->ptr,sdslen(o->ptr));
        dst = sdscatlen(dst,"\r\n",2);
        // Decrement the robj reference count; the object is freed when it reaches 0
        decrRefCount(o);
    }
    return dst;
}
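To make the format concrete, here is a minimal standalone sketch of the same encoding. It is not Redis code: `encode_command` is a hypothetical helper that uses plain C strings and `snprintf` in place of `sds` and `robj`, so unlike the real function it is not binary-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Redis): encode argc C-string arguments
 * in the AOF text format and store the result in out.
 * Returns the number of bytes written, or -1 if out is too small. */
int encode_command(char *out, size_t cap, int argc, const char **argv) {
    assert(out != NULL && argv != NULL && argc > 0);
    size_t used = 0;

    /* Header line: '*' followed by the argument count. */
    int n = snprintf(out, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    used += (size_t)n;

    /* One "$<len>\r\n<content>\r\n" pair per argument. */
    for (int j = 0; j < argc; j++) {
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[j]), argv[j]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With the arguments `SET`, `msg`, `hello`, this sketch produces `*3\r\n$3\r\nSET\r\n$3\r\nmsg\r\n$5\r\nhello\r\n`. The count is 3 because the command name itself is counted as an argument.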
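2.2 Appending the command
When AOF persistence is enabled, after the server executes a write command it appends the command, in the format above, to the end of the aof_buf buffer: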
struct redisServer {
    // ...
    sds aof_buf;
    // ...
};
/* Translate the command as its type requires, then append it to the AOF buffer */
void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) {
    // Create an empty string of length 0
    sds buf = sdsempty();
    robj *tmpargv[3];

    /* The DB this command was targeting is not the same as the last command
     * we appended. To issue a SELECT command is needed. */
    // If the database this command targets differs from server.aof_selected_db,
    // an explicit SELECT command has to be emitted first
    if (dictid != server.aof_selected_db) {
        char seldb[64];

        snprintf(seldb,sizeof(seldb),"%d",dictid);
        buf = sdscatprintf(buf,"*2\r\n$6\r\nSELECT\r\n$%lu\r\n%s\r\n",
            (unsigned long)strlen(seldb),seldb);
        server.aof_selected_db = dictid;
    }

    // Expiry commands are all converted to PEXPIREAT, turning the time into an absolute timestamp
    if (cmd->proc == expireCommand || cmd->proc == pexpireCommand ||
        cmd->proc == expireatCommand) {
        /* Translate EXPIRE/PEXPIRE/EXPIREAT into PEXPIREAT */
        buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
    } else if (cmd->proc == setexCommand || cmd->proc == psetexCommand) {
        /* Translate SETEX/PSETEX to SET and PEXPIREAT */
        tmpargv[0] = createStringObject("SET",3);
        tmpargv[1] = argv[1];
        tmpargv[2] = argv[3];
        // Wrap the arguments in the protocol format and append them
        buf = catAppendOnlyGenericCommand(buf,3,tmpargv);
        decrRefCount(tmpargv[0]);
        buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
    } else {
        /* All the other commands don't need translation or need the
         * same translation already operated in the command vector
         * for the replication itself. */
        buf = catAppendOnlyGenericCommand(buf,argc,argv);
    }

    /* Append to the AOF buffer. This will be flushed on disk just before
     * of re-entering the event loop, so before the client will get a
     * positive reply about the operation performed. */
    // Append the rebuilt command string to the AOF buffer. The buffer is written to
    // disk before the event loop is re-entered, that is, before the client receives
    // the reply for this operation
    if (server.aof_state == REDIS_AOF_ON)
        server.aof_buf = sdscatlen(server.aof_buf,buf,sdslen(buf));

    /* If a background append only file rewriting is in progress we want to
     * accumulate the differences between the child DB and the current one
     * in a buffer, so that when the child process will do its work we
     * can append the differences to the new append only file. */
    // If an AOF rewrite (the BGREWRITEAOF command) is running in the background, the
    // rebuilt command is also appended to the AOF rewrite buffer, which records the
    // differences between the AOF file being rewritten and the current database
    if (server.aof_child_pid != -1) // AOF rewriting is done by a forked child process, covered later
        aofRewriteBufferAppend((unsigned char*)buf,sdslen(buf));

    // Free the temporary buffer
    sdsfree(buf);
}
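The body of `catAppendOnlyExpireAtCommand` is not shown in this article. The sketch below only illustrates the idea behind the translation, under the assumption that the conversion is "scale seconds to milliseconds, then add the current time if the value is relative". `to_pexpireat_ms` and its flags are hypothetical names, not Redis functions.

```c
#include <assert.h>

/* Hypothetical helper: convert an expiry argument into the absolute
 * millisecond timestamp that PEXPIREAT takes.
 *   when        the value the client supplied
 *   in_seconds  nonzero if `when` is in seconds (EXPIRE, EXPIREAT, SETEX)
 *   relative    nonzero if `when` is a TTL relative to now
 *               (EXPIRE, PEXPIRE, SETEX, PSETEX)
 *   now_ms      the current time in milliseconds */
long long to_pexpireat_ms(long long when, int in_seconds, int relative,
                          long long now_ms) {
    assert(now_ms >= 0);
    if (in_seconds) when *= 1000;   /* seconds -> milliseconds */
    if (relative) when += now_ms;   /* TTL -> absolute deadline */
    return when;
}
```

Storing an absolute time matters because the AOF file is replayed later: a relative `EXPIRE key 10` replayed an hour afterwards would restart the ten-second countdown, while an absolute timestamp keeps the original deadline.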
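2.3 Writing and syncing the file
When a program calls write to write data to a file, the operating system usually keeps the data in an in-memory buffer for a while. Only when the buffer fills up, a time limit expires, or the kernel needs to reuse the buffer for other disk blocks does it actually write the buffered data to disk. This is called delayed write.
Delayed write improves efficiency, but it also puts the written data at risk: if the machine goes down, data still sitting in the buffer is lost. To guarantee that data written to a file really reaches the file on disk instead of staying in the memory buffer, we need to sync the file.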
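The difference between writing and syncing can be sketched with plain POSIX calls. This is an illustration only, not how Redis structures its code; `append_and_sync` is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to the file at path and force
 * them to disk. Returns 0 on success, -1 on any error (a short write is
 * treated as an error here for simplicity). */
int append_and_sync(const char *path, const void *data, size_t len) {
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) return -1;

    /* write() normally returns once the data is in the kernel buffer. */
    ssize_t n = write(fd, data, len);
    if (n != (ssize_t)len) { close(fd); return -1; }

    /* fsync() blocks until the kernel has pushed the data to the device. */
    if (fsync(fd) == -1) { close(fd); return -1; }

    return close(fd);
}
```

Without the `fsync` call the function would still return successfully, but the bytes could be lost if the machine stopped before the kernel flushed its buffer.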
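The Redis server process is itself an event loop. The file events in this loop receive command requests from clients and send replies back, so the server may execute write commands during any iteration. For that reason, before going back to wait for events, Redis always calls flushAppendOnlyFile to write the contents of the aof_buf buffer to the file.
Each time Redis enters an iteration of the event loop, before it blocks waiting for events, it calls beforeSleep: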
void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS);
    }
}

/* This function gets called every time Redis is entering the
 * main loop of the event driven library, that is, before to sleep
 * for ready file descriptors. */
void beforeSleep(struct aeEventLoop *eventLoop) {
    ......
    /* Write the AOF buffer on disk */
    flushAppendOnlyFile(0);
    ......
}
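The AOF sync policy is set with the appendfsync option in redis.conf (always, everysec, or no). The three supported policies are:
aof_fsync == AOF_FSYNC_EVERYSEC: sync once per second
aof_fsync == AOF_FSYNC_ALWAYS: sync after the write in every event loop iteration
aof_fsync == AOF_FSYNC_NO: never sync explicitly; let the operating system decide when to sync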
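AOF_FSYNC_ALWAYS
After every event loop iteration the contents of aof_buf are written to the file and the file is synced. This is the slowest of the three policies but also the safest: even if the machine goes down, at most the data produced in one event loop iteration is lost.
AOF_FSYNC_EVERYSEC
After every event loop iteration the contents of aof_buf are written to the file, and the file is synced once per second. The sync runs in a background thread. If the machine goes down, about one second of data may be lost.
AOF_FSYNC_NO
After every event loop iteration the contents of aof_buf are written to the file, but when the data moves from the kernel buffer to disk is left to the operating system. This is the least safe policy.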
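The three policies differ only in when a sync is issued after a write. A minimal sketch of that decision follows; the enum and `should_fsync` are local stand-ins for illustration, not Redis definitions.

```c
#include <assert.h>

/* Local stand-ins for the three policies; the numeric values are arbitrary. */
enum fsync_policy { POLICY_NO, POLICY_ALWAYS, POLICY_EVERYSEC };

/* Returns 1 if a sync should be issued after the current write, else 0.
 * now and last_fsync are timestamps in whole seconds. */
int should_fsync(enum fsync_policy policy, long now, long last_fsync) {
    assert(now >= last_fsync);
    switch (policy) {
    case POLICY_ALWAYS:
        return 1;                    /* sync after every write */
    case POLICY_EVERYSEC:
        return now > last_fsync;     /* at most one sync per second */
    case POLICY_NO:
        return 0;                    /* leave it to the operating system */
    }
    return 0;
}
```

With whole-second timestamps, `now > last_fsync` becomes true only once the clock has ticked past the second of the previous sync, which is what bounds the everysec policy to roughly one sync per second.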
/* Flush the contents of the buffer to disk */
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;

    // Nothing to write or sync if the buffer is empty
    if (sdslen(server.aof_buf) == 0) return;

    // If the sync policy is AOF_FSYNC_EVERYSEC (sync once per second),
    // check whether a background fsync job is still pending
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC) != 0;

    // If the sync policy is AOF_FSYNC_EVERYSEC and the flush is not forced
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            // aof_flush_postponed_start is still 0 while the previous fsync has not
            // finished, so postpone this write and fsync
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            // We have been postponing for less than 2 seconds: keep postponing
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                return;
            }
            /* Otherwise fall through, and go write since we can't wait
             * over two seconds. */
            // Neither case applies: go ahead with the write and increment the
            // delayed fsync counter
            server.aof_delayed_fsync++;
            redisLog(REDIS_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }
    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike */

    // The write is also latency-monitored. Redis samples the latency of many
    // simple operations, such as disk I/O and the execution of some commands
    latencyStartMonitor(latency);
    nwritten = write(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    latencyEndMonitor(latency);
    /* We want to capture different events for delayed writes:
     * when the delay happens with a pending fsync, or with a saving child
     * active, and when the above two conditions are missing.
     * We also use an additional event name to save all samples which is
     * useful for graphing / monitoring purposes. */
    // Record the latency samples
    if (sync_in_progress) {
        latencyAddSampleIfNeeded("aof-write-pending-fsync",latency);
    } else if (server.aof_child_pid != -1 || server.rdb_child_pid != -1) {
        latencyAddSampleIfNeeded("aof-write-active-child",latency);
    } else {
        latencyAddSampleIfNeeded("aof-write-alone",latency);
    }
    latencyAddSampleIfNeeded("aof-write",latency);

    /* We performed the write so reset the postponed flush sentinel to zero. */
    server.aof_flush_postponed_start = 0;

    // If fewer bytes were written than expected, an error occurred; the code below
    // logs the error and attempts recovery
    if (nwritten != (signed)sdslen(server.aof_buf)) {
        static time_t last_write_error_log = 0;
        int can_log = 0;

        /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
        if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) {
            can_log = 1;
            last_write_error_log = server.unixtime;
        }

        /* Log the AOF write error and record the error code. */
        if (nwritten == -1) {
            if (can_log) {
                redisLog(REDIS_WARNING,"Error writing to the AOF file: %s",
                    strerror(errno));
                server.aof_last_write_errno = errno;
            }
        } else {
            if (can_log) {
                redisLog(REDIS_WARNING,"Short write while writing to "
                                       "the AOF file: (nwritten=%lld, "
                                       "expected=%lld)",
                                       (long long)nwritten,
                                       (long long)sdslen(server.aof_buf));
            }

            if (ftruncate(server.aof_fd, server.aof_current_size) == -1) {
                if (can_log) {
                    redisLog(REDIS_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            } else {
                /* If the ftruncate() succeeded we can set nwritten to
                 * -1 since there is no longer partial data into the AOF. */
                nwritten = -1;
            }
            server.aof_last_write_errno = ENOSPC;
        }

        /* Handle the AOF write error. */
        // If the error happens under AOF_FSYNC_ALWAYS, exit the process and leave it to the user
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the
             * reply for the client is already in the output buffers, and we
             * have the contract with the user that on acknowledged write data
             * is synched on disk. */
            redisLog(REDIS_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            /* Recover from failed write leaving data into the buffer. However
             * set an error to stop accepting writes as long as the error
             * condition is not cleared. */
            server.aof_last_write_status = REDIS_ERR;

            /* Trim the sds buffer if there was a partial write, and there
             * was no way to undo it with ftruncate(2). */
            // Update aof_buf: keep only the bytes from offset nwritten onward
            if (nwritten > 0) {
                server.aof_current_size += nwritten;
                sdsrange(server.aof_buf,nwritten,-1);
            }
            return; /* We'll try again on the next call... */
        }
    } else {
        // No error: log the recovery and restore the status
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        if (server.aof_last_write_status == REDIS_ERR) {
            redisLog(REDIS_WARNING,
                "AOF write error looks solved, Redis can write again.");
            server.aof_last_write_status = REDIS_OK;
        }
    }
    // The whole buffer has been written; nothing is left to write
    server.aof_current_size += nwritten;

    /* Re-use AOF buffer when it is small enough. The maximum comes from the
     * arena size of 4k minus some overhead (but is otherwise arbitrary). */
    if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
        sdsclear(server.aof_buf);
    } else {
        sdsfree(server.aof_buf);
        server.aof_buf = sdsempty();
    }

    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background. */
    if (server.aof_no_fsync_on_rewrite &&
        (server.aof_child_pid != -1 || server.rdb_child_pid != -1))
            return;

    /* Perform the fsync if needed. */
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* aof_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        latencyStartMonitor(latency);
        // Under AOF_FSYNC_ALWAYS, call the system's fdatasync to sync
        aof_fsync(server.aof_fd); /* Let's try to get this data on the disk */
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
        server.aof_last_fsync = server.unixtime;
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        // Under AOF_FSYNC_EVERYSEC, a background thread syncs once per second;
        // in the end it still calls the system's sync function
        if (!sync_in_progress) aof_background_fsync(server.aof_fd);
        server.aof_last_fsync = server.unixtime;
    }
}
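3. Walking Through the AOF Persistence Process
With the analysis above we have a rough picture of how AOF persistence works. Let us summarize the whole process.
First, at the end of Redis's main function, the event loop function is called: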
/* Main routine of the ae event loop */
void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    // Keep looping as long as the stop flag in eventLoop is not set
    while (!eventLoop->stop) {
        // Called each time the loop starts over after the previous round of events
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        // Process all pending events of the eventLoop
        aeProcessEvents(eventLoop, AE_ALL_EVENTS);
    }
}
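The call chain below aeProcessEvents is:
aeProcessEvents -> readQueryFromClient -> processInputBuffer -> processCommand -> call -> propagate -> feedAppendOnlyFile(cmd,dbid,argv,argc), which writes the data to the aof_buf buffer.
After aeProcessEvents finishes handling events, the loop starts its next iteration and runs the beforesleep function.
The beforesleep function then calls: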
void beforeSleep(struct aeEventLoop *eventLoop) {
    .......
    /* Write the AOF buffer on disk */
    flushAppendOnlyFile(0);
}
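The flushAppendOnlyFile function is the write-and-sync implementation for aof_buf that we analyzed in Section 2.
That concludes this introduction to the AOF persistence mechanism.
Question: if AOF keeps running like this, recording every write command from every client, the AOF file will inevitably keep growing and become very large, and it may contain many redundant commands. How is this solved? Redis addresses it with the AOF rewrite mechanism, which the next post covers.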
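The ordering described above (commands are buffered while events are processed, and the buffer is flushed by beforesleep at the start of the next iteration) can be shown with a toy model. Everything here is a stand-in: the fixed-size arrays replace `server.aof_buf` and the AOF file, and the `toy_*` functions are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for server.aof_buf and the AOF file; fixed size for brevity. */
static char toy_buf[256];
static char toy_file[1024];

/* Plays the role of feedAppendOnlyFile: append one encoded command. */
void toy_feed(const char *encoded) {
    assert(strlen(toy_buf) + strlen(encoded) < sizeof(toy_buf));
    strcat(toy_buf, encoded);
}

/* Plays the role of flushAppendOnlyFile: move the buffer into the "file". */
void toy_flush(void) {
    if (toy_buf[0] == '\0') return;    /* empty buffer: nothing to do */
    assert(strlen(toy_file) + strlen(toy_buf) < sizeof(toy_file));
    strcat(toy_file, toy_buf);
    toy_buf[0] = '\0';
}

/* One loop iteration in the same order as aeMain:
 * beforesleep first, then event processing. */
void toy_iteration(const char **cmds, int n) {
    toy_flush();                                     /* beforesleep */
    for (int i = 0; i < n; i++) toy_feed(cmds[i]);   /* aeProcessEvents */
}
```

Running one iteration with commands `A` and `B` leaves both in the buffer and the "file" empty; the next iteration moves `AB` into the "file" before it buffers its own commands.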