don't shutdown before pg_stop_backup()
来源:互联网 发布:银行数据管理制度 编辑:程序博客网 时间:2024/06/07 04:08
Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:
- Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
- Postgres-XL的项目发起人Mason Sharp
- pgpool的作者石井达夫(Tatsuo Ishii)
- PG-Strom的作者海外浩平(Kaigai Kohei)
- Greenplum研发总监姚延栋
- 周正中(德哥), PostgreSQL中国用户会创始人之一
- 汪洋,平安科技数据库技术部经理
- ……
在创建备份或STANDBY时,我们通常可以有两种方式一种是通过pg_basebackup,另一种是使用pg_start_backup然后COPY文件的方式。
在使用第二种方式时,正确的流程应该是:
pg_start_backup
COPY
pg_stop_backup
但是我们试想一下,如果在COPY完后,调用pg_stop_backup前,主库关闭了,然后又起来了。
这个备份还有效吗?
先来试一试:
1. 执行pg_start_backup
do_pg_start_backup@src/backend/access/transam/xlog.c被调用。
postgres@digoal-> psqlpsql (9.4.4)Type "help" for help.postgres=# select pg_start_backup('a');
2. 拷贝文件,并使用这个拷贝创建备库
postgres@digoal-> rm -f postmaster.pidpostgres@digoal-> pg_ctl start -D .server startingpostgres@digoal-> LOG: 00000: redirecting log output to logging collector processHINT: Future log output will appear in directory "/data03/pg_log_1922".LOCATION: SysLogger_Start, syslogger.c:645
postgres@digoal-> top -c -u postgres
top - 15:36:48 up 7:19, 4 users, load average: 0.00, 0.00, 0.00Tasks: 159 total, 1 running, 158 sleeping, 0 stopped, 0 zombieCpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%stMem: 1914436k total, 961196k used, 953240k free, 104048k buffersSwap: 1048572k total, 0k used, 1048572k free, 666808k cachedPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND4082 postgres 20 0 15040 1272 936 R 33.2 0.1 0:00.25 top -c -u postgres1908 postgres 20 0 105m 1948 1452 S 0.0 0.1 0:00.64 -bash3850 postgres 20 0 105m 1960 1448 S 0.0 0.1 0:00.28 -bash3987 postgres 20 0 488m 34m 32m S 0.0 1.8 0:00.21 /opt/pgsql9.4.4/bin/postgres3989 postgres 20 0 186m 3112 708 S 0.0 0.2 0:00.00 postgres: logger process3991 postgres 20 0 488m 5928 3492 S 0.0 0.3 0:00.00 postgres: checkpointer process3992 postgres 20 0 488m 5544 3112 S 0.0 0.3 0:00.05 postgres: writer process3993 postgres 20 0 488m 19m 16m S 0.0 1.0 0:00.31 postgres: wal writer process3994 postgres 20 0 188m 3380 800 S 0.0 0.2 0:00.00 postgres: stats collector process4069 postgres 20 0 488m 34m 32m S 0.0 1.8 0:00.11 /opt/pgsql9.4.4/bin/postgres -D .4071 postgres 20 0 186m 3116 708 S 0.0 0.2 0:00.00 postgres: logger process4072 postgres 20 0 488m 4040 1572 S 0.0 0.2 0:00.01 postgres: startup process recovering 0000000600000029000000334076 postgres 20 0 488m 3344 912 S 0.0 0.2 0:00.00 postgres: checkpointer process4077 postgres 20 0 488m 3432 1000 S 0.0 0.2 0:00.00 postgres: writer process 4078 postgres 20 0 495m 5380 2840 S 0.0 0.3 0:00.00 postgres: wal receiver process4080 postgres 20 0 489m 4828 2216 S 0.0 0.3 0:00.00 postgres: wal sender process replica 192.168.150.128(26168) streaming 29/33000100
3. 关闭主库,注意这个时候还没有在主库执行 pg_stop_backup
4. 备库CRASH掉了。并产生了一个CORE文件。
日志如下:
2015-09-10 15:36:56.065 CST,,,4072,,55f1330a.fe8,5,,2015-09-10 15:36:42 CST,1/0,0,PANIC,XX000,"online backup was canceled, recovery cannot continue",,,,,"xlog redo checkpoint: redo 29/33000100; tli 6; prev tli 6; fpw true; xid 0/639272964; oid 151900; multi 788; offset 1711; oldest xid 600048491 in DB 1; oldest multi 631 in DB 1; oldest running xid 0; shutdown",,,"xlog_redo, xlog.c:9310",""
CORE文件
[root@digoal pg_root_1922]# lltotal 10540-rw------- 1 postgres postgres 193 Sep 10 15:36 backup_label.olddrwx------ 6 postgres postgres 4096 Sep 10 15:36 base-rw------- 1 postgres postgres 318500864 Sep 10 15:36 core.4072drwx------ 2 postgres postgres 4096 Sep 10 15:36 globaldrwx------ 2 postgres postgres 4096 Sep 10 15:36 pg_clog
分析这个CORE文件:
# gdb -c ./core.4072 /opt/pgsql/bin/postgres
截取最后一部分信息如下:
Loaded symbols for /lib64/libnss_files.so.2Reading symbols from /lib/modules/2.6.32-504.el6.x86_64/vdso/vdso.so...Reading symbols from /usr/lib/debug/lib/modules/2.6.32-504.el6.x86_64/vdso/vdso.so.debug...done.done.Loaded symbols for /lib/modules/2.6.32-504.el6.x86_64/vdso/vdso.soCore was generated by `postgres: startup process recovering 00000006000000'.Program terminated with signal 6, Aborted.#0 0x00000038b9032625 in raise () from /lib64/libc.so.6Missing separate debuginfos, use: debuginfo-install audit-libs-2.3.7-5.el6.x86_64 cyrus-sasl-lib-2.1.23-15.el6.x86_64 glibc-2.12-1.149.el6_6.9.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-33.el6.x86_64 libcom_err-1.41.12-21.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libxml2-2.7.6-17.el6_6.1.x86_64 nspr-4.10.6-1.el6_5.x86_64 nss-3.16.1-14.el6.x86_64 nss-softokn-freebl-3.14.3-17.el6.x86_64 nss-util-3.16.1-3.el6.x86_64 openldap-2.4.39-8.el6.x86_64 openssl-1.0.1e-42.el6.x86_64 pam-1.1.1-20.el6.x86_64 zlib-1.2.3-29.el6.x86_64(gdb) bt#0 0x00000038b9032625 in raise () from /lib64/libc.so.6#1 0x00000038b9033e05 in abort () from /lib64/libc.so.6#2 0x0000000000b64a3a in errfinish (dummy=0) at elog.c:572#3 0x0000000000547058 in xlog_redo (lsn=176848634272, record=0x2311328) at xlog.c:9309#4 0x0000000000540909 in StartupXLOG () at xlog.c:6920#5 0x00000000008b7455 in StartupProcessMain () at startup.c:224#6 0x000000000055a61a in AuxiliaryProcessMain (argc=2, argv=0x7fffa1f9d130) at bootstrap.c:422#7 0x00000000008b57e3 in StartChildProcess (type=StartupProcess) at postmaster.c:5155#8 0x00000000008abc59 in PostmasterMain (argc=3, argv=0x22cb280) at postmaster.c:1237#9 0x00000000007a8f01 in main (argc=3, argv=0x22cb280) at main.c:228
可以看到PG的错误日志是由xlog_redo 引起,相关的代码:
xlog_redo是在数据恢复时被调用,XLOG_CHECKPOINT_SHUTDOWN是一笔数据库关闭的xlog.
else if (info == XLOG_CHECKPOINT_SHUTDOWN){CheckPoint checkPoint;memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));/* In a SHUTDOWN checkpoint, believe the counters exactly */LWLockAcquire(XidGenLock, LW_EXCLUSIVE);ShmemVariableCache->nextXid = checkPoint.nextXid;LWLockRelease(XidGenLock);LWLockAcquire(OidGenLock, LW_EXCLUSIVE);ShmemVariableCache->nextOid = checkPoint.nextOid;ShmemVariableCache->oidCount = 0;LWLockRelease(OidGenLock);MultiXactSetNextMXact(checkPoint.nextMulti,checkPoint.nextMultiOffset);SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);MultiXactSetSafeTruncate(checkPoint.oldestMulti);/** If we see a shutdown checkpoint while waiting for an end-of-backup* record, the backup was canceled and the end-of-backup record will* never arrive.*/ 此时ControlFile->backupStartPoint <>0 并且ControlFile->backupEndPoint = 0 说明备份未结束。if (ArchiveRecoveryRequested &&!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&XLogRecPtrIsInvalid(ControlFile->backupEndPoint))ereport(PANIC,(errmsg("online backup was canceled, recovery cannot continue")));
我们可以看到,正常情况下是这样的:
postgres@digoal-> pg_controldata |grep -i backup
Backup start location: 0/0Backup end location: 0/0End-of-backup record required: no
但是当前备库的情况是这样的:
postgres@digoal-> pg_controldata /data01/pg_root_1922|grep -i back
Minimum recovery ending location: 29/2D000028Min recovery ending loc's timeline: 6Backup start location: 29/2D000028Backup end location: 0/0End-of-backup record required: no
如果我们调用了pg_stop_backend,那么会在XLOG写一笔XLOG_BACKUP_END的RECORD。
do_pg_stop_backup@src/backend/access/transam/xlog.c
stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END, &rdata);....../** Force a switch to a new xlog segment file, so that the backup is valid* as soon as archiver moves out the current segment file.*/RequestXLogSwitch();
然后,在恢复时,如果遇到这个RECORD,会把 ControlFile->backupStartPoint 改成0,也就是我们看到的正常的数据。
/** XLOG resource manager's routines** Definitions of info values are in include/catalog/pg_control.h, though* not all record types are related to control file updates.*/voidxlog_redo(XLogRecPtr lsn, XLogRecord *record){......else if (info == XLOG_BACKUP_END){......if (ControlFile->backupStartPoint == startpoint){......ControlFile->backupStartPoint = InvalidXLogRecPtr;
那么ControlFile->backupStartPoint == startpoint又是什么情况下会满足呢?
其实是在启动STARTUP进程时,判断有没有一个LABEL文件,有的话,就会将ControlFile->backupStartPoint改为checkPoint.redo。
/** This must be called ONCE during postmaster or standalone-backend startup*/voidStartupXLOG(void){....../* REDO */if (InRecovery){......./** Set backupStartPoint if we're starting recovery from a base backup.** Also set backupEndPoint and use minRecoveryPoint as the backup end* location if we're starting recovery from a base backup which was* taken from a standby. In this case, the database system status in* pg_control must indicate that the database was already in* recovery. Usually that will be DB_IN_ARCHIVE_RECOVERY but also can* be DB_SHUTDOWNED_IN_RECOVERY if recovery previously was interrupted* before reaching this point; e.g. because restore_command or* primary_conninfo were faulty.** Any other state indicates that the backup somehow became corrupted* and we can't sensibly continue with recovery.*/if (haveBackupLabel){ControlFile->backupStartPoint = checkPoint.redo;ControlFile->backupEndRequired = backupEndRequired;
[参考]
1. src/backend/access/transam/xlog.c
0 0
- don't shutdown before pg_stop_backup()
- 120. Don't put the cart before the horse. 不要本末倒置
- 10 Reasons You Don't Need Hadoop For Your Data Analysis - Alternatives Must Try Before Using It
- Don't Cry
- Don't use XP!
- Don't Use Select *
- Don't you worry
- 三Don't原则
- Don't Wanna Try
- don't cry
- Don't Cry
- don't cry
- Don't say goodbye
- don't say goodbye
- don't lost
- Don't use extjs
- Don't overskill
- You don't say!
- 为何nested loop要求小表驱动,hash join又为何要求小表hashed?(转)
- android开源项目
- 可乐卤蛋的做法
- VMware提示:已将该虚拟机配置为使用 64 位客户机操作系统。但是,无法执行 64 位操作。
- 教你如何迅速秒杀掉:99%的海量数据处理面试题
- don't shutdown before pg_stop_backup()
- 关于chrome插件的开发
- android apk签名和包名
- linux中memset的正确用法
- 你不知道的JavaScript--Item1 严格模式
- PyCharm 4.5 外观设置教程
- 绘制模式之硬加速和软加速
- londiste3 copy table' snapshot & PostgreSQL logical replication's snapshot 不同之处
- latex 插入代码、代码框、代码配色