从文件系统恢复遗失的UNLOGGED table's datafile.

来源:互联网 发布:精通qt4编程 pdf 编辑:程序博客网 时间:2024/04/28 23:45

Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:

  • Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
  • Postgres-XL的项目发起人Mason Sharp
  • pgpool的作者石井达夫(Tatsuo Ishii)
  • PG-Strom的作者海外浩平(Kaigai Kohei)
  • Greenplum研发总监姚延栋
  • 周正中(德哥), PostgreSQL中国用户会创始人之一
  • 汪洋,平安科技数据库技术部经理
  • ……
 
  • 2015年度PG大象会报名地址:http://postgres2015.eventdove.com/
  • PostgreSQL中国社区: http://postgres.cn/
  • PostgreSQL专业1群: 3336901(已满)
  • PostgreSQL专业2群: 100910388
  • PostgreSQL专业3群: 150657323



  • 注意PostgreSQL的unlogged table是不记录xlog的,所以在备库上没有unlogged table的数据记录。
    同时,数据库在进入恢复状态时,为了保证数据的一致性,postgresql会自动清除unlogged table的数据文件。
    那么问题来了,万一你不小心误创建了unlogged table,你可能一开始会没有感知,但是一旦发生以下情况,你会发现数据不见了。
    1. 数据库crash掉之后,重启。
    2. 主备切换,备库变成主库,主库变成备库。
    以上两种情况,都是数据库在启动并进入recovery状态后,原来主库下的unlogged table的datafile都会被清空。
    代码见:
    src/backend/access/transam/xlog.c

    /*
     * This must be called ONCE during postmaster or standalone-backend startup
     */
    void
    StartupXLOG(void)
    {
    ......

            /* REDO */
            if (InRecovery)
            {
    ......
                    /*
                     * We're in recovery, so unlogged relations may be trashed and must be
                     * reset.  This should be done BEFORE allowing Hot Standby
                     * connections, so that read-only backends don't try to read whatever
                     * garbage is left over from before.
                     */
                    ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
    ......
            /*
             * Reset unlogged relations to the contents of their INIT fork. This is
             * done AFTER recovery is complete so as to include any unlogged relations
             * created during recovery, but BEFORE recovery is marked as having
             * completed successfully. Otherwise we'd not retry if any of the post
             * end-of-recovery steps fail.
             */
            if (InRecovery)
                    ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
    ......


    backend/storage/file/reinit.c

    /*
     * Reset unlogged relations from before the last restart.
     *
     * If op includes UNLOGGED_RELATION_CLEANUP, we remove all forks of any
     * relation with an "init" fork, except for the "init" fork itself.
     *
     * If op includes UNLOGGED_RELATION_INIT, we copy the "init" fork to the main
     * fork.
     */
    void
    ResetUnloggedRelations(int op)
    {
            char            temp_path[MAXPGPATH];
            DIR                *spc_dir;
            struct dirent *spc_de;
            MemoryContext tmpctx,
                                    oldctx;

            /* Log it. */
            elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d",
                     (op & UNLOGGED_RELATION_CLEANUP) != 0,
                     (op & UNLOGGED_RELATION_INIT) != 0);

            /*
             * Just to be sure we don't leak any memory, let's create a temporary
             * memory context for this operation.
             */
            tmpctx = AllocSetContextCreate(CurrentMemoryContext,
                                                                       "ResetUnloggedRelations",
                                                                       ALLOCSET_DEFAULT_MINSIZE,
                                                                       ALLOCSET_DEFAULT_INITSIZE,
                                                                       ALLOCSET_DEFAULT_MAXSIZE);
            oldctx = MemoryContextSwitchTo(tmpctx);

            /*
             * First process unlogged files in pg_default ($PGDATA/base)
             */
            ResetUnloggedRelationsInTablespaceDir("base", op);

            /*
             * Cycle through directories for all non-default tablespaces.
             */
            spc_dir = AllocateDir("pg_tblspc");

            while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL)
            {
                    if (strcmp(spc_de->d_name, ".") == 0 ||
                            strcmp(spc_de->d_name, "..") == 0)
                            continue;

                    snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s",
                                     spc_de->d_name, TABLESPACE_VERSION_DIRECTORY);
                    ResetUnloggedRelationsInTablespaceDir(temp_path, op);
            }
            FreeDir(spc_dir);

            /*
             * Restore memory context.
             */
            MemoryContextSwitchTo(oldctx);
            MemoryContextDelete(tmpctx);
    }

    src/include/common/relpath.h

    /*
     * Stuff for fork names.
     *
     * The physical storage of a relation consists of one or more forks.
     * The main fork is always created, but in addition to that there can be
     * additional forks for storing various metadata. ForkNumber is used when
     * we need to refer to a specific fork in a relation.
     */
    typedef enum ForkNumber
    {
            InvalidForkNumber = -1,
            MAIN_FORKNUM = 0,
            FSM_FORKNUM,
            VISIBILITYMAP_FORKNUM,
            INIT_FORKNUM

            /*
             * NOTE: if you add a new fork, change MAX_FORKNUM and possibly
             * FORKNAMECHARS below, and update the forkNames array in
             * src/common/relpath.c
             */
    } ForkNumber;


    那么问题来了,如果真的这样了,有办法恢复吗?
    1. 首先,如果你在原来的主库上有基础备份,你可以从基础备份恢复。为什么需要主库的备份呢,因为备库上没有unlogged table的数据文件内容,所以在备库备份是备不到unlogged table的datafile的。
    但是这种恢复方法也务必要小心,你需要在启动数据库前,先把_init的文件都删掉,这样启动数据库时数据文件就不会被清除。
    2. 从主库的逻辑备份中恢复。
    3. 从审计日志中回放SQL恢复。
    4. 如果你没有主库的基础备份,那么可有从文件系统中去恢复删掉的数据文件。例如ext4文件系统的恢复方式如下:
    http://blog.163.com/digoal@126/blog/static/16387704020142124032866/
    例子:
    创建一个unlogged table,并记录它的filenode

    postgres=# create unlogged table utbl1(id int);
    CREATE TABLE
    postgres=# insert into utbl1 select generate_series(1,1000);
    INSERT 0 1000
    postgres=# select pg_relation_filepath('utbl1'::regclass);
     pg_relation_filepath 
    ----------------------
     base/151898/33822
    (1 row)

    查看到PG对unlogged table做了_init的后缀标记
    http://www.postgresql.org/docs/9.4/static/storage-init.html

    postgres@digoal-> cd $PGDATA
    postgres@digoal-> ll base/151898/33822*
    -rw------- 1 postgres postgres 40K Sep 26 11:39 base/151898/33822
    -rw------- 1 postgres postgres 24K Sep 26 11:39 base/151898/33822_fsm
    -rw------- 1 postgres postgres   0 Sep 26 11:38 base/151898/33822_init

    现在,我们进入恢复模式

    postgres@digoal-> mv recovery.done recovery.conf
    postgres@digoal-> pg_ctl start

    数据库启动后,其实已经将unlogged table的数据文件清理掉了,见前面的源码。

    postgres=# select count(*) from utbl1 ;
    ERROR:  cannot access temporary or unlogged relations during recovery

    现在再起来,unlogged table就没有数据了。

    postgres@digoal-> mv recovery.conf recovery.done
    postgres@digoal-> pg_ctl start
    postgres=# select count(*) from utbl1 ;
     count 
    -------
         0
    (1 row)


    恢复过程,先停库,然后umount数据文件所在的文件系统。

    pg_ctl stop -m fast
    root@digoal-> umount /data01

    假设我已经安装了extundelete
    http://sourceforge.net/projects/extundelete/files/extundelete/
    http://blog.163.com/digoal@126/blog/static/16387704020142124032866/

    root@digoal-> cd /opt/extundelete/
    root@digoal-> ll
    total 4.0K
    drwxr-xr-x 2 root root 4.0K Sep 26 13:43 bin
    root@digoal-> cd bin/
    root@digoal-> ll
    total 1.2M
    -rwxr-xr-x 1 root root 1.2M Sep 26 13:43 extundelete

    查看数据文件所在的文件系统的块设备的inode信息。

    root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01
    ./extundelete: unrecognized option '--ls'
    NOTICE: Extended attributes are not restored.
    Loading filesystem metadata ... 80 groups loaded.
    Group: 0
    Contents of inode 2:
    0000 | ed 41 00 00 00 10 00 00 6d 08 fa 55 bc 08 fa 55 | .A......m..U...U
    0010 | bc 08 fa 55 00 00 00 00 00 00 05 00 08 00 00 00 | ...U............
    0020 | 00 00 00 00 02 00 00 00 a1 22 00 00 00 00 00 00 | ........."......
    0030 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    0040 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    0050 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    0060 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    0070 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    0080 | 1c 00 00 00 88 90 cc 84 88 90 cc 84 00 00 00 00 | ................
    0090 | 6d 08 fa 55 00 00 00 00 00 00 00 00 00 00 00 00 | m..U............
    00a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    00b0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    00c0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    00d0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    00e0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
    00f0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................

    Inode is Allocated
    File mode: 16877
    Low 16 bits of Owner Uid: 0
    Size in bytes: 4096
    Access time: 1442449517
    Creation time: 1442449596
    Modification time: 1442449596
    Deletion Time: 0
    Low 16 bits of Group Id: 0
    Links count: 5
    Blocks count: 8
    File flags: 0
    File version (for NFS): 0
    File ACL: 0
    Directory ACL: 0
    Fragment address: 0
    Direct blocks: 8865, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    Indirect block: 0
    Double indirect block: 0
    Triple indirect block: 0

    File name                                       | Inode number | Deleted status
    .                                                 2
    ..                                                2
    lost+found                                        11
    pg_root_1921                                      131073
    pg_root_1922                                      393217

    我们的数据文件在pg_root_1921目录下,对应的inode=131073,根据这个inode继续查询下一级目录的inode

    root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131073
    找到了baseinode
    base                                              131077

    继续找,找到了unlogged table所在的database的inode

    root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131077
    151898        

                                        131078
    继续找,找对应的_init文件,找到了它的前缀,根据前缀过滤

    root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131078|grep _init
    33822_init                                        131152

    找到了被删除的文件

    root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131078|grep 33822
    ./extundelete: unrecognized option '--ls'
    33822                                             131116         Deleted
    33822_fsm                                         131147         Deleted
    33822_init                                        131152

    使用inode恢复,恢复删除的datafile

    root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --restore-inode 131116

    恢复后放在这里

    root@digoal->  ll RECOVERED_FILES/ 
    total 40K
    -rw-r--r-- 1 root root 40K Sep 26 13:50 file.131116

    将文件拷贝到原来的位置

    root@digoal-> mount /data01
    root@digoal-> cp file.131116 /data01/pg_root_1921/base/151898/33822
    root@digoal-> chown -R postgres:postgres /data01/
    root@digoal-> su - postgres

    启动数据库前,切记,删除_init后缀的文件,否则数据文件可能又会被清理掉。

    cd $PGDATA
    postgres@digoal-> rm -f base/151898/33822_init 

    启动数据库,数据已经回来了。

    postgres@digoal-> pg_ctl start
    postgres=# select count(*) from utbl1 ;
     count 
    -------
      1000
    (1 row)


    大家千万要切记,不要滥用unlogged table,除非你可以为你的行为负责。
    hash index也是这样,因为不写XLOG,所以如果你使用流复制来实施HA的话,在切换到备库后,如果走hash index scan,你会发现数据凭空"消失",实际上是因为hash index的变更没有复制过去的原因。所以就不要使用hash index了吧。

    [参考]
    1. http://sourceforge.net/projects/extundelete/files/extundelete/
    2. http://blog.163.com/digoal@126/blog/static/16387704020142124032866/
    0 0
    原创粉丝点击