ceph (luminous) data disk failure test

Purpose

Simulate a data disk failure on a ceph (luminous) cluster and walk through repairing it.

Environment

See the manual ceph (luminous) deployment notes for how this environment was set up.

Current ceph environment:

ceph -s

  cluster:
    id:     c45b752d-5d4d-4d3a-a3b2-04e73eff4ccd
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum hh-ceph-128040,hh-ceph-128214,hh-ceph-128215
    mgr: openstack(active)
    osd: 36 osds: 36 up, 36 in

  data:
    pools:   1 pools, 2048 pgs
    objects: 28024 objects, 109 GB
    usage:   331 GB used, 196 TB / 196 TB avail
    pgs:     2048 active+clean

osd tree (excerpt)

[root@hh-ceph-128214 ceph]# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                   STATUS REWEIGHT PRI-AFF
 -1       216.00000 root default
-10        72.00000     rack racka07
 -3        72.00000         host hh-ceph-128214
 12   hdd   6.00000             osd.12              up  1.00000 1.00000
 13   hdd   6.00000             osd.13              up  1.00000 1.00000
 14   hdd   6.00000             osd.14              up  1.00000 1.00000
 15   hdd   6.00000             osd.15              up  1.00000 1.00000
 16   hdd   6.00000             osd.16              up  1.00000 1.00000
 17   hdd   6.00000             osd.17              up  1.00000 1.00000
 18   hdd   6.00000             osd.18              up  1.00000 1.00000
 19   hdd   6.00000             osd.19              up  1.00000 1.00000
 20   hdd   6.00000             osd.20              up  1.00000 1.00000
 21   hdd   6.00000             osd.21              up  1.00000 1.00000
 22   hdd   6.00000             osd.22              up  1.00000 1.00000
 23   hdd   6.00000             osd.23              up  1.00000 1.00000
 -9        72.00000     rack racka12
 -2        72.00000         host hh-ceph-128040
  0   hdd   6.00000             osd.0               up  1.00000 0.50000
  1   hdd   6.00000             osd.1               up  1.00000 1.00000
  2   hdd   6.00000             osd.2               up  1.00000 1.00000
  3   hdd   6.00000             osd.3               up  1.00000 1.00000

Failure simulation

Wipe the contents of osd.14's data directory to simulate a failed data disk:

[root@hh-ceph-128214 ceph]# df -h | grep ceph-14
/dev/sdc1       5.5T  8.8G  5.5T    1% /var/lib/ceph/osd/ceph-14
/dev/sdn3       4.7G  2.1G  2.7G   44% /var/lib/ceph/journal/ceph-14
[root@hh-ceph-128214 ceph]# rm -rf  /var/lib/ceph/osd/ceph-14/*
[root@hh-ceph-128214 ceph]# ls /var/lib/ceph/osd/ceph-14/
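
The cluster takes a short while to notice the failure. A minimal sketch of commands for watching it happen, assuming the default cluster name and an admin keyring on this node (all three are standard ceph/shell commands, added here as a suggestion):

watch -n 2 ceph -s         # poll the cluster summary every 2 seconds
ceph osd tree | grep down  # list any osds currently marked down
ceph health detail         # expand the HEALTH_WARN details once they appear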

Check the current cluster status

  cluster:
    id:     c45b752d-5d4d-4d3a-a3b2-04e73eff4ccd
    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 3246/121608 objects degraded (2.669%), 124 pgs unclean, 155 pgs degraded

  services:
    mon: 3 daemons, quorum hh-ceph-128040,hh-ceph-128214,hh-ceph-128215
    mgr: openstack(active)
    osd: 36 osds: 35 up, 36 in

  data:
    pools:   1 pools, 2048 pgs
    objects: 40536 objects, 157 GB
    usage:   493 GB used, 195 TB / 196 TB avail
    pgs:     3246/121608 objects degraded (2.669%)
             1893 active+clean
             155  active+undersized+degraded

  io:
    client:   132 kB/s rd, 177 MB/s wr, 165 op/s rd, 175 op/s wr

osd tree (osd.14 is now marked down)

[root@hh-ceph-128214 ceph]# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                   STATUS REWEIGHT PRI-AFF
 -1       216.00000 root default
-10        72.00000     rack racka07
 -3        72.00000         host hh-ceph-128214
 12   hdd   6.00000             osd.12              up  1.00000 1.00000
 13   hdd   6.00000             osd.13              up  1.00000 1.00000
 14   hdd   6.00000             osd.14            down  1.00000 1.00000
 15   hdd   6.00000             osd.15              up  1.00000 1.00000
 16   hdd   6.00000             osd.16              up  1.00000 1.00000
 17   hdd   6.00000             osd.17              up  1.00000 1.00000
 18   hdd   6.00000             osd.18              up  1.00000 1.00000
 19   hdd   6.00000             osd.19              up  1.00000 1.00000
 20   hdd   6.00000             osd.20              up  1.00000 1.00000
 21   hdd   6.00000             osd.21              up  1.00000 1.00000
 22   hdd   6.00000             osd.22              up  1.00000 1.00000
 23   hdd   6.00000             osd.23              up  1.00000 1.00000
 -9        72.00000     rack racka12
 -2        72.00000         host hh-ceph-128040
  0   hdd   6.00000             osd.0               up  1.00000 0.50000
  1   hdd   6.00000             osd.1               up  1.00000 1.00000

Relevant log entries on the mon leader (the first line of the capture below is truncated)

orting failure:1
2017-11-24 16:09:24.767761 7fdd215c1700  0 log_channel(cluster) log [DBG] : osd.14 10.199.128.214:6804/11943 reported immediately failed by osd.10 10.199.128.40:6820/12617
2017-11-24 16:09:24.996514 7fdd215c1700  1 mon.hh-ceph-128040@0(leader).osd e328 prepare_failure osd.14 10.199.128.214:6804/11943 from osd.6 10.199.128.40:6812/12317 is reporting failure:1
2017-11-24 16:09:24.996545 7fdd215c1700  0 log_channel(cluster) log [DBG] : osd.14 10.199.128.214:6804/11943 reported immediately failed by osd.6 10.199.128.40:6812/12317
2017-11-24 16:09:25.083523 7fdd23dc6700  0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
2017-11-24 16:09:25.087241 7fdd1cdb8700  1 mon.hh-ceph-128040@0(leader).log v17642 check_sub sending message to client.94503 10.199.128.40:0/161437639 with 1 entries (version 17642)
2017-11-24 16:09:25.093344 7fdd1cdb8700  1 mon.hh-ceph-128040@0(leader).osd e329 e329: 36 total, 35 up, 36 in
2017-11-24 16:09:25.093857 7fdd1cdb8700  0 log_channel(cluster) log [DBG] : osdmap e329: 36 total, 35 up, 36 in
2017-11-24 16:09:25.094151 7fdd215c1700  0 mon.hh-ceph-128040@0(leader) e1 handle_command mon_command({"prefix": "osd metadata", "id": 30} v 0) v1
2017-11-24 16:09:25.094192 7fdd215c1700  0 log_channel(audit) log [DBG] : from='client.94503 10.199.128.40:0/161437639' entity='mgr.openstack' cmd=[{"prefix": "osd metadata", "id": 30}]: dispatch

Recovery procedure

Delete the osd.14 auth entry

[root@hh-ceph-128040 tmp]# ceph auth del osd.14
updated

Remove osd.14 from the CRUSH map

[root@hh-ceph-128214 ~]# ceph osd crush remove osd.14
removed item id 14 name 'osd.14' from crush map

Remove osd.14 from the osd map

[root@hh-ceph-128214 ~]# ceph osd rm osd.14
removed osd.14

osd tree after removal

Every 2.0s: ceph osd tree                                          Sat Nov 25 15:27:41 2017

ID  CLASS WEIGHT    TYPE NAME                   STATUS REWEIGHT PRI-AFF
 -1       210.00000 root default
-10        66.00000     rack racka07
 -3        66.00000         host hh-ceph-128214
 12   hdd   6.00000             osd.12              up  1.00000 1.00000
 13   hdd   6.00000             osd.13              up  1.00000 1.00000
 15   hdd   6.00000             osd.15              up  1.00000 1.00000
 16   hdd   6.00000             osd.16              up  1.00000 1.00000
 17   hdd   6.00000             osd.17              up  1.00000 1.00000
 18   hdd   6.00000             osd.18              up  1.00000 1.00000
 19   hdd   6.00000             osd.19              up  1.00000 1.00000
 20   hdd   6.00000             osd.20              up  1.00000 1.00000
 21   hdd   6.00000             osd.21              up  1.00000 1.00000
 22   hdd   6.00000             osd.22              up  1.00000 1.00000
 23   hdd   6.00000             osd.23              up  1.00000 1.00000

Delete the journal file and recreate the journal partition's filesystem

[root@hh-ceph-128214 ceph]# rm -rf /var/lib/ceph/journal/ceph-14/journal
[root@hh-ceph-128214 /]# umount /dev/sdn3
[root@hh-ceph-128214 /]# mkfs -t xfs -f /dev/sdn3
meta-data=/dev/sdn3              isize=256    agcount=4, agsize=305152 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=1220608, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hh-ceph-128214 ~]# mount /dev/sdn3 /var/lib/ceph/journal/ceph-14/
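
Before moving on, a quick check that the freshly formatted journal partition is mounted and empty can save confusion later (same paths as above; not part of the original transcript):

df -h /var/lib/ceph/journal/ceph-14/   # should show /dev/sdn3 mounted here
ls -la /var/lib/ceph/journal/ceph-14/  # should not contain a journal file yet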

Rebuild the data partition

[root@hh-ceph-128214 tmp]# umount /dev/sdc1
[root@hh-ceph-128214 /]# dd if=/dev/zero of=/dev/sdc bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.59539 s, 176 MB/s
[root@hh-ceph-128214 tmp]# parted -s /dev/sdc  mklabel gpt
[root@hh-ceph-128214 tmp]# parted /dev/sdc mkpart primary xfs 1 100%
Information: You may need to update /etc/fstab.
[root@hh-ceph-128214 tmp]# mkfs.xfs -f -i size=1024  /dev/sdc1
meta-data=/dev/sdc1              isize=1024   agcount=6, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=1465130240, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hh-ceph-128214 tmp]# mount /dev/sdc1 /var/lib/ceph/osd/ceph-14/
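
Before re-initializing the osd it can be worth confirming that the new partition and filesystem look as expected; a small verification sketch (device and mount point as above, not in the original steps):

parted -s /dev/sdc print             # new gpt label with one primary partition
df -h /var/lib/ceph/osd/ceph-14/     # /dev/sdc1 mounted on the osd data path
xfs_info /var/lib/ceph/osd/ceph-14/  # xfs geometry, isize=1024 as requested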

Initialize the ceph osd (the journal file is recreated automatically)

The "error reading file: /var/lib/ceph/osd/ceph-14/keyring ... No such file or directory" message below is expected: the keyring does not exist yet, and --mkkey creates it at this point.

[root@hh-ceph-128214 /]# ceph-osd -i 14 --mkfs --mkkey
2017-11-24 18:21:42.297329 7fc7dc79bd00 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-11-24 18:21:42.473203 7fc7dc79bd00 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-11-24 18:21:42.473725 7fc7dc79bd00 -1 read_settings error reading settings: (2) No such file or directory
2017-11-24 18:21:42.782000 7fc7dc79bd00 -1 created object store /var/lib/ceph/osd/ceph-14 for osd.14 fsid c45b752d-5d4d-4d3a-a3b2-04e73eff4ccd
2017-11-24 18:21:42.782044 7fc7dc79bd00 -1 auth: error reading file: /var/lib/ceph/osd/ceph-14/keyring: can't open /var/lib/ceph/osd/ceph-14/keyring: (2) No such file or directory
2017-11-24 18:21:42.782202 7fc7dc79bd00 -1 created new key in keyring /var/lib/ceph/osd/ceph-14/keyring

Create the osd

ceph osd create returns the lowest free osd id, which here is the 14 freed by the removal above.

[root@hh-ceph-128214 ~]# ceph osd create
14

Restore the auth credentials

[root@hh-ceph-128214 tmp]# ceph auth add osd.14 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-14/keyring
added key for osd.14
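
To double-check that the key now registered with the monitors matches the local keyring, something like the following can be used (an optional verification, not in the original procedure):

ceph auth get osd.14                   # key as stored by the monitors
cat /var/lib/ceph/osd/ceph-14/keyring  # key created by ceph-osd --mkkey; the key= values should match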

Restore file ownership

[root@hh-ceph-128214 /]# ls -l /var/lib/ceph/journal/ceph-14/  /var/lib/ceph/osd/ceph-14/
/var/lib/ceph/journal/ceph-14/:
total 2097152
-rw-r--r-- 1 root root 2147483648 Nov 24 18:21 journal

/var/lib/ceph/osd/ceph-14/:
total 36
-rw-r--r-- 1 root root 37 Nov 24 18:21 ceph_fsid
drwxr-xr-x 4 root root 61 Nov 24 18:21 current
-rw-r--r-- 1 root root 37 Nov 24 18:21 fsid
-rw------- 1 root root 57 Nov 24 18:21 keyring
-rw-r--r-- 1 root root 21 Nov 24 18:21 magic
-rw-r--r-- 1 root root  6 Nov 24 18:21 ready
-rw-r--r-- 1 root root  4 Nov 24 18:21 store_version
-rw-r--r-- 1 root root 53 Nov 24 18:21 superblock
-rw-r--r-- 1 root root 10 Nov 24 18:21 type
-rw-r--r-- 1 root root  3 Nov 24 18:21 whoami
[root@hh-ceph-128214 /]# chown ceph:ceph -R  /var/lib/ceph/journal/ceph-14/  /var/lib/ceph/osd/ceph-14/

Start the ceph osd

The unit had already tripped systemd's start rate limit from earlier failed start attempts, so it has to be cleared with systemctl reset-failed before systemctl start succeeds.

[root@hh-ceph-128214 tmp]# systemctl status ceph-osd@14
● ceph-osd@14.service - Ceph object storage daemon osd.14
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2017-11-24 17:35:00 CST; 1min 51s ago
  Process: 106773 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
  Process: 106767 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 106773 (code=exited, status=1/FAILURE)

Nov 24 17:34:40 hh-ceph-128214.vclound.com systemd[1]: Unit ceph-osd@14.service entered failed state.
Nov 24 17:34:40 hh-ceph-128214.vclound.com systemd[1]: ceph-osd@14.service failed.
Nov 24 17:35:00 hh-ceph-128214.vclound.com systemd[1]: ceph-osd@14.service holdoff time over, scheduling restart.
Nov 24 17:35:00 hh-ceph-128214.vclound.com systemd[1]: start request repeated too quickly for ceph-osd@14.service
Nov 24 17:35:00 hh-ceph-128214.vclound.com systemd[1]: Failed to start Ceph object storage daemon osd.14.
Nov 24 17:35:00 hh-ceph-128214.vclound.com systemd[1]: Unit ceph-osd@14.service entered failed state.
Nov 24 17:35:00 hh-ceph-128214.vclound.com systemd[1]: ceph-osd@14.service failed.

[root@hh-ceph-128214 tmp]# systemctl start ceph-osd@14
Job for ceph-osd@14.service failed because start of the service was attempted too often. See "systemctl status ceph-osd@14.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed ceph-osd@14.service" followed by "systemctl start ceph-osd@14.service" again.
[root@hh-ceph-128214 tmp]# systemctl reset-failed ceph-osd@14
[root@hh-ceph-128214 tmp]# systemctl start ceph-osd@14
[root@hh-ceph-128214 tmp]# systemctl status ceph-osd@14
● ceph-osd@14.service - Ceph object storage daemon osd.14
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-11-24 17:37:17 CST; 3s ago
  Process: 106871 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 106877 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@14.service
           └─106877 /usr/bin/ceph-osd -f --cluster ceph --id 14 --setuser ceph --setgroup ceph

Nov 24 17:37:17 hh-ceph-128214.vclound.com systemd[1]: Starting Ceph object storage daemon osd.14...
Nov 24 17:37:17 hh-ceph-128214.vclound.com systemd[1]: Started Ceph object storage daemon osd.14.
Nov 24 17:37:17 hh-ceph-128214.vclound.com ceph-osd[106877]: starting osd.14 at - osd_data /var/lib/ceph/osd/ceph-14 /var/lib/ceph/journal/ceph-14/journal
Nov 24 17:37:18 hh-ceph-128214.vclound.com ceph-osd[106877]: 2017-11-24 17:37:18.035052 7fbaaf369d00 -1 journal FileJournal::_open: disabling aio for non-block ...o anyway
Nov 24 17:37:18 hh-ceph-128214.vclound.com ceph-osd[106877]: 2017-11-24 17:37:18.047920 7fbaaf369d00 -1 osd.14 0 log_to_monitors {default=true}
Nov 24 17:37:18 hh-ceph-128214.vclound.com ceph-osd[106877]: 2017-11-24 17:37:18.054256 7fba96117700 -1 osd.14 0 waiting for initial osdmap
Hint: Some lines were ellipsized, use -l to show in full.
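
The Loaded line above shows the unit as "disabled". If the osd should come back automatically after a host reboot, it can additionally be enabled (optional; not part of the original test):

systemctl enable ceph-osd@14   # make the unit start at boot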

Verification

Current ceph status

  cluster:
    id:     c45b752d-5d4d-4d3a-a3b2-04e73eff4ccd
    health: HEALTH_WARN
            Degraded data redundancy: 8965/137559 objects degraded (6.517%), 60 pgs unclean, 206 pgs degraded

  services:
    mon: 3 daemons, quorum hh-ceph-128040,hh-ceph-128214,hh-ceph-128215
    mgr: openstack(active)
    osd: 36 osds: 36 up, 36 in    <- note: all 36 osds are up and in again

  data:
    pools:   1 pools, 2048 pgs
    objects: 45853 objects, 178 GB
    usage:   540 GB used, 195 TB / 196 TB avail
    pgs:     8965/137559 objects degraded (6.517%)
             1842 active+clean
             201  active+recovery_wait+degraded
             5    active+recovering+degraded

  io:
    recovery: 168 MB/s, 42 objects/s
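
Recovery traffic can be followed until the cluster returns to HEALTH_OK; for example (both are standard commands, given here purely as a suggestion):

ceph -w             # stream cluster events, including recovery progress
watch -n 2 ceph -s  # or poll the summary, as done earlier in this test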

osd tree (osd.14 is back, with a slightly reduced crush weight)

[root@hh-ceph-128214 ceph]#  ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                   STATUS REWEIGHT PRI-AFF
 -1       215.45609 root default
-10        71.45609     rack racka07
 -3        71.45609         host hh-ceph-128214
 12   hdd   6.00000             osd.12              up  1.00000 1.00000
 13   hdd   6.00000             osd.13              up  1.00000 1.00000
 14   hdd   5.45609             osd.14              up  1.00000 1.00000
 15   hdd   6.00000             osd.15              up  1.00000 1.00000
 16   hdd   6.00000             osd.16              up  1.00000 1.00000
 17   hdd   6.00000             osd.17              up  1.00000 1.00000
 18   hdd   6.00000             osd.18              up  1.00000 1.00000
 19   hdd   6.00000             osd.19              up  1.00000 1.00000
 20   hdd   6.00000             osd.20              up  1.00000 1.00000
 21   hdd   6.00000             osd.21              up  1.00000 1.00000
 22   hdd   6.00000             osd.22              up  1.00000 1.00000
 23   hdd   6.00000             osd.23              up  1.00000 1.00000
 -9        72.00000     rack racka12
 -2        72.00000         host hh-ceph-128040
  0   hdd   6.00000             osd.0               up  1.00000 0.50000
  1   hdd   6.00000             osd.1               up  1.00000 1.00000
  2   hdd   6.00000             osd.2               up  1.00000 1.00000
  3   hdd   6.00000             osd.3               up  1.00000 1.00000

Summary

When recovering a data disk this way, the failed osd must first be removed from the cluster (ceph osd rm osd.14); in the older ceph 0.87 release this step was not required during recovery.
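
For reference, the recovery sequence used above condensed into one sketch. The osd id (14), data disk (/dev/sdc1) and journal partition (/dev/sdn3) are specific to this test environment, so adapt them before running anything:

# 1. drop the failed osd from auth, crush map and osd map
ceph auth del osd.14
ceph osd crush remove osd.14
ceph osd rm osd.14

# 2. rebuild the journal and data filesystems (destructive: wipes /dev/sdc)
rm -rf /var/lib/ceph/journal/ceph-14/journal
umount /dev/sdn3 && mkfs -t xfs -f /dev/sdn3 && mount /dev/sdn3 /var/lib/ceph/journal/ceph-14/
umount /dev/sdc1
dd if=/dev/zero of=/dev/sdc bs=1M count=100
parted -s /dev/sdc mklabel gpt
parted /dev/sdc mkpart primary xfs 1 100%
mkfs.xfs -f -i size=1024 /dev/sdc1
mount /dev/sdc1 /var/lib/ceph/osd/ceph-14/

# 3. recreate and re-register the osd, then start it
ceph-osd -i 14 --mkfs --mkkey
ceph osd create                  # should return the freed id, 14
ceph auth add osd.14 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-14/keyring
chown ceph:ceph -R /var/lib/ceph/journal/ceph-14/ /var/lib/ceph/osd/ceph-14/
systemctl reset-failed ceph-osd@14 && systemctl start ceph-osd@14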