ceph (luminous) journal disk failure test
Purpose
Simulate a journal disk failure on a ceph (luminous) OSD and walk through recovering from it.
Environment
The environment was deployed as described in 手动部署 ceph 环境说明 (luminous 版).
The current ceph cluster state:
[root@hh-ceph-128040 ~]# ceph -s
  cluster:
    id:     c45b752d-5d4d-4d3a-a3b2-04e73eff4ccd
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum hh-ceph-128040,hh-ceph-128214,hh-ceph-128215
    mgr: openstack(active)
    osd: 36 osds: 36 up, 36 in

  data:
    pools:   1 pools, 2048 pgs
    objects: 7 objects, 725 bytes
    usage:   50607 MB used, 196 TB / 196 TB avail
    pgs:     2048 active+clean
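The failure and recovery states shown later in this article were captured with watch, which is what produces the "Every 2.0s: ceph -s" header in those outputs. A convenient way to keep an eye on the cluster for the whole test:

# refresh the cluster status every 2 seconds in a separate terminal
watch -n 2 ceph -s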
For reference, the osd tree:
[root@hh-ceph-128040 ~]# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                   STATUS REWEIGHT PRI-AFF
 -1       216.00000 root default
-10        72.00000     rack racka07
 -3        72.00000         host hh-ceph-128214
 12   hdd   6.00000             osd.12              up  1.00000 1.00000
 13   hdd   6.00000             osd.13              up  1.00000 1.00000
 14   hdd   6.00000             osd.14              up  1.00000 1.00000
 15   hdd   6.00000             osd.15              up  1.00000 1.00000
 16   hdd   6.00000             osd.16              up  1.00000 1.00000
 17   hdd   6.00000             osd.17              up  1.00000 1.00000
 18   hdd   6.00000             osd.18              up  1.00000 1.00000
 19   hdd   6.00000             osd.19              up  1.00000 1.00000
 20   hdd   6.00000             osd.20              up  1.00000 1.00000
 21   hdd   6.00000             osd.21              up  1.00000 1.00000
 22   hdd   6.00000             osd.22              up  1.00000 1.00000
 23   hdd   6.00000             osd.23              up  1.00000 1.00000
 -9        72.00000     rack racka12
 -2        72.00000         host hh-ceph-128040
  0   hdd   6.00000             osd.0               up  1.00000 0.50000
  1   hdd   6.00000             osd.1               up  1.00000 1.00000
  2   hdd   6.00000             osd.2               up  1.00000 1.00000
  3   hdd   6.00000             osd.3               up  1.00000 1.00000
  4   hdd   6.00000             osd.4               up  1.00000 1.00000
  5   hdd   6.00000             osd.5               up  1.00000 1.00000
  6   hdd   6.00000             osd.6               up  1.00000 1.00000
  7   hdd   6.00000             osd.7               up  1.00000 1.00000
  8   hdd   6.00000             osd.8               up  1.00000 1.00000
  9   hdd   6.00000             osd.9               up  1.00000 1.00000
 10   hdd   6.00000             osd.10              up  1.00000 1.00000
 11   hdd   6.00000             osd.11              up  1.00000 1.00000
-11        72.00000     rack rackb08
 -4        72.00000         host hh-ceph-128215
 24   hdd   6.00000             osd.24              up  1.00000 1.00000
 25   hdd   6.00000             osd.25              up  1.00000 1.00000
 26   hdd   6.00000             osd.26              up  1.00000 1.00000
 27   hdd   6.00000             osd.27              up  1.00000 1.00000
 28   hdd   6.00000             osd.28              up  1.00000 1.00000
 29   hdd   6.00000             osd.29              up  1.00000 1.00000
 30   hdd   6.00000             osd.30              up  1.00000 1.00000
 31   hdd   6.00000             osd.31              up  1.00000 1.00000
 32   hdd   6.00000             osd.32              up  1.00000 1.00000
 33   hdd   6.00000             osd.33              up  1.00000 1.00000
 34   hdd   6.00000             osd.34              up  1.00000 1.00000
 35   hdd   6.00000             osd.35              up  1.00000 1.00000
Failure simulation
Simulate a journal disk failure for osd.14 by overwriting the start of its journal partition:
[root@hh-ceph-128214 ~]# df -h | grep ceph-14
/dev/sdn3       4.7G  2.1G  2.7G  44% /var/lib/ceph/journal/ceph-14
/dev/sdc1       5.5T  2.8G  5.5T   1% /var/lib/ceph/osd/ceph-14
[root@hh-ceph-128214 ~]# dd if=/dev/zero of=/dev/sdn3 bs=1M count=100
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00888055 s, 1.2 GB/s
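The df output above shows which partition backs the journal (/dev/sdn3) and which backs the data (/dev/sdc1). On a host where that mapping is less obvious, it can also be looked up from the OSD's configuration. A minimal sketch, assuming the journal path is set via the osd journal option in ceph.conf as in the deployment guide referenced above; these lookup commands are illustrative and not part of the original test:

# read the journal path from the [osd.14] section of ceph.conf
ceph-conf --name osd.14 --lookup osd_journal
# or ask the OSD's own config machinery for the effective value
ceph-osd -i 14 --show-config-value osd_journal
# map the journal mount point back to the underlying device
df -h /var/lib/ceph/journal/ceph-14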
The resulting failure state:
Every 2.0s: ceph -s                                    Fri Nov 24 16:18:05 2017

  cluster:
    id:     c45b752d-5d4d-4d3a-a3b2-04e73eff4ccd
    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 3635/84072 objects degraded (4.324%), 155 pgs unclean, 155 pgs degraded, 155 pgs undersized

  services:
    mon: 3 daemons, quorum hh-ceph-128040,hh-ceph-128214,hh-ceph-128215
    mgr: openstack(active)
    osd: 36 osds: 35 up, 36 in

  data:
    pools:   1 pools, 2048 pgs
    objects: 28024 objects, 109 GB
    usage:   343 GB used, 196 TB / 196 TB avail
    pgs:     3635/84072 objects degraded (4.324%)
             1893 active+clean
             155  active+undersized+degraded
Warnings from mon.log:
2017-11-24 16:09:24.996545 7fdd215c1700  0 log_channel(cluster) log [DBG] : osd.14 10.199.128.214:6804/11943 reported immediately failed by osd.6 10.199.128.40:6812/12317
2017-11-24 16:09:25.083523 7fdd23dc6700  0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
2017-11-24 16:09:25.087241 7fdd1cdb8700  1 mon.hh-ceph-128040@0(leader).log v17642 check_sub sending message to client.94503 10.199.128.40:0/161437639 with 1 entries (version 17642)
2017-11-24 16:09:25.093344 7fdd1cdb8700  1 mon.hh-ceph-128040@0(leader).osd e329 e329: 36 total, 35 up, 36 in
2017-11-24 16:09:25.093857 7fdd1cdb8700  0 log_channel(cluster) log [DBG] : osdmap e329: 36 total, 35 up, 36 in
The corresponding osd tree (host hh-ceph-128214 shown):
[root@hh-ceph-128040 ~]# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                   STATUS REWEIGHT PRI-AFF
 -1       216.00000 root default
-10        72.00000     rack racka07
 -3        72.00000         host hh-ceph-128214
 12   hdd   6.00000             osd.12              up  1.00000 1.00000
 13   hdd   6.00000             osd.13              up  1.00000 1.00000
 14   hdd   6.00000             osd.14            down        0 1.00000
 15   hdd   6.00000             osd.15              up  1.00000 1.00000
 16   hdd   6.00000             osd.16              up  1.00000 1.00000
 17   hdd   6.00000             osd.17              up  1.00000 1.00000
 18   hdd   6.00000             osd.18              up  1.00000 1.00000
 19   hdd   6.00000             osd.19              up  1.00000 1.00000
 20   hdd   6.00000             osd.20              up  1.00000 1.00000
 21   hdd   6.00000             osd.21              up  1.00000 1.00000
 22   hdd   6.00000             osd.22              up  1.00000 1.00000
 23   hdd   6.00000             osd.23              up  1.00000 1.00000
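Because osd.14 will stay down while its journal is repaired, it can be worth stopping the cluster from marking it out and starting a full rebalance in the meantime. This step is optional and not part of the original test; a minimal sketch:

# keep down OSDs from being marked out during the repair
ceph osd set noout
# the flag shows up as a health warning until it is cleared
ceph -s | grep noout

The flag is cleared again once recovery completes (see the note after the final osd tree).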
Recovery procedure
Recreate the filesystem on the journal disk:
[root@hh-ceph-128214 ceph]# umount /var/lib/ceph/journal/ceph-14
[root@hh-ceph-128214 ceph]# mkfs -t xfs /dev/sdn3
meta-data=/dev/sdn3              isize=256    agcount=4, agsize=305152 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=1220608, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@hh-ceph-128214 ceph]# mount /dev/sdn3 /var/lib/ceph/journal/ceph-14
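If the journal filesystem is normally mounted through /etc/fstab, the entry may need updating as well, since mkfs gives the new filesystem a new UUID. A hedged sketch; the UUID below is a placeholder to be replaced with the value blkid prints:

# read the UUID of the freshly created filesystem
blkid /dev/sdn3
# example fstab entry (hypothetical UUID, substitute the real one)
# UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /var/lib/ceph/journal/ceph-14  xfs  defaults  0 0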
Reinitialize the journal:
[root@hh-ceph-128214 ceph]# ceph-osd -i 14 --mkjournal
2017-11-24 16:30:14.559300 7fb026cbbd00 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-11-24 16:30:14.560759 7fb026cbbd00 -1 created new journal /var/lib/ceph/journal/ceph-14/journal for object store /var/lib/ceph/osd/ceph-14
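In this test the old journal content was already destroyed by dd, so creating a brand-new journal is the only option. For a planned journal replacement on a healthy FileStore OSD, the usual sequence is to flush the old journal first so no pending transactions are lost; a sketch of that variant (not what was done here):

# stop the OSD so no new writes hit the journal
systemctl stop ceph-osd@14
# write any transactions still in the old journal out to the object store
ceph-osd -i 14 --flush-journal
# ... replace or move the journal device/file here ...
# initialize the new journal and bring the OSD back
ceph-osd -i 14 --mkjournal
systemctl start ceph-osd@14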
Remember to fix the ownership of the new journal file: ceph-osd --mkjournal was run as root, but the daemon runs with --setuser ceph --setgroup ceph and will fail to start if the journal is owned by root.
[root@hh-ceph-128214 ceph]# ls -l /var/lib/ceph/journal/ceph-14/
total 2097152
-rw-r--r-- 1 root root 2147483648 Nov 24 16:30 journal
[root@hh-ceph-128214 ceph]# chown ceph:ceph /var/lib/ceph/journal/ceph-14/journal
Restart the OSD service:
[root@hh-ceph-128214 ceph]# systemctl start ceph-osd@14
[root@hh-ceph-128214 ceph]# systemctl status ceph-osd@14
● ceph-osd@14.service - Ceph object storage daemon osd.14
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-11-24 16:43:54 CST; 8s ago
  Process: 105065 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 105072 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@14.service
           └─105072 /usr/bin/ceph-osd -f --cluster ceph --id 14 --setuser ceph --setgroup ceph

Nov 24 16:43:54 hh-ceph-128214.vclound.com systemd[1]: Starting Ceph object storage daemon osd.14...
Nov 24 16:43:54 hh-ceph-128214.vclound.com systemd[1]: Started Ceph object storage daemon osd.14.
Nov 24 16:43:54 hh-ceph-128214.vclound.com ceph-osd[105072]: starting osd.14 at - osd_data /var/lib/ceph/osd/ceph-14 /var/lib/ceph/journal/ceph-14/journal
Nov 24 16:43:54 hh-ceph-128214.vclound.com ceph-osd[105072]: 2017-11-24 16:43:54.530588 7fc9f6aa6d00 -1 journal FileJournal::_open: disabling aio for non-block ...o anyway
Nov 24 16:43:54 hh-ceph-128214.vclound.com ceph-osd[105072]: 2017-11-24 16:43:54.677272 7fc9f6aa6d00 -1 osd.14 328 log_to_monitors {default=true}
Hint: Some lines were ellipsized, use -l to show in full.
Cluster state during recovery:
Every 2.0s: ceph -s                                    Fri Nov 24 16:46:26 2017

  cluster:
    id:     c45b752d-5d4d-4d3a-a3b2-04e73eff4ccd
    health: HEALTH_WARN
            Degraded data redundancy: 123/84072 objects degraded (0.146%), 15 pgs unclean, 15 pgs degraded

  services:
    mon: 3 daemons, quorum hh-ceph-128040,hh-ceph-128214,hh-ceph-128215
    mgr: openstack(active)
    osd: 36 osds: 36 up, 36 in

  data:
    pools:   1 pools, 2048 pgs
    objects: 28024 objects, 109 GB
    usage:   331 GB used, 196 TB / 196 TB avail
    pgs:     123/84072 objects degraded (0.146%)
             2033 active+clean
             14   active+recovery_wait+degraded
             1    active+recovering+degraded

  io:
    recovery: 30992 kB/s, 7 objects/s
The osd tree after recovery (host hh-ceph-128214 shown), with osd.14 back up:
[root@hh-ceph-128214 ceph]# ceph osd tree
ID  CLASS WEIGHT    TYPE NAME                   STATUS REWEIGHT PRI-AFF
 -1       216.00000 root default
-10        72.00000     rack racka07
 -3        72.00000         host hh-ceph-128214
 12   hdd   6.00000             osd.12              up  1.00000 1.00000
 13   hdd   6.00000             osd.13              up  1.00000 1.00000
 14   hdd   6.00000             osd.14              up  1.00000 1.00000
 15   hdd   6.00000             osd.15              up  1.00000 1.00000
 16   hdd   6.00000             osd.16              up  1.00000 1.00000
 17   hdd   6.00000             osd.17              up  1.00000 1.00000
 18   hdd   6.00000             osd.18              up  1.00000 1.00000
 19   hdd   6.00000             osd.19              up  1.00000 1.00000
 20   hdd   6.00000             osd.20              up  1.00000 1.00000
 21   hdd   6.00000             osd.21              up  1.00000 1.00000
 22   hdd   6.00000             osd.22              up  1.00000 1.00000
 23   hdd   6.00000             osd.23              up  1.00000 1.00000
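If the noout flag was set before the repair, clear it now that osd.14 is back up and the remaining PGs are recovering:

# restore normal down -> out handling
ceph osd unset noout
ceph -s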