Replacing a Failed Disk on an IBM Server
Source: Internet  Editor: 程序博客网  Time: 2024/05/13 12:49
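The server is an IBM X3650 with five disks in RAID 5 plus one intended hot spare (although whoever set up the machine apparently never configured it as a hot spare; details follow).
Disk status at the time of the failure: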
[root@serv1 cmdline]# ./arcconf GETCONFIG 1
Could not open log file: UcliEvt.log
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Okay
Channel description : SAS/SATA
Controller Model : IBM ServeRAID 8k
Controller Serial Number : 40703B9
Physical Slot : 0
Installed memory : 256 MB
Copyback : Disabled
Data scrubbing : Enabled
Defunct disk drive count : 1
Logical drives/Offline/Critical : 1/0/1
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.2-0 (15421)
Firmware : 5.2-0 (15421)
Driver : 1.1-5 (2453)
Boot Flash : 5.1-0 (15411)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status : Okay
Over temperature : No
Capacity remaining : 100 percent
Time remaining (at current draw) : 4 days, 5 hours, 20 minutes
--------------------------------------------------------
Controller Vital Product Data
--------------------------------------------------------
VPD Assigned# : 39R8875
EC Version# : J85096
Controller FRU# : 25R8076
Battery FRU# : 25R8088
----------------------------------------------------------------------
Logical drive information
----------------------------------------------------------------------
Logical drive number 1
Logical drive name :
RAID level : 5
Status of logical drive : Critical
Size : 1143986 MB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back) when protected by battery
Partitioned : Yes
Number of segments : 5
Stripe-unit size : 256 KB
Stripe order (Channel,Device) : DDD 0,1 0,2 0,3 0,4
Defunct segments : 0,0
Defunct stripes : No
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Defunct
Supported : Yes
Transfer Speed : Defunct
Reported Channel,Device : 0,0
Reported Location : Enclosure 0, Slot 0
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504MHL
World-wide name : 500000E01F7F8CF1
Size : 0 MB
Write Cache : Unknown
FRU : 43X0817
PFA : Yes
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,1
Reported Location : Enclosure 0, Slot 1
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504M6N
World-wide name : 500000E01F7DC3D1
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : No
Device #2
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,2
Reported Location : Enclosure 0, Slot 2
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504M8J
World-wide name : 500000E01F7DCE11
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : No
Device #3
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,3
Reported Location : Enclosure 0, Slot 3
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504M5C
World-wide name : 500000E01F7DBEE1
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : No
Device #4
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,4
Reported Location : Enclosure 0, Slot 4
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504JWK
World-wide name : 500000E01F6EA341
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : Yes
Device #5
Device is a Hard drive
State : Ready
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,5
Reported Location : Enclosure 0, Slot 5
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504MKL
World-wide name : 500000E01F812561
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : No
Device #6
Device is an Enclosure services device
Reported Channel,Device : 2,0
Enclosure ID : 0
Type : SES2
Vendor : IBM-ESXS
Model : VSC7160
Firmware : 1.07
Status of Enclosure services device
Temperature : Normal
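The status above shows Device #0 in the Defunct state, meaning it is unusable, i.e. the disk has failed. Device #5 was supposed to be the hot spare: at this point it should have taken over, gone through Rebuilding, and then become Online. Its State, however, reads Ready rather than Hot Spare, so it was presumably never configured properly at installation time, and it did not take over when Device #0 failed.
After we arranged an on-site visit with IBM after-sales support, the IBM engineer found that the drive LED on the server had not turned red or amber. In other words, the indicator light itself was broken.
After he replaced the Device #0 disk: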
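The check described above (which drives are Defunct, and whether any drive is actually a Hot Spare) can be sketched as a small parser over the `GETCONFIG` text. This is only an illustration: the function names `parse_devices` and `classify` are made up here, the sample is a trimmed excerpt of the output above, and it assumes the `key : value` layout shown in this post.

```python
SAMPLE = """\
Device #0
Device is a Hard drive
State : Defunct
Reported Channel,Device : 0,0
Device #5
Device is a Hard drive
State : Ready
Reported Channel,Device : 0,5
Device #6
Device is an Enclosure services device
Reported Channel,Device : 2,0
"""


def parse_devices(text):
    """Return one dict per 'Device #N' block with its 'key : value' lines."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Device #") and line[len("Device #"):].isdigit():
            # A new physical device block starts here.
            current = {"number": int(line[len("Device #"):])}
            devices.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


def classify(devices):
    """Group drive numbers by the states this post cares about."""
    drives = [d for d in devices if "State" in d]  # skips the enclosure device
    defunct = [d["number"] for d in drives if d["State"] == "Defunct"]
    spares = [d["number"] for d in drives if d["State"] == "Hot Spare"]
    ready = [d["number"] for d in drives if d["State"] == "Ready"]
    return defunct, spares, ready


defunct, spares, ready = classify(parse_devices(SAMPLE))
print("defunct:", defunct, "hot spares:", spares, "ready:", ready)
```

On the excerpt this reports Device #0 as defunct, no hot spares, and Device #5 as merely Ready, which matches the diagnosis above.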
Logical drive information
----------------------------------------------------------------------
Logical drive number 1
Logical drive name :
RAID level : 5
Status of logical drive : Critical
Size : 1143986 MB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back) when protected by battery
Partitioned : Yes
Number of segments : 5
Stripe-unit size : 256 KB
Stripe order (Channel,Device) : 0,0 0,1 0,2 0,3 0,4
Defunct segments : No
Defunct stripes : No
Device #0
Device is a Hard drive
State : Rebuilding
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,0
Reported Location : Enclosure 0, Slot 0
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : ST3300655SS
Firmware : BA2D
Serial number : 3LM0LE9Z
World-wide name : 5000C5001C8401B0
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0805
PFA : No
Set Device #5 as the hot spare:
[root@serv1 cmdline]# ./arcconf SETSTATE 1 DEVICE 0 5 HSP
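The arguments of the command above follow from the dump: `1` is the controller number used with `GETCONFIG`, and `0 5` is the drive's "Reported Channel,Device" value `0,5`. A minimal sketch of building that same command line from those two values is shown below. The helper name `hot_spare_argv` is hypothetical, and it only reproduces the invocation used in this post rather than covering arcconf's full syntax.

```python
import re


def hot_spare_argv(controller, channel_device, arcconf="./arcconf"):
    """Build the argv for marking one drive as a hot spare.

    channel_device is the 'Reported Channel,Device' value, e.g. '0,5'.
    """
    m = re.fullmatch(r"\s*(\d+)\s*,\s*(\d+)\s*", channel_device)
    if m is None:
        raise ValueError(f"expected 'channel,device', got {channel_device!r}")
    channel, device = m.groups()
    return [arcconf, "SETSTATE", str(controller), "DEVICE", channel, device, "HSP"]


argv = hot_spare_argv(1, "0,5")
print(" ".join(argv))
```

The resulting list could be handed to `subprocess.run(argv, check=True)` on the server itself; printing it first makes it easy to confirm the target drive before changing its state.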
Checking again about three hours later:
[root@serv1 cmdline]# ./arcconf GETCONFIG 1
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Okay
Channel description : SAS/SATA
Controller Model : IBM ServeRAID 8k
Controller Serial Number : 40703B9
Physical Slot : 0
Installed memory : 256 MB
Copyback : Disabled
Data scrubbing : Enabled
Defunct disk drive count : 0
Logical drives/Offline/Critical : 1/0/0
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.2-0 (15421)
Firmware : 5.2-0 (15421)
Driver : 1.1-5 (2453)
Boot Flash : 5.1-0 (15411)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status : Okay
Over temperature : No
Capacity remaining : 100 percent
Time remaining (at current draw) : 4 days, 5 hours, 20 minutes
--------------------------------------------------------
Controller Vital Product Data
--------------------------------------------------------
VPD Assigned# : 39R8875
EC Version# : J85096
Controller FRU# : 25R8076
Battery FRU# : 25R8088
----------------------------------------------------------------------
Logical drive information
----------------------------------------------------------------------
Logical drive number 1
Logical drive name :
RAID level : 5
Status of logical drive : Okay
Size : 1143986 MB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back) when protected by battery
Partitioned : Yes
Number of segments : 5
Stripe-unit size : 256 KB
Stripe order (Channel,Device) : 0,0 0,1 0,2 0,3 0,4
Defunct segments : No
Defunct stripes : No
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,0
Reported Location : Enclosure 0, Slot 0
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : ST3300655SS
Firmware : BA2D
Serial number : 3LM0LE9Z
World-wide name : 5000C5001C8401B0
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0805
PFA : No
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,1
Reported Location : Enclosure 0, Slot 1
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504M6N
World-wide name : 500000E01F7DC3D1
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : No
Device #2
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,2
Reported Location : Enclosure 0, Slot 2
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504M8J
World-wide name : 500000E01F7DCE11
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : No
Device #3
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,3
Reported Location : Enclosure 0, Slot 3
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504M5C
World-wide name : 500000E01F7DBEE1
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : No
Device #4
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,4
Reported Location : Enclosure 0, Slot 4
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504JWK
World-wide name : 500000E01F6EA341
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : Yes
Device #5
Device is a Hard drive
State : Hot Spare
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,5
Reported Location : Enclosure 0, Slot 5
Reported ESD : 2,0
Vendor : IBM-ESXS
Model : MBA3300RC
Firmware : SA06
Serial number : BJ504MKL
World-wide name : 500000E01F812561
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : 43X0817
PFA : No
Device #6
Device is an Enclosure services device
Reported Channel,Device : 2,0
Enclosure ID : 0
Type : SES2
Vendor : IBM-ESXS
Model : VSC7160
Firmware : 1.07
Status of Enclosure services device
Temperature : Normal
Command completed successfully.
Hopefully the hot spare will take over the next time a disk fails.
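Since the broken LED meant this failure was only visible in the `GETCONFIG` output, a periodic check of that output may be worth having. The sketch below is an assumption-laden illustration, not a tested monitoring tool: `health_warnings` is a made-up name, the sample is a trimmed excerpt of the final dump above, and treating `PFA : Yes` as a warning assumes PFA is IBM's Predictive Failure Analysis flag.

```python
import re

SAMPLE = """\
Defunct disk drive count : 0
Logical drives/Offline/Critical : 1/0/0
Status of logical drive : Okay
Device #4
State : Online
PFA : Yes
Device #5
State : Hot Spare
PFA : No
"""


def health_warnings(text):
    """Return human-readable warnings found in GETCONFIG-style text."""
    warnings = []

    m = re.search(r"Defunct disk drive count\s*:\s*(\d+)", text)
    if m and int(m.group(1)) > 0:
        warnings.append(f"{m.group(1)} defunct drive(s)")

    m = re.search(r"Logical drives/Offline/Critical\s*:\s*(\d+)/(\d+)/(\d+)", text)
    if m and (int(m.group(2)) > 0 or int(m.group(3)) > 0):
        warnings.append(
            f"{m.group(2)} offline, {m.group(3)} critical logical drive(s)"
        )

    # The original problem: a spare drive that was Ready but never a Hot Spare.
    if not re.search(r"^\s*State\s*:\s*Hot Spare\s*$", text, re.MULTILINE):
        warnings.append("no drive is in the Hot Spare state")

    # Assumption: 'PFA : Yes' means the drive predicts its own failure.
    device = None
    for line in text.splitlines():
        m = re.match(r"\s*Device #(\d+)\s*$", line)
        if m:
            device = m.group(1)
        elif device is not None and re.match(r"\s*PFA\s*:\s*Yes\s*$", line):
            warnings.append(f"Device #{device} reports PFA : Yes")
    return warnings


for warning in health_warnings(SAMPLE):
    print("WARNING:", warning)
```

On the excerpt the array and hot spare checks pass, and the only warning concerns Device #4, which shows `PFA : Yes` in both full dumps above.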