Writing Your Own Nagios Disk-Monitoring Script: check_disk
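Before I knew it, a month of my internship had gone by. My main job during that time was setting up a Nagios + Centreon monitoring platform. Doing it hands-on went fairly quickly; I ran into plenty of bugs along the way, but overall it went smoothly. After that I started writing my own disk-monitoring script.
First, why write a disk-monitoring script at all? Honestly, I am not entirely sure myself, since Nagios-plugins already ships a check_disk script. My guess is that my mentor wanted to give me some practice and also wanted a script that better fits our actual setup.
The hardware involved: three servers that make up a test cloud platform. Two of them have RAID cards, two have SSDs, and there are a number of HDDs. Yes, that is all, but for a newbie like me it was plenty to wrestle with.
For hosts with a RAID card, MegaCli is a good choice. Download and install MegaCli yourself, then get started: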
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL    # show RAID (logical drive) information
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL      # show RAID card information
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL          # show physical disk information
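Play around with these and look at what they print. There is a lot of output, so just skim it. If the host has no RAID card, running MegaCli on it prints nothing but: Exit Code: 0x00
The one I mainly use is the third command:
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
Then I grab just the information I need:
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -E 'Device Id|Error|Media Type'
Device Id: needed later when monitoring SSD wear; it is just an ID.
Error: the Error Count lines are the error information we want to watch. 0 means no errors; anything else is cause for concern.
Media Type: the disk type. I mainly use it to find which Device Id belongs to the SSD in the host, because other than this I do not know of any way to map a Device Id to a disk or a partition. Here is my output: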
[root@cloud-13 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -E 'Device Id|Error|Media Type'
Device Id: 0
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 1
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 2
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 3
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 4
Media Error Count: 0
Other Error Count: 0
Media Type: Solid State Device
From here, just write code that checks the number after each Error Count, and you have your monitoring.
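As a rough illustration of that step, here is a minimal Python 3 sketch that groups the filtered lines by disk. The names parse_pdlist and worst_error_count and the dictionary keys are hypothetical, chosen only for this example; they are not part of MegaCli. The sample text is the output shown above, trimmed to two disks.

```python
# Sample of the grep-filtered MegaCli output, trimmed to two disks.
SAMPLE = """\
Device Id: 3
Media Error Count: 0
Other Error Count: 0
Media Type: Hard Disk Device
Device Id: 4
Media Error Count: 0
Other Error Count: 0
Media Type: Solid State Device
"""


def parse_pdlist(text):
    """Group 'key: value' lines into one dict per physical disk.

    A new dict starts at every 'Device Id' line (hypothetical layout).
    """
    disks = []
    for raw in text.splitlines():
        key, sep, value = raw.partition(':')
        if not sep:
            continue  # not a 'key: value' line
        key, value = key.strip(), value.strip()
        if key == 'Device Id':
            disks.append({'device_id': int(value), 'errors': {}})
        elif disks and key.endswith('Error Count'):
            disks[-1]['errors'][key] = int(value)
        elif disks and key == 'Media Type':
            disks[-1]['media_type'] = value
    return disks


def worst_error_count(disks):
    """Return the largest error count seen on any disk (0 if none)."""
    counts = [n for d in disks for n in d['errors'].values()]
    return max(counts) if counts else 0


if __name__ == '__main__':
    for disk in parse_pdlist(SAMPLE):
        print(disk['device_id'], disk.get('media_type'), disk['errors'])
```

Keeping the Media Type next to the Device Id also makes it easy to look up which Device Id belongs to the SSD.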
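I mentioned SSD wear above, so let me cover it here as well. smartctl can report an SSD's remaining life, along with many other results; SSD life is only one of them. On a host with a RAID card, though, you need the Device Id obtained above.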
[root@cloud-13 ~]# smartctl -a -d megaraid,4 /dev/sdc1
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.123.2.openstack.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

/dev/sdc1 [megaraid_disk_04] [SAT]: Device open changed type from 'megaraid' to 'sat'
Smartctl open device: /dev/sdc1 [megaraid_disk_04] [SAT] failed: SATA device detected,
MegaRAID SAT layer is reportedly buggy, use '-d sat+megaraid,N' to try anyhow
My host needs sat added, so I did as it said:
[root@cloud-13 ~]# smartctl -a -d sat+megaraid,4 /dev/sdc1
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.123.2.openstack.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     OCZ INTREPID 3600
Serial Number:    A21N8061423000004
LU WWN Device Id: 5 e83a97 100006dc5
Firmware Version: 1.4.6.0
User Capacity:    800,166,076,416 bytes [800 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ACS-2 (revision not indicated)
Local Time is:    Tue Aug 25 15:20:02 2015 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x1d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   0) minutes.
Extended self-test routine
recommended polling time:        (   0) minutes.

SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       3964
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       28
100 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       2547072
171 Unknown_Attribute       0x0000   090   000   000    Old_age   Offline      -       12030
174 Unknown_Attribute       0x0000   071   100   000    Old_age   Offline      -       20
184 End-to-End_Error        0x0000   009   100   000    Old_age   Offline      -       1282
187 Reported_Uncorrect      0x0000   100   100   000    Old_age   Offline      -       0
190 Airflow_Temperature_Cel 0x0000   048   054   000    Old_age   Offline      -       48
195 Hardware_ECC_Recovered  0x0000   000   100   000    Old_age   Offline      -       0
196 Reallocated_Event_Count 0x0000   000   100   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0000   000   100   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0000   100   100   000    Old_age   Offline      -       3562
199 UDMA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       3443
202 Data_Address_Mark_Errs  0x0000   100   100   000    Old_age   Offline      -       2061332509
205 Thermal_Asperity_Rate   0x0000   100   100   000    Old_age   Offline      -       3000
206 Flying_Height           0x0000   000   100   000    Old_age   Offline      -       0
207 Spin_High_Current       0x0000   002   100   000    Old_age   Offline      -       64
208 Spin_Buzz               0x0000   000   100   000    Old_age   Offline      -       9
210 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
211 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
212 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
213 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
214 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
221 G-Sense_Error_Rate      0x0000   100   100   000    Old_age   Offline      -       0
222 Loaded_Hours            0x0000   100   100   000    Old_age   Offline      -       0
230 Head_Amplitude          0x0000   001   100   000    Old_age   Offline      -       1
233 Media_Wearout_Indicator 0x0000   100   000   000    Old_age   Offline      -       100
249 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       5792
251 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       22849

SMART Error Log not supported

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Device does not support Selective Self Tests/Logging
Then just grab the line below. The 100 means 100% of the SSD's life remains, i.e. no wear at all; it is new, after all:
233 Media_Wearout_Indicator 0x0000   100   000   000    Old_age   Offline      -       100
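A minimal Python 3 sketch of that extraction, assuming the text comes from smartctl -a as shown above. The function name media_wearout is hypothetical. It reads the normalized VALUE column (the fourth whitespace-separated field), which is an assumption about which of the two 100s to trust; on this drive VALUE and RAW_VALUE happen to agree.

```python
# Two attribute rows copied from the smartctl output above.
SAMPLE = """\
230 Head_Amplitude          0x0000   001   100   000    Old_age   Offline      -       1
233 Media_Wearout_Indicator 0x0000   100   000   000    Old_age   Offline      -       100
"""


def media_wearout(smartctl_output):
    """Return the normalized Media_Wearout_Indicator value, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID#, name, FLAG, VALUE, WORST,
        # THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
        if len(fields) >= 10 and fields[1] == 'Media_Wearout_Indicator':
            return int(fields[3])  # VALUE column, e.g. '090' -> 90
    return None


if __name__ == '__main__':
    print('SSD life left: %s%%' % media_wearout(SAMPLE))
```

Splitting on arbitrary whitespace rather than on single spaces keeps the field index stable even if the column padding changes.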
I followed the two blog posts below, which explain this in detail:
http://blog.yufeng.info/archives/1096
http://www.woxihuan.com/117417/1336095005082619.shtml
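For hosts without a RAID card, smartctl works well for checking whether a disk has errors:
# smartctl -a /dev/sdx
This shows all information; replace sdx with the disk on your own machine.
Since I only need the Error counter log, I can use this instead:
# smartctl -l error /dev/sdc
which lists only the Error counter log: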
[root@cloud-11 ~]# smartctl -l error /dev/sdc
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.123.2.openstack.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0      20680        755.998           0
write:         0        0         0         0       8177       1356.647           0
verify:        0        0         0         0        760         61.354           0

Non-medium error count:        0
Watch the error columns: 0 means no problem. Then just extract them in code.
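One way to do that extraction, sketched in Python 3. The function name parse_error_counter_log is hypothetical. Which columns count as "error" columns is a judgment call; this sketch takes the four corrected-error columns plus the uncorrected column and skips the invocation and gigabyte columns.

```python
# The three data rows from the Error counter log shown above.
SAMPLE = """\
Error counter log:
read:          0        0         0         0      20680        755.998           0
write:         0        0         0         0       8177       1356.647           0
verify:        0        0         0         0        760         61.354           0
"""

LABELS = ('read:', 'write:', 'verify:')


def parse_error_counter_log(text):
    """Map 'read'/'write'/'verify' to the largest error count on that row."""
    worst = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[0] in LABELS:
            corrected = [int(x) for x in fields[1:5]]  # ECC fast, delayed, rereads, total corrected
            uncorrected = int(fields[7])               # total uncorrected errors
            worst[fields[0].rstrip(':')] = max(corrected + [uncorrected])
    return worst


if __name__ == '__main__':
    print(parse_error_counter_log(SAMPLE))
```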
On this host without a RAID card, smartctl shows no Error counter log for the SSD:
[root@cloud-11 ~]# smartctl -a /dev/sdb
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.123.2.openstack.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     OCZ INTREPID 3600
Serial Number:    A21N8061423000020
LU WWN Device Id: 5 e83a97 100006dd5
Firmware Version: 1.4.6.0
User Capacity:    800,166,076,416 bytes [800 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ACS-2 (revision not indicated)
Local Time is:    Tue Aug 25 15:34:29 2015 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  25) The self-test routine was aborted by
                                        the host.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x1d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   0) minutes.
Extended self-test routine
recommended polling time:        (   0) minutes.

SMART Attributes Data Structure revision number: 18
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       5116
 12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       12
100 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       4009824
171 Unknown_Attribute       0x0000   090   000   000    Old_age   Offline      -       12041
174 Unknown_Attribute       0x0000   066   100   000    Old_age   Offline      -       8
184 End-to-End_Error        0x0000   009   100   000    Old_age   Offline      -       1271
187 Reported_Uncorrect      0x0000   100   100   000    Old_age   Offline      -       0
190 Airflow_Temperature_Cel 0x0000   045   063   000    Old_age   Offline      -       45
195 Hardware_ECC_Recovered  0x0000   000   100   000    Old_age   Offline      -       0
196 Reallocated_Event_Count 0x0000   000   100   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0000   000   100   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0000   100   100   000    Old_age   Offline      -       2732
199 UDMA_CRC_Error_Count    0x0000   100   100   000    Old_age   Offline      -       2458
202 Data_Address_Mark_Errs  0x0000   100   100   000    Old_age   Offline      -       2371926836
205 Thermal_Asperity_Rate   0x0000   100   100   000    Old_age   Offline      -       3000
206 Flying_Height           0x0000   000   100   000    Old_age   Offline      -       0
207 Spin_High_Current       0x0000   003   100   000    Old_age   Offline      -       90
208 Spin_Buzz               0x0000   000   100   000    Old_age   Offline      -       14
210 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       9175
211 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
212 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
213 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
214 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
221 G-Sense_Error_Rate      0x0000   100   100   000    Old_age   Offline      -       0
222 Loaded_Hours            0x0000   100   100   000    Old_age   Offline      -       0
230 Head_Amplitude          0x0000   001   100   000    Old_age   Offline      -       1
233 Media_Wearout_Indicator 0x0000   100   000   000    Old_age   Offline      -       100
249 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       7079
251 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       20961

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%         0         -
# 2  Short offline       Aborted by host               90%         0         -

Device does not support Selective Self Tests/Logging
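but it does report the SSD's remaining life:
233 Media_Wearout_Indicator 0x0000   100   000   000    Old_age   Offline      -       100
I searched for a long time but still found no way to detect errors on this SSD without a RAID card; I can only monitor its wear. If anyone knows a way, please let me know.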
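That is most of the implementation. The overall approach is:
Use the detection tools:
For disks not behind a RAID card, use smartctl -a /dev/sdX and watch whether the values in the Error counter log columns increase;
For disks behind a RAID card, use MegaCli to watch the Error Count.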
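Whichever tool supplies the counts, the result still has to become a Nagios state. A minimal Python 3 sketch of that mapping follows; the function name nagios_status and the critical_at parameter are hypothetical. The exit codes 0 to 3 are the standard Nagios plugin return codes, and the threshold of 5 mirrors the one in the script at the end of this post rather than any official recommendation.

```python
# Standard Nagios plugin return codes.
STATUS_OK, STATUS_WARNING, STATUS_CRITICAL, STATUS_UNKNOWN = 0, 1, 2, 3


def nagios_status(error_counts, critical_at=5):
    """Map a list of error counts to a Nagios exit code.

    No data -> UNKNOWN, all zero -> OK, any count >= critical_at -> CRITICAL,
    anything in between -> WARNING.
    """
    if not error_counts:
        return STATUS_UNKNOWN
    worst = max(error_counts)
    if worst == 0:
        return STATUS_OK
    if worst >= critical_at:
        return STATUS_CRITICAL
    return STATUS_WARNING


if __name__ == '__main__':
    print(nagios_status([0, 0, 0]))
```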
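Finally, I looked into ioerr_cnt. The operating system was Red Hat 5.x (I do not remember the exact version). You can use df -h to see the disk partitions.
Every disk has this file under its directory, and it holds a single value:
# cat /sys/block/sdb/device/ioerr_cnt
0x1494
Judging by the name, ioerr_cnt should be a count of I/O errors, so its value would be the number of I/O errors that have occurred. 0x1494 is not a small number; does it indicate disk errors?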
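To put that number in perspective: the file holds a hexadecimal string, and 0x1494 is 5268 in decimal. A small Python 3 sketch for reading the counters of all sd* disks is below. The function names are hypothetical, and the glob pattern assumes the same /sys/block/<disk>/device/ioerr_cnt layout as the cat command above.

```python
import glob
import os


def parse_ioerr_cnt(raw):
    """Convert the hexadecimal text in an ioerr_cnt file to an int."""
    return int(raw.strip(), 16)


def read_all_ioerr_cnt(pattern='/sys/block/sd*/device/ioerr_cnt'):
    """Return {disk name: ioerr_cnt} for every file matching the pattern."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        # .../sdb/device/ioerr_cnt -> 'sdb'
        disk = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            result[disk] = parse_ioerr_cnt(f.read())
    return result


if __name__ == '__main__':
    print(parse_ioerr_cnt('0x1494'))
    print(read_all_ioerr_cnt())
```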
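My mentor then found a discussion of this question in the Red Hat community and showed it to me. If you are interested you can look it up there yourself; I cannot conveniently provide it here: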
[Troubleshooting] How do I determine which io are causing ioerr_cnt to increase?
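That article exists to offer a way of determining which I/O caused the error. But even if you find it, does it have anything to do with the disk's health? Put another way, if only some process hit an I/O error and the cause lies with that process itself, then it has nothing to do with the disk.
I checked ioerr_cnt on the 9 disks across my three hosts and found that only one disk had an ioerr_cnt of 0, yet smartctl and MegaCli reported 0 errors for all of them.
In the end I decided to drop the ioerr_cnt check, since it cannot be tied entirely to disk health, and to use MegaCli and smartctl as the standard.
Written down like this it feels like very little, but I spent nearly a week on research plus several more days writing the code. It is all in Python, and since I had not touched Python for a long time, I spent a lot of time looking up how to use various functions. Still, I learned a lot. I used to be rather in awe of Nagios scripts (some of them look like gibberish when opened), but now I find they are actually quite simple. The main thing is picking the right tools; after that it is mostly string processing. Python is a good thing.
The final code is below. It is quite simple, nothing fancy: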
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Description:
# This application is used to discover the physical disks by using the MegaCLI tool.
#
# Author: Jiang Chuan <806692341@qq.com>
#

import argparse
import subprocess
import sys

SMARTCTL = 'smartctl'
ListError = '-l error'
DISK = '/dev/sdc'
LSPCI = 'lspci | grep -i raid'
MEGACLI = '/opt/MegaRAID/MegaCli/MegaCli64'
PDLIST = '-PDList -aALL'
DEVICE = "| grep 'Device Id'"
ERROR = '| grep Error'

# Nagios exit codes
STATUS_OK = 0
STATUS_WARNING = 1
STATUS_ERROR = 2
STATUS_UNKNOWN = 3


def check_smartctl():
    """Check the Error counter log of DISK on a host without a RAID card."""
    (status, output) = subprocess.getstatusoutput('%s %s %s' % (SMARTCTL, ListError, DISK))
    lines = output.split('\n')
    if status != 0:
        # The first three lines are the smartctl banner; the message follows.
        detail = lines[3] if len(lines) > 3 else output
        print('UNKNOWN|Something unexpected happened:' + detail)
        return STATUS_UNKNOWN
    str_read = ''
    str_write = ''
    str_verify = ''
    for item in lines:
        stripped = item.strip()
        if stripped.startswith('read:'):
            str_read = stripped
        if stripped.startswith('write:'):
            str_write = stripped
        if stripped.startswith('verify:'):
            str_verify = stripped
    if str_read != '' and str_write != '' and str_verify != '':
        error_list = [max_error(str_read), max_error(str_write), max_error(str_verify)]
        if max(error_list) >= 5:
            print('ERROR|There are too many errors:' + str(error_list) + ' >= 5')
            return STATUS_ERROR
        elif max(error_list) == 0:
            print('OK')
            return STATUS_OK
        else:
            print('WARNING|There are some errors that need handling:' + str(error_list) + ' < 5')
            return STATUS_WARNING
    print('UNKNOWN|We can not get the error count, please check')
    return STATUS_UNKNOWN


def max_error(line):
    """Largest of the corrected-error columns (1-4) and the uncorrected column (7)."""
    words = line.split()
    return max(int(words[i]) for i in (1, 2, 3, 4, 7))


def check_lsi():
    """Make sure lspci reports an LSI RAID card."""
    (status, output) = subprocess.getstatusoutput(LSPCI)
    if status != 0:
        print('UNKNOWN|LSPCI encountered a problem')
        return STATUS_UNKNOWN
    if output.find('LSI') >= 0:
        return STATUS_OK
    print('ERROR|There is no LSI RAID card in the lspci output')
    return STATUS_ERROR


def check_MegaCli():
    """Check the Media/Other Error Count of every disk behind the RAID card."""
    status = check_lsi()
    if status != STATUS_OK:
        return status
    device_id = get_device_id()
    error_count = get_error_count()
    # Some sanity checks, maybe useless
    if not device_id or not error_count:
        print('ERROR|Could not read device_id or error_count')
        return STATUS_ERROR
    if len(device_id) * 2 != len(error_count):
        print('ERROR|The number of error counts is not twice the number of device ids')
        return STATUS_ERROR
    worst = max(error_count)
    if worst == 0:
        print('OK')
        return STATUS_OK
    # Each device contributes two counters (Media, Other), in order.
    device = device_id[error_count.index(worst) // 2]
    if worst >= 5:
        print('ERROR|There is ' + str(worst) + ' error in device ' + str(device))
        return STATUS_ERROR
    print('WARNING|There is ' + str(worst) + ' error in device ' + str(device))
    return STATUS_WARNING


def get_megacli_numbers(grep_filter, what):
    """Run MegaCli -PDList through a grep filter; return the trailing integers, or None."""
    (status, output) = subprocess.getstatusoutput('%s %s %s' % (MEGACLI, PDLIST, grep_filter))
    if status != 0:
        print('ERROR|Failed to get ' + what)
        return None
    return [int(item.split()[-1]) for item in output.split('\n') if item.strip()]


def get_device_id():
    return get_megacli_numbers(DEVICE, 'device id')


def get_error_count():
    return get_megacli_numbers(ERROR, 'MegaCli error count')


def report_ssd_life(status, output):
    """Turn the Media_Wearout_Indicator row into a Nagios status."""
    if status != 0:
        print('UNKNOWN|Something unexpected happened while checking the SSD.')
        return STATUS_UNKNOWN
    # Field 3 of the attribute row is the normalized VALUE column.
    life = int(output.split()[3])
    if life >= 50:
        print('OK|The life of the SSD is ' + str(life) + '% left')
        return STATUS_OK
    elif life >= 20:
        print('WARNING|The life of the SSD is ' + str(life) + '% < 50%')
        return STATUS_WARNING
    else:
        print('CRITICAL|The life of the SSD is ' + str(life) + '% < 20%')
        return STATUS_ERROR


def check_ssd(device_id, disk):
    """SSD behind a RAID card: needs the MegaCli Device Id."""
    return report_ssd_life(*subprocess.getstatusoutput(
        '%s -a -d sat+megaraid,%s %s | grep Media_Wearout_Indicator' % (SMARTCTL, device_id, disk)))


def check_ssd_no_id(disk):
    """SSD on a host without a RAID card."""
    return report_ssd_life(*subprocess.getstatusoutput(
        '%s -a %s | grep Media_Wearout_Indicator' % (SMARTCTL, disk)))


def init_option():
    parser = argparse.ArgumentParser(description="DISK nagios plugin.")
    parser.add_argument('-r', '--raid', help='raid or not(y/n)')
    parser.add_argument('-s', '--ssd', help='ssd or not(y/n), need device_id(0,1,2) and disk(/dev/sdc)')
    parser.add_argument('-i', '--device', help='Device Id(0,1,2), which is needed in check_ssd')
    parser.add_argument('-d', '--disk', help='DISK(/dev/sdx), which is needed in check_ssd')
    return parser


def main():
    parser = init_option()
    args = parser.parse_args()
    if args.raid == 'y':
        if not args.ssd:
            return check_MegaCli()
        if not args.device or not args.disk:
            print('ERROR|Checking an SSD needs a device id and a disk')
            return STATUS_ERROR
        # Reject a device id that MegaCli does not list
        device_id = get_device_id()
        if device_id is not None and int(args.device) in device_id:
            return check_ssd(args.device, args.disk)
        print('ERROR|You must specify a valid Device_Id, got ' + str(args.device))
        return STATUS_ERROR
    if not args.ssd:
        return check_smartctl()
    if args.ssd == 'y':
        # An SSD without a RAID card needs no device id (no MegaCli)
        if not args.disk:
            print('ERROR|Checking SSD life without an ID requires the DISK (/dev/sdx)')
            return STATUS_ERROR
        return check_ssd_no_id(args.disk)
    print('UNKNOWN|Unrecognized value for --ssd: ' + str(args.ssd))
    return STATUS_UNKNOWN


if __name__ == '__main__':
    sys.exit(main())

# usage: check_disk_health_v2.py [-h] [-r RAID] [-s SSD] [-i DEVICE] [-d DISK]
# Nothing is auto-detected, so for every machine you monitor you must specify:
# Does it have RAID?
#   yes: check the SSD?
#     yes: check_ssd()
#     no:  check_MegaCli()
#   no: check the SSD?
#     yes: check_ssd_no_id()
#     no:  check_smartctl()
#
# All parameters have to be given by hand, which is a bit of a hassle.