nagios监控与报警时间间隔

来源:互联网 发布:mac 找不到原身 编辑:程序博客网 时间:2024/05/17 02:22

转自:http://blog.sina.com.cn/slaysly



nagios监控与报警时间间隔:

max_check_attempts:   This directive is used to define the number of times that Nagios will retry the service check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the service check again.
check_interval:    This directive is used to define the number of "time units" to wait before scheduling the next "regular" check of the service. "Regular" checks are those that occur when the service is in an OK state or when the service is in a non-OK state, but has already been rechecked max_check_attempts number of times. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
retry_interval:       This directive is used to define the number of "time units" to wait before scheduling a re-check of the service. Services are rescheduled at the retry interval when they have changed to a non-OK state. Once the service has been retriedmax_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by thecheck_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
notification_interval: This directive is used to define the number of "time units" to wait before re-notifying a contact that this service is still in a non-OK state. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this service - only one problem notification will be sent out.

在OK状态,nagios用check_interval定义的时间间隔来监控,出现问题后,切换为retry_interval和max_check_attempts进行监控,达到max_check_attempts后触发首次报警,同时恢复为check_interval进行监控,并用notification_interval定义的时间间隔来发送报警,服务恢复后,在最近的check_interval点发送OK短信,完成报警周期。

特殊:
1.max_check_attempts定义为1,检测到问题后立即报警,不重试。
2.notification_interval定义为0,报警只发送一次,不重发。
0 0