[Nagios] Monitoring Linux and Windows Hosts

Source: Internet | Editor: 程序博客网 | Date: 2024/05/16 17:03

I. System Environment

IP            | Role          | OS              | Firewall | SELinux  | Software versions                              | Port
192.168.2.130 | Nagios Server | RHEL 7.2 x86-64 | Disabled | Disabled | Nagios 4.3.1, Nagios-plugins 2.2.0, NRPE 3.0.1 | 5666
192.168.2.130 | Nagios Client | RHEL 7.2 x86-64 | Disabled | Disabled | Nagios-plugins 2.2.0, NRPE 3.0.1               | 5666


II. Adding Linux Host Monitoring

# On the client:

1. Create the user

# useradd -s /sbin/nologin nagios

 

2. Install nagios-plugins

Omitted; see the Server installation article.

 

3. Install nrpe

# mkdir /usr/local/nagios
# chown -R nagios:nagios /usr/local/nagios
# yum -y install openssl-devel

# tar -zxvf nrpe-3.0.1.tar.gz
# cd nrpe-3.0.1
# ./configure --with-nrpe-user=nagios --with-nrpe-group=nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args --enable-ssl
# make all
# make install-plugin
# make install-daemon
# make install-config
# make install-daemon-config    # only for NRPE versions below 3
# make install-xinetd           # only for NRPE versions below 3

 

4. Configure nrpe.cfg

# mkdir /usr/local/nagios/etc
# cp -a /usr/local/src/nrpe-3.0.1/sample-config/nrpe.cfg /usr/local/nagios/etc/
# chown -R nagios:nagios /usr/local/nagios
# vi /usr/local/nagios/etc/nrpe.cfg

log_facility=daemon
debug=0
pid_file=/usr/local/nagios/var/nrpe.pid
server_port=5666
server_address=192.168.1.202
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.1.201
dont_blame_nrpe=0
allow_bash_command_substitution=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 25% -c 15%
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 15 -c 10
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -w 80% -c 90%
command[check_uptime]=/usr/local/nagios/libexec/check_uptime.sh
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 85% -c 80%
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 250 -c 300
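
Every command[...] entry above points at a plugin under libexec, and three of them (check_mem.sh, check_cpu.sh, check_uptime.sh) are scripts that must be copied in separately. A minimal sketch of a sanity check is shown here; check_nrpe_commands is a hypothetical helper name, not something shipped with NRPE, and it only inspects the first word after the equals sign.

```shell
#!/bin/sh
# Hypothetical helper: list every command[...] entry in an NRPE config
# whose plugin path is missing or not executable.
# Prints "<command name> <plugin path>" for each problem entry.
check_nrpe_commands() {
    cfg=$1
    # Keep only "command[name]=/path args" lines and reduce them to "name /path".
    sed -n 's/^command\[\([^]]*\)]=\([^ ]*\).*/\1 \2/p' "$cfg" |
    while read -r name path; do
        [ -x "$path" ] || echo "$name $path"
    done
}

# Usage on the client (no output means every plugin is in place):
#   check_nrpe_commands /usr/local/nagios/etc/nrpe.cfg
```

Running it before starting the daemon may catch a missing script earlier than a failed check on the server would.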


5. Copy the monitoring scripts into the libexec directory

# ll /usr/local/nagios/libexec/check_*.sh
-rwxr-xr-x 1 nagios nagios 8196 Mar 27 08:58 /usr/local/nagios/libexec/check_cpu.sh
-rwxr-xr-x 1 nagios nagios 2789 Mar 27 08:58 /usr/local/nagios/libexec/check_mem.sh
-rwxr-xr-x 1 nagios nagios  791 Mar 27 08:58 /usr/local/nagios/libexec/check_uptime.sh

 # Script download: see the link in the original post

 

6. Create an init script for nrpe

# vi /etc/init.d/nrpe

#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg
case "$1" in
      start)
              echo -n "Starting NRPE daemon..."
              "$NRPE" -c "$NRPECONF" -d
              echo " done."
              ;;
      stop)
              echo -n "Stopping NRPE daemon..."
              pkill -u nagios nrpe
              echo " done."
              ;;
      restart)
              $0 stop
              sleep 2
              $0 start
              ;;
      *)
              echo "Usage: $0 start|stop|restart"
              ;;
esac
exit 0

# chmod +x /etc/init.d/nrpe

 

 

7. Start the service

# chkconfig nrpe on
# /etc/init.d/nrpe start
# ps -ef | grep nrpe
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d


8. Verify

# netstat -tunlp

tcp        0      0 192.168.1.202:5666      0.0.0.0:*               LISTEN      26921/nrpe
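
The check above is done by eye. As a rough sketch, the same test can be scripted by filtering the netstat listing for a listening TCP socket on the NRPE port. is_listening is a hypothetical helper name, and it assumes the column layout of netstat -tunlp shown above (local address in the fourth column), so it would need adjusting for other tools such as ss.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the netstat-style listing on stdin
# contains a TCP socket in LISTEN state on the given port.
is_listening() {
    port=$1
    awk -v p="$port" '
        $1 ~ /^tcp/ && $0 ~ /LISTEN/ {
            n = split($4, a, ":")   # local address is column 4; port is the last part
            if (a[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the client:
#   netstat -tunlp | is_listening 5666 && echo "nrpe is listening"
```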



# On the server:

1. Edit the host configuration file

# cd /usr/local/nagios/etc/objects
# mv localhost.cfg linux.cfg
# vi linux.cfg             # define the hosts and services; every check_command used here must exist in commands.cfg

###############################################################################

# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE

#

#

# NOTE: This config file is intended to serve as an *extremely* simple

#       example of how you can create configuration entries to monitor

#       the local (Linux) machine.

#

###############################################################################

 

 

 

 

###############################################################################

###############################################################################

#

# HOST DEFINITION

#

###############################################################################

###############################################################################

 

# Define a host for the local machine

 

define host{

        use                     linux-server            ; Name of host template to use

        host_name               zabbix_server

        alias                   zabbix_server

        address                 192.168.1.201

        contact_groups          +admins

        }

 

 

define host{

        use                     linux-server            ; Name of host template to use

        host_name               zabbix_proxy

        alias                   zabbix_proxy

        address                 192.168.1.202

        contact_groups          +admins

        }

 

###############################################################################

###############################################################################

#

# HOST GROUP DEFINITION

#

###############################################################################

###############################################################################

 

# Define an optional hostgroup for Linux machines

 

define hostgroup{

        hostgroup_name  linux-servers ; The name of the hostgroup

        alias           Linux Servers ; Long name of the group

        members         zabbix_server,zabbix_proxy ; Comma separated list of hosts that belong to this group

        }

 

 

 

###############################################################################

###############################################################################

#

# SERVICE DEFINITIONS

#

###############################################################################

###############################################################################

 

 

# Define a service to "ping" the local machine

 

define service{

        use                             generic-service         ; Name of service template to use

        hostgroup_name                  linux-servers

        service_description             PING

        check_command                   check_ping!100.0,20%!500.0,60%

        }

 

 

# Define a service to check the disk space of the root partition

# on the local machine.  Warning if < 20% free, critical if

# < 10% free space on partition.

 

define service{

        use                             generic-service         ; Name of service template to use

        hostgroup_name                  linux-servers

        service_description             Root Partition

        check_command                   check_local_disk!20%!10%!/

        }

 

 

 

# Define a service to check the number of currently logged in

# users on the local machine.  Warning if > 20 users, critical

# if > 50 users.

 

define service{

        use                             generic-service         ; Name of service template to use

        hostgroup_name                  linux-servers

        service_description             Current Users

        check_command                   check_local_users!20!50

        }

 

 

# Define a service to check the number of currently running procs

# on the local machine.  Warning if > 250 processes, critical if

# > 400 processes.

 

define service{

        use                             generic-service         ; Name of service template to use

        hostgroup_name                  linux-servers

        service_description             Total Processes

        check_command                   check_local_procs!250!400!RSZDT

        }

 

 

 

# Define a service to check the load on the local machine.

 

define service{

        use                             generic-service         ; Name of service template to use

        hostgroup_name                  linux-servers

        service_description             Current Load

        check_command                   check_local_load!5.0,4.0,3.0!10.0,6.0,4.0

        }

 

 

 

# Define a service to check the swap usage the local machine.

# Critical if less than 10% of swap is free, warning if less than 20% is free

 

define service{

        use                             generic-service         ; Name of service template to use

        hostgroup_name                  linux-servers

        service_description             Swap Usage

        check_command                   check_local_swap!20!10

        }

 

 

 

# Define a service to check SSH on the local machine.

# Disable notifications for this service by default, as not all users may have SSH enabled.

 

define service{

        use                             generic-service         ; Name of service template to use

        hostgroup_name                  linux-servers

        service_description             SSH

        check_command                   check_ssh!

        notifications_enabled           0

        }

 

 

 

# Define a service to check HTTP on the local machine.

# Disable notifications for this service by default, as not all users may have HTTP enabled.

 

define service{

        use                             generic-service         ; Name of service template to use

        host_name                       zabbix_server

        service_description             HTTP

        check_command                   check_http!

        notifications_enabled           0

        }

 

define service {

        use                             generic-service

        hostgroup_name                  linux-servers

        service_description             Check CPU

        check_command                   check_nrpe!check_cpu       

        notifications_enabled           0

        }

 

define service {

        use                             generic-service

        hostgroup_name                  linux-servers

        service_description             Check Disk

        check_command                   check_nrpe!check_disk

        notifications_enabled           0

        }

 

 

define service {

        use                             generic-service

        hostgroup_name                  linux-servers

        service_description             Check MEM

        check_command                   check_nrpe!check_mem

        notifications_enabled           0

        }

 

define service {

        use                             generic-service

        hostgroup_name                  linux-servers

        service_description             Check uptime

        check_command                   check_nrpe!check_uptime

        notifications_enabled           0

        }

 

define service {

        use                             generic-service

        hostgroup_name                  linux-servers

        service_description             Check uptime

        check_command                   check_nrpe!check_uptime

        notifications_enabled           0

        }

 

 

2. Define the commands

# vi commands.cfg

define command {
                command_name                          check_local_cpu
                command_line                          $USER1$/check_cpu.sh -w $ARG1$ -c $ARG2$
}

define command {
                command_name                          check_local_mem
                command_line                          $USER1$/check_mem.sh -w $ARG1$ -c $ARG2$
}

define command {
                command_name                          check_nrpe
                command_line                          $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define command {
                command_name                          check_local_uptime
                command_line                          $USER1$/check_uptime.sh
}
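
To make the link between a service's check_command (for example check_nrpe!check_cpu) and these command_line templates concrete, here is a small illustration of the macro substitution. Nagios does this internally; expand_check_command is a hypothetical helper written only for this sketch, and the $USER1$ value of /usr/local/nagios/libexec is an assumption based on the paths used in this install.

```shell
#!/bin/sh
# Hypothetical helper: expand a check_command ("name!arg1!arg2") against a
# command_line template, substituting $USER1$, $HOSTADDRESS$ and $ARGn$.
# Arguments: template, check_command, host address, value of $USER1$.
expand_check_command() {
    awk -v template="$1" -v check="$2" -v host="$3" -v user1="$4" '
        # Literal (non-regex) string replacement.
        function replace(s, from, to,    out, i) {
            out = ""
            while ((i = index(s, from)) > 0) {
                out = out substr(s, 1, i - 1) to
                s = substr(s, i + length(from))
            }
            return out s
        }
        BEGIN {
            n = split(check, part, "!")   # part[1] is the command name
            line = replace(template, "$USER1$", user1)
            line = replace(line, "$HOSTADDRESS$", host)
            for (i = 2; i <= n; i++)
                line = replace(line, "$ARG" (i - 1) "$", part[i])
            print line
        }
    '
}

# Usage:
#   expand_check_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
#       'check_nrpe!check_cpu' 192.168.1.202 /usr/local/nagios/libexec
# prints: /usr/local/nagios/libexec/check_nrpe -H 192.168.1.202 -c check_cpu
```

The part after the first "!" is what NRPE looks up as command[check_cpu] in the client's nrpe.cfg.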

 

3. Update the main configuration file

# vi ../nagios.cfg

Change the localhost.cfg entry to linux.cfg:

cfg_file=/usr/local/nagios/etc/objects/linux.cfg

 

4. Check the configuration for errors

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.3.1

Copyright (c) 2009-present Nagios Core Development Team and Community Contributors

Copyright (c) 1999-2009 Ethan Galstad

Last Modified: 02-23-2017

License: GPL

 

Website: https://www.nagios.org

Reading configuration data...

   Read main config file okay...

Warning: Duplicate definition found for service 'Check uptime' on host 'zabbix_proxy' (config file '/usr/local/nagios/etc/objects/linux.cfg', starting on line 188)

Warning: Duplicate definition found for service 'Check uptime' on host 'zabbix_server' (config file '/usr/local/nagios/etc/objects/linux.cfg', starting on line 196)

   Read object config files okay...

 

Running pre-flight check on configuration data...

 

Checking objects...

        Checked 30 services.

        Checked 3 hosts.

        Checked 2 host groups.

        Checked 0 service groups.

        Checked 1 contacts.

        Checked 1 contact groups.

        Checked 28 commands.

        Checked 5 time periods.

        Checked 0 host escalations.

        Checked 0 service escalations.

Checking for circular paths...

        Checked 3 hosts

        Checked 0 service dependencies

        Checked 0 host dependencies

        Checked 5 timeperiods

Checking global event handlers...

Checking obsessive compulsive processor commands...

Checking misc settings...

 

Total Warnings: 0

Total Errors:   0
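
In the output above, the two duplicate-definition warnings are printed while the summary still reports zero warnings, so reading only the last two lines can hide them. The following is a sketch of a stricter summary; summarize_preflight is a hypothetical helper, and treating duplicates as a failure is a choice made for this example rather than Nagios behaviour.

```shell
#!/bin/sh
# Hypothetical helper: read `nagios -v` output on stdin, print a one-line
# summary, and fail if errors or duplicate definitions were reported.
summarize_preflight() {
    awk '
        /^Warning: Duplicate definition/ { dup++ }
        /^Total Warnings:/               { warn = $3 }
        /^Total Errors:/                 { err = $3 }
        END {
            printf "errors=%d warnings=%d duplicates=%d\n", err, warn, dup
            exit ((err > 0 || dup > 0) ? 1 : 0)
        }
    '
}

# Usage on the server:
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | summarize_preflight
```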

 

5. Verify connectivity between the server and the client

# cd /usr/local/nagios/libexec/
# ./check_nrpe -H 192.168.1.202
NRPE v3.0.1      # seeing this version string means the connection works
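
Beyond the version probe, each command defined in nrpe.cfg can be exercised the same way by adding -c. The sketch below loops over a list of command names and translates the plugin exit code (0, 1, 2, anything else) into the usual OK, WARNING, CRITICAL and UNKNOWN states. state_name and probe_nrpe are hypothetical helper names, and the CHECK_NRPE override is an assumption added so the plugin path can be changed.

```shell
#!/bin/sh
# Map a plugin exit code to the state name Nagios displays.
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Hypothetical helper: run each NRPE command against a host and print
# "<command>: <STATE>". The default plugin path matches this install.
probe_nrpe() {
    host=$1
    shift
    for cmd in "$@"; do
        rc=0
        "${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}" \
            -H "$host" -c "$cmd" >/dev/null 2>&1 || rc=$?
        echo "$cmd: $(state_name "$rc")"
    done
}

# Usage on the server:
#   probe_nrpe 192.168.1.202 check_cpu check_mem check_disk check_uptime
```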

 

6. Restart Nagios

# service nagios restart



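III. Adding Windows Host Monitoring

1. Client configuration

Install NSClient++ (available in 32-bit and 64-bit builds).

Extract NSClient0.3.8-Win32 to the root of the C: drive.

Open a Windows command prompt and change to the NSClient0.3.8-Win32 directory.

Run NSClient++ /install

Run NSClient++ SysTray (the argument is case-sensitive)

After installation, open the Windows Services settings and tick "Allow service to interact with desktop".

Edit the nsc.ini file in the NSClient0.3.8-Win32 directory:

[modules] section

Uncomment every module except CheckWMI.dll and RemoteConfiguration.dll.

[Settings] section

Uncomment allowed_hosts and add the IP of the monitoring host that runs Nagios. I changed it to: allowed_hosts=127.0.0.1/32,172.16.99.245

[NSClient] section

Uncomment port and make sure its value is 12489, the default NSClient listening port.

From the command prompt, run NSClient++ /start to start the service.

If the Windows host runs a firewall, open the corresponding port.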
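
Once the agent is running, it can be queried from the Nagios server with the check_nt plugin from nagios-plugins. The sketch below only prints a few candidate command lines. nt_commands is a hypothetical helper, the thresholds are illustrative values rather than settings from this article, and if a password is configured in nsc.ini the -s option would have to be added.

```shell
#!/bin/sh
# Hypothetical helper: print check_nt command lines for a Windows host
# running NSClient++ on the given port (12489 matches nsc.ini above).
nt_commands() {
    host=$1
    port=${2:-12489}
    base="${CHECK_NT:-/usr/local/nagios/libexec/check_nt} -H $host -p $port"
    echo "$base -v CLIENTVERSION"                  # agent version, a basic connectivity test
    echo "$base -v UPTIME"
    echo "$base -v CPULOAD -l 5,80,90"             # 5-minute average, warn 80%, crit 90%
    echo "$base -v MEMUSE -w 80 -c 90"
    echo "$base -v USEDDISKSPACE -l c -w 80 -c 90" # drive C:
}

# Usage on the server (WIN_HOST is a placeholder for the Windows client's IP):
#   nt_commands "$WIN_HOST" | sh
```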


2. Server configuration

# vi nagios.cfg

Uncomment the following line:

cfg_file=/usr/local/nagios/etc/objects/windows.cfg

# vi /usr/local/nagios/etc/objects/windows.cfg     # configure as needed, following the Linux example

 

3. Restart the service

# service nagios restart