Shangguan Day 15: Nagios installation and configuration

Source: Internet | Editor: 程序博客网 | Time: 2024/04/23 14:30
Add the user and groups:
useradd nagios
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache
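Group membership can be double-checked before going further. The helper below is a minimal sketch under the assumption that the standard id utility is available; in_group is a hypothetical name, not part of Nagios, and the function can be pasted into a shell or sourced from a script.

```shell
# Hypothetical helper (not part of Nagios): succeed if user $1 belongs to group $2.
# `id -nG USER` prints the names of all of USER's groups, primary and supplementary.
in_group() {
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Usage on the Nagios server, after the usermod commands:
#   in_group nagios nagcmd && echo "nagios: ok"
#   in_group apache nagcmd && echo "apache: ok"
```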
Install Nagios:
tar xvzf nagios-3.2.0.tar.gz -C /usr/src/
cd /usr/src/nagios-3.2.0/
./configure --with-nagios-group=nagcmd
make all
make install
make install-init
make install-commandmode
make install-config
make install-webconf
Set up the Nagios web interface:
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
/etc/init.d/httpd restart
Install nagios-plugins:
tar xvzf nagios-plugins-1.4.13.tar.gz -C /usr/src/
cd /usr/src/nagios-plugins-1.4.13/
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
Enable Nagios at boot:
chkconfig --add nagios
chkconfig nagios on
Verify the configuration file:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Start the service:
/etc/init.d/nagios start
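Since a configuration error stops Nagios from starting, it helps to run the verification step before every restart. A minimal sketch, assuming the default /usr/local/nagios paths and the init script installed by make install-init; the function name and the NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT variables are hypothetical and can be overridden.

```shell
# Hypothetical wrapper: run the Nagios pre-flight check and restart the
# daemon only if the configuration verifies cleanly.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios}

verify_then_restart() {
    # nagios -v exits non-zero when it finds configuration errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        "$NAGIOS_INIT" restart
    else
        echo "config check failed; run $NAGIOS_BIN -v $NAGIOS_CFG for details" >&2
        return 1
    fi
}
```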
Monitoring Linux hosts:
Installing and configuring NRPE
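NRPE is a Nagios add-on that runs on the monitored server and reports local
information about that server to the Nagios monitoring host, such as CPU load, memory usage and disk usage.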
1) Install NRPE on the Nagios monitoring server:
tar xzf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure
make all
make install-plugin
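If the installation succeeded, the check_nrpe plugin can be found in the /.../nagios/libexec directory.
Next, define a command that can be used on the monitoring host. This definition usually
goes in /.../nagios/etc/commands.cfg.
Add the following to commands.cfg: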
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
The command can then be used when defining services, for example:
define service{
host_name remotehost
service_description CPU Load
...
check_command check_nrpe!check_load
}
This example sets up monitoring of the CPU load on the server remotehost.
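When several hosts need the same kind of NRPE check, the service block can be generated instead of typed by hand. A minimal sketch; nrpe_service is a hypothetical helper, and the generic-service template it writes is an assumption (it is the template used by windows.cfg later on), so substitute the template your other services use.

```shell
# Hypothetical helper: print a Nagios service definition that runs an NRPE
# command on a remote host through the check_nrpe command defined above.
#   $1 = host_name   $2 = service_description   $3 = command name in nrpe.cfg
# Single-quoted printf formats keep "!" safe from interactive bash history expansion.
nrpe_service() {
    printf 'define service{\n'
    printf 'use generic-service\n'
    printf 'host_name %s\n' "$1"
    printf 'service_description %s\n' "$2"
    printf 'check_command check_nrpe!%s\n' "$3"
    printf '}\n'
}

# Example (remotehost.cfg is a hypothetical file name):
#   nrpe_service remotehost "CPU Load" check_load >> remotehost.cfg
```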
2) Installing NRPE on the monitored server
Two packages are needed: NRPE and nagios-plugins. Install the plugins first:
/usr/sbin/useradd nagios
passwd nagios
tar xzf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
./configure --prefix=/usr/local/nagios
make && make install
chown nagios:nagios /usr/local/nagios/
chown -R nagios:nagios /usr/local/nagios/libexec/
Then install NRPE:
tar xzf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
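After the installation, check /usr/local/nagios/: four directories should have been created:
bin, etc, libexec and share. Next, configure NRPE so that it runs as a daemon
listening on port 5666 and serves the Nagios host at a specific address.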
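The directory check in the paragraph above can be scripted. A minimal sketch, assuming the /usr/local/nagios prefix used in this article; check_nagios_dirs is a hypothetical name.

```shell
# Hypothetical helper: report which of the four expected directories are
# missing under the install prefix (default: /usr/local/nagios).
check_nagios_dirs() {
    prefix=${1:-/usr/local/nagios}
    missing=0
    for d in bin etc libexec share; do
        if [ ! -d "$prefix/$d" ]; then
            echo "missing: $prefix/$d"
            missing=1
        fi
    done
    return $missing
}
```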
First, edit /usr/local/nagios/etc/nrpe.cfg.
Find the line "allowed_hosts=127.0.0.1" and change it to:
allowed_hosts=127.0.0.1,<address or domain name of the Nagios monitoring host>
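The same edit can be made non-interactively, which is handy when preparing several monitored hosts. A minimal sketch assuming GNU sed (for sed -i) and the nrpe.cfg path above; set_allowed_hosts is a hypothetical helper. NRPE reads nrpe.cfg when the daemon starts, so make the change before starting it (or restart it afterwards).

```shell
# Hypothetical helper: rewrite the allowed_hosts line of nrpe.cfg in place.
# Assumes GNU sed (sed -i with no backup suffix).
#   $1 = path to nrpe.cfg   $2 = comma-separated addresses allowed to connect
set_allowed_hosts() {
    sed -i "s/^allowed_hosts=.*/allowed_hosts=$2/" "$1"
}

# Example for this article's network, where the Nagios server is 192.168.1.253:
#   set_allowed_hosts /usr/local/nagios/etc/nrpe.cfg 127.0.0.1,192.168.1.253
```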
3) Start the NRPE daemon (you can add this command to /etc/rc.local so that it starts at boot):
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
This command logs to the system log (/var/log/messages). If there are no errors,
the setup is basically done. To verify it, run this on the monitored host itself:
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
or, on the Nagios monitoring server:
/usr/local/nagios/libexec/check_nrpe -H <address of the monitored host>
A normal response is the version of NRPE installed on the monitored server:
NRPE v2.8.1
If you see this, NRPE is installed correctly.
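For repeated checks (for example after adding several hosts), the version test can be wrapped in a function that succeeds only when the reply looks like the "NRPE v..." banner shown above. A minimal sketch; nrpe_alive and the CHECK_NRPE variable are hypothetical, and CHECK_NRPE defaults to the plugin path used above.

```shell
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Hypothetical helper: succeed only if check_nrpe against host $1 returns
# the "NRPE vX.Y.Z" version banner.
nrpe_alive() {
    if ! out=$("$CHECK_NRPE" -H "$1" 2>&1); then
        echo "$1: check_nrpe failed: $out" >&2
        return 1
    fi
    case $out in
        "NRPE v"*) echo "$1: $out" ;;
        *) echo "$1: unexpected reply: $out" >&2; return 1 ;;
    esac
}

# Example: nrpe_alive 192.168.1.254
```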
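So what can be monitored through NRPE? Any plugin present on the monitored server
(everything in /usr/local/nagios/libexec) can be used. In other words, anything can
be monitored as long as there is a plugin for it.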
The nrpe.cfg file on the monitored host contains entries like this:
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
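This entry checks the CPU load average (the three numbers after -w and -c are the warning and critical thresholds for the 1-, 5- and 15-minute load).
On the monitoring host you can then define the following service to monitor the CPU load of the monitored host: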
define service{
host_name remotehost
service_description check_load
...
check_command check_nrpe!check_load
}
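New commands are added on the monitored host in the same command[NAME]=... form, and NAME is what goes after check_nrpe! on the Nagios side. A minimal sketch of an append-if-absent helper; add_nrpe_command is a hypothetical name. The NRPE daemon reads nrpe.cfg when it starts, so restart it after adding commands.

```shell
# Hypothetical helper: append command[NAME]=COMMAND_LINE to nrpe.cfg unless
# a command with that NAME is already defined.
#   $1 = path to nrpe.cfg   $2 = NAME   $3 = full plugin command line
add_nrpe_command() {
    if grep -q "^command\[$2]=" "$1"; then
        echo "command[$2] already defined in $1" >&2
        return 1
    fi
    printf 'command[%s]=%s\n' "$2" "$3" >> "$1"
}

# Example:
#   add_nrpe_command /usr/local/nagios/etc/nrpe.cfg check_load \
#       '/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20'
```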
vi nagios.cfg
# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/253.cfg
cfg_file=/usr/local/nagios/etc/objects/254.cfg
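Registering a new object file in nagios.cfg can be made idempotent, so that running a setup script twice does not add duplicate cfg_file lines. A minimal sketch; add_cfg_file is a hypothetical helper.

```shell
# Hypothetical helper: add "cfg_file=PATH" to nagios.cfg only if that exact
# line is not already present.
#   $1 = path to nagios.cfg   $2 = object config file to register
add_cfg_file() {
    line="cfg_file=$2"
    grep -qxF "$line" "$1" || printf '%s\n' "$line" >> "$1"
}

# Example:
#   add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/254.cfg
```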
vi objects/254.cfg
###########################################
# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE
#
# Last Modified: 05-31-2007
#
# NOTE: This config file is intended to serve as an *extremely* simple
# example of how you can create configuration entries to monitor
# the local (Linux) machine.
#
###########################################
#
# HOST DEFINITION
#
###########################################
# Define a host for the local machine
define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name mail
alias mail
address 192.168.1.254
}
###########################################
#
# HOST GROUP DEFINITION
#
##########################################
# Define an optional hostgroup for Linux machines
define hostgroup{
hostgroup_name mailgroup ; The name of the hostgroup
alias mailgroup ; Long name of the group
members mail ; Comma separated list of hosts that belong to this group
}
###########################################
#
# SERVICE DEFINITIONS
#
###########################################
# Define a service to "ping" the local machine
define service{
use local-service ; Name of service template to use
host_name mail
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service ; Name of service template to use
host_name mail
service_description Root Partition
check_command check_nrpe!check_sda2
}
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use local-service ; Name of service template to use
host_name mail
service_description Current Users
check_command check_nrpe!check_users
}
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.
define service{
use local-service ; Name of service template to use
host_name mail
service_description Total Processes
check_command check_nrpe!check_total_procs
}
# Define a service to check the load on the local machine.
define service{
use local-service ; Name of service template to use
host_name mail
service_description Current Load
check_command check_nrpe!check_load
}
# Define a service to check the swap usage of the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
#define service{
# use local-service ; Name of service template to use
# host_name localhost
# service_description Swap Usage
# check_command check_local_swap!20!10
# }
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
use local-service ; Name of service template to use
host_name mail
service_description SSH
check_command check_ssh
notifications_enabled 0
}
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
use local-service ; Name of service template to use
host_name mail
service_description HTTP
check_command check_http
notifications_enabled 0
}
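254.cfg and 253.cfg repeat the same host and hostgroup definitions with only the name and address changed, so for additional hosts this part can be generated. A minimal sketch; host_block is a hypothetical helper and reuses the linux-server template from the files in this article.

```shell
# Hypothetical helper: print a host definition plus a one-member hostgroup,
# following the layout of 254.cfg.
#   $1 = host_name (also used as alias)   $2 = IP address   $3 = hostgroup name
host_block() {
    cat <<EOF
define host{
use linux-server
host_name $1
alias $1
address $2
}
define hostgroup{
hostgroup_name $3
alias $3
members $1
}
EOF
}

# Example: host_block mail 192.168.1.254 mailgroup > /usr/local/nagios/etc/objects/254.cfg
```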
vi objects/253.cfg
###########################################
# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE
#
# Last Modified: 05-31-2007
#
# NOTE: This config file is intended to serve as an *extremely* simple
# example of how you can create configuration entries to monitor
# the local (Linux) machine.
#
###########################################
#
# HOST DEFINITION
#
###########################################
# Define a host for the local machine
define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name www
alias www
address 192.168.1.253
}
#####################################
#
# HOST GROUP DEFINITION
#
###########################################
# Define an optional hostgroup for Linux machines
define hostgroup{
hostgroup_name linux-servers1 ; The name of the hostgroup
alias Linux Servers1 ; Long name of the group
members www ; Comma separated list of hosts that belong to this group
}
###########################################
#
# SERVICE DEFINITIONS
#
###########################################
# Define a service to "ping" the local machine
define service{
use local-service ; Name of service template to use
host_name www
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service ; Name of service template to use
host_name www
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use local-service ; Name of service template to use
host_name www
service_description Current Users
check_command check_local_users!2!5
}
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.
define service{
use local-service ; Name of service template to use
host_name www
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
# Define a service to check the load on the local machine.
define service{
use local-service ; Name of service template to use
host_name www
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
# Define a service to check the swap usage of the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use local-service ; Name of service template to use
host_name www
service_description Swap Usage
check_command check_local_swap!20!10
}
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
use local-service ; Name of service template to use
host_name www
service_description SSH
check_command check_ssh
notifications_enabled 0
}
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
use local-service ; Name of service template to use
host_name www
service_description HTTP
check_command check_http
notifications_enabled 0
}
The nrpe.cfg file on 192.168.1.254:
allowed_hosts=127.0.0.1,192.168.1.253
command[check_users]=/usr/local/nagios/libexec/check_users -w 3 -c 4
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda2]=/usr/local/nagios/libexec/check_disk -w 20% -c 60% -p /dev/sda2
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 80 -c 100
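Every check_nrpe!NAME used in 254.cfg must match a command[NAME] entry in this nrpe.cfg, and nagios -v cannot catch a mismatch because it never sees the remote file; the problem only shows up when the check runs. A minimal sketch of a cross-check that lists the unmatched names, assuming a copy of the remote nrpe.cfg has been fetched to the Nagios server (for example with scp); missing_nrpe_commands is a hypothetical helper.

```shell
# Hypothetical helper: print each NAME used as check_nrpe!NAME in a Nagios
# object file that has no command[NAME]= entry in a copy of nrpe.cfg.
#   $1 = object config (e.g. objects/254.cfg)   $2 = local copy of the remote nrpe.cfg
missing_nrpe_commands() {
    # Extract the NRPE command names requested by active check_command lines
    for name in $(sed -n 's/^[[:space:]]*check_command[[:space:]]*check_nrpe!\([^![:space:]]*\).*/\1/p' "$1" | sort -u); do
        grep -q "^command\[$name]=" "$2" || echo "$name"
    done
}

# Example: missing_nrpe_commands /usr/local/nagios/etc/objects/254.cfg /tmp/nrpe-254.cfg
```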
Monitoring Windows hosts:
vi /usr/local/nagios/etc/nagios.cfg
Remove the leading # from the following line:
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg
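The same change can be made as a one-off command. A minimal sketch assuming GNU sed; enable_cfg_file is a hypothetical helper that only removes the leading # from the matching cfg_file line and leaves other commented-out lines alone.

```shell
# Hypothetical helper: uncomment the "cfg_file=PATH" line for one object file.
# Assumes GNU sed (sed -i); "|" is used as the delimiter because PATH contains "/".
#   $1 = path to nagios.cfg   $2 = object config file to enable
enable_cfg_file() {
    sed -i "s|^#[[:space:]]*\(cfg_file=$2\)[[:space:]]*\$|\1|" "$1"
}

# Example:
#   enable_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/windows.cfg
```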
Setup on the monitored Windows machine:
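1. Download the latest stable NSClient++ package from http://sourceforge.net/projects/nscplus.
2. Unpack it into a directory, e.g. C:\NSClient++.
3. Open a command prompt and change to the C:\NSClient++ directory.
4. Register the NSClient++ system service with the following command: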
nsclient++ /install
5. Install the NSClient++ system tray program with the following command ('SysTray' is case-sensitive):
nsclient++ SysTray
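6. Open the Services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the 'Log On'
tab in the Services manager). If desktop interaction is not allowed, tick the checkbox to enable it.
7. Edit NSC.INI (in the C:\NSClient++ directory) and make the following changes:
1. In the [modules] section, uncomment all the listed modules except CheckWMI.dll and RemoteConfiguration.dll.
2. Preferably, change the 'password' option in the [Settings] section.
3. Uncomment the 'allowed_hosts' option in the [Settings] section and add the IP address of the Nagios server to it, or
leave it empty to allow all hosts to connect.
4. Make sure the 'port' option in the [NSClient] section is uncommented and set to '12489' (the default port).
8. Start the NSClient++ service with: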
nsclient++ /start
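Once NSClient++ is running, the connection can be tested from the Nagios server with the check_nt plugin that comes with nagios-plugins. A minimal sketch; nsclient_version and the CHECK_NT variable are hypothetical. If a password was set in NSC.INI (step 7, item 2), check_nt has to send it with -s; 12489 is the port configured in step 7, item 4.

```shell
# Hypothetical helper: ask NSClient++ on Windows host $1 for its version.
#   $1 = Windows host address   $2 = NSC.INI password (optional)
CHECK_NT=${CHECK_NT:-/usr/local/nagios/libexec/check_nt}

nsclient_version() {
    if [ -n "$2" ]; then
        "$CHECK_NT" -H "$1" -p 12489 -s "$2" -v CLIENTVERSION
    else
        "$CHECK_NT" -H "$1" -p 12489 -v CLIENTVERSION
    fi
}

# Example: nsclient_version 192.168.1.2
```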
Setup on the Nagios monitoring host:
vi /usr/local/nagios/etc/objects/windows.cfg
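Add a new host object definition for the Windows machine you want to monitor. If this is the first
Windows machine being monitored, you can simply edit the sample definitions in windows.cfg. Change the
host_name, alias and address fields to match that Windows machine.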
define host{
use windows-server ; Inherit default values from a Windows server template (make sure you keep this line!)
host_name winserver
alias My Windows Server
address 192.168.1.2
}
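Next, add some service definitions (in the same configuration file) so that Nagios monitors different
properties of the Windows machine. If this is the first Windows machine, you can simply edit the sample
service definitions in windows.cfg.
Note: replace "winserver" in the examples with the host_name from the host definition you just added.
Add the following service definition to monitor the version of the NSClient++ add-on running on the
Windows machine. This is useful when it is time to upgrade the add-on, because it shows which Windows
machines still need NSClient++ upgraded to the latest version.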
define service{
use generic-service
host_name winserver
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
}
Add the following service definition to monitor the uptime of the Windows machine.
define service{
use generic-service
host_name winserver
service_description Uptime
check_command check_nt!UPTIME
}
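Add the following service definition to monitor CPU utilization on the Windows machine. It raises a
critical alert if the 5-minute CPU load is above 90%, or a warning if it is above 80%.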
define service{
use generic-service
host_name winserver
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
}
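Add the following service definition to monitor memory usage on the Windows machine. It raises a
critical alert if memory usage is above 90%, or a warning if it is above 80%.
define service{
use generic-service
host_name winserver
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
}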
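Add the following service definition to monitor disk usage on drive C: of the Windows machine. It raises a
critical alert if usage is above 90%, or a warning if it is above 80%.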
define service{
use generic-service
host_name winserver
service_description C:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
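Add the following service definition to monitor the state of the W3SVC service on the Windows machine,
raising a critical alert if the service is stopped.
define service{
use generic-service
host_name winserver
service_description W3SVC
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}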
Add the following service definition to monitor the Explorer.exe process on the Windows machine, raising
a critical alert if the process is not running.
define service{
use generic-service
host_name winserver
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
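The service definitions above differ only in description, check_nt variable and arguments, so they can be generated for additional Windows hosts. A minimal sketch; nt_service is a hypothetical helper that assumes the generic-service template and the check_nt command already used in windows.cfg.

```shell
# Hypothetical helper: print a check_nt service definition.
#   $1 = host_name   $2 = service_description   $3 = check_nt variable (e.g. CPULOAD)
#   $4 = optional arguments passed as the second check_nt parameter
# Single-quoted printf formats keep "!" safe from interactive bash history expansion.
nt_service() {
    printf 'define service{\nuse generic-service\nhost_name %s\nservice_description %s\n' "$1" "$2"
    if [ -n "$4" ]; then
        printf 'check_command check_nt!%s!%s\n}\n' "$3" "$4"
    else
        printf 'check_command check_nt!%s\n}\n' "$3"
    fi
}

# Examples:
#   nt_service winserver "CPU Load" CPULOAD "-l 5,80,90"
#   nt_service winserver Uptime UPTIME
```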