Install Monitoring Systems


  • Ganglia
    • Compiling Ganglia
    • Make tarball to install on clients
    • Install Ganglia Client
      • Install Infiniband (optional)
      • Install Disk Metrics (optional)
    • Installing Ganglia Server
  • Nagios
    • Installing a Client
    • Nagios Front End (website) Administration
    • Acknowledge Nagios Alerts
    • Administrating our Nagios Server
      • Adding a host definition
      • Adding monitoring for new service via nagios
      • Restarting/Reloading nagios definitions
  • Resources

We install Ganglia in /opt/services instead of the normal location. This separates it from the OS install and lets us reinstall or upgrade the OS without interfering with installed services like Ganglia. We also keep /opt/services on a separate disk, which lets us replace the entire OS disk without interfering with installed services. Finally, we try to separate the local configuration of a service (ganglia-local) from the service itself (ganglia). This allows us to upgrade the service easily without having to reconfigure it to work the way the previous version did.
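
The layout described above can be tried out in a scratch directory before touching /opt/services. This is only a sketch: ROOT is a temporary directory and the 3.2.1 version number is hypothetical, used to stand in for a later upgrade.

```shell
# Sketch of the versioned-directory layout, built under a scratch directory.
# ROOT and the 3.2.1 version are hypothetical, for illustration only.
ROOT=$(mktemp -d)

# Versioned install plus a separate directory for local configuration.
mkdir -p "$ROOT/ganglia-3.2.0/bin" "$ROOT/ganglia-local/etc"
echo 'local settings' > "$ROOT/ganglia-local/etc/gmond.conf"
(cd "$ROOT" && ln -s ganglia-3.2.0 ganglia)

# "Upgrade": install the new version alongside the old one, then repoint the link.
mkdir -p "$ROOT/ganglia-3.2.1/bin"
(cd "$ROOT" && ln -sfn ganglia-3.2.1 ganglia)

# The link now names the new version; the local configuration is untouched.
readlink "$ROOT/ganglia"                    # prints: ganglia-3.2.1
cat "$ROOT/ganglia-local/etc/gmond.conf"    # prints: local settings
```

Because init scripts and configuration refer to the unversioned ganglia path, repointing the link is the only change an upgrade needs outside the new version's own directory.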


Ganglia


Compiling Ganglia

Make an area for ganglia to live
mkdir -p /opt/services/ganglia-3.2.0

Ganglia requires libconfuse. Download libconfuse from http://www.nongnu.org/confuse/
cd /tmp
tar xfvz /home/src/ganglia/src/confuse-2.7.tar.gz
cd confuse-2.7
./configure --prefix=/opt/services/ganglia-3.2.0 --enable-shared
make
make install

Ganglia also requires rrdtool for head nodes. Download rrdtool from http://www.rrdtool.org/
cd /tmp
tar xfvz /home/src/ganglia/src/rrdtool-1.4.6.tar.gz
cd rrdtool-1.4.6
./configure --prefix=/opt/services/ganglia-3.2.0 --enable-shared
make
make install

Download ganglia from http://ganglia.sourceforge.net/
cd /tmp
tar xfvz /home/src/ganglia/src/ganglia-3.2.0.tar.gz
cd ganglia-3.2.0
LDFLAGS="-L/opt/services/ganglia-3.2.0/lib" ./configure --prefix=/opt/services/ganglia-3.2.0 --with-libconfuse=/opt/services/ganglia-3.2.0 --with-gmetad
make
make install
(cd /opt/services ; ln -s ganglia-3.2.0 ganglia)
mkdir -p /opt/services/ganglia-local/bin
mkdir -p /opt/services/ganglia-local/etc/conf.d
mkdir -p /opt/services/ganglia-local/init.d
mkdir -p /opt/services/ganglia-local/lib64/ganglia/python_modules
On a 32-bit system, create lib instead of lib64.

Create the client configuration
cp gmond/gmond.conf /opt/services/ganglia-local/etc
cp /opt/services/ganglia/etc/conf.d/modpython.conf /opt/services/ganglia-local/etc/conf.d
Edit /opt/services/ganglia-local/etc/gmond.conf and at least set the name in the cluster block. You may also want to change the ports used in the three channel blocks.
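
If the cluster name needs to be set on many machines, the edit can be scripted. The following is a minimal sketch, assuming the stock gmond.conf layout in which the cluster block opens with `cluster {` at the start of a line and closes with `}` in column one. The function name set_cluster_name is hypothetical.

```shell
# set_cluster_name FILE NAME
# Rewrites the name = "..." line inside the cluster { ... } block only.
# Assumes NAME contains no '/', '&' or '\' characters (they are special to sed).
set_cluster_name() {
    conf=$1
    name=$2
    tmp=$(mktemp)
    # The address range limits the substitution to the cluster block, so
    # name lines in other blocks (for example metric definitions) are left alone.
    sed '/^cluster[[:space:]]*{/,/^}/ s/^\([[:space:]]*name[[:space:]]*=[[:space:]]*\)".*"/\1"'"$name"'"/' \
        "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"   # write back in place, keeping the file's permissions
    rm -f "$tmp"
}

# Example call (not executed here):
#   set_cluster_name /opt/services/ganglia-local/etc/gmond.conf "Cluster"
```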

Create client startup script
cp gmond/gmond.init /opt/services/ganglia-local/init.d/nrao-gmond
Edit /opt/services/ganglia-local/init.d/nrao-gmond as needed for the /opt/services install paths.

Create the server configuration
cp gmetad/gmetad.conf /opt/services/ganglia-local/etc
Edit /opt/services/ganglia-local/etc/gmetad.conf and at least set data_source to the cluster name you set in gmond.conf
e.g. data_source "Cluster" node1.example.edu:8649
Edit /opt/services/ganglia-local/etc/conf.d/modpython.conf and change params and include to reference ganglia-local.
e.g. params = "/opt/services/ganglia-local/lib64/ganglia/python_modules"
e.g. include('/opt/services/ganglia-local/etc/conf.d/*.pyconf')
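
Since the data_source name in gmetad.conf has to match the cluster name in gmond.conf, a quick consistency check can catch a mismatch before the daemons are started. This is a sketch only: the function names are hypothetical, and the parsing assumes the stock layout of both files (a `cluster {` block closed by `}` in column one, and uncommented `data_source "name" host:port` lines).

```shell
# Print the name set in the cluster block of a gmond.conf.
cluster_name() {
    sed -n '/^cluster[[:space:]]*{/,/^}/ s/^[[:space:]]*name[[:space:]]*=[[:space:]]*"\(.*\)".*/\1/p' "$1"
}

# Print the name of every uncommented data_source line in a gmetad.conf.
data_source_names() {
    sed -n 's/^data_source[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# check_names_match GMOND_CONF GMETAD_CONF
# Succeeds when some data_source name equals the cluster name exactly.
check_names_match() {
    data_source_names "$2" | grep -qxF -- "$(cluster_name "$1")"
}

# Example call (not executed here):
#   check_names_match /opt/services/ganglia-local/etc/gmond.conf \
#                     /opt/services/ganglia-local/etc/gmetad.conf || echo "name mismatch"
```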

Create server startup script
cp gmetad/gmetad.init /opt/services/ganglia-local/init.d/nrao-gmetad
Edit /opt/services/ganglia-local/init.d/nrao-gmetad as needed for the /opt/services install paths.


Make tarball to install on clients

cd /opt/services
tar cfvz ganglia_nrao_`uname -i`-3.2.0.tgz ganglia*
cp ganglia_nrao_`uname -i`-3.2.0.tgz /home/src/ganglia
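
The tarball name embeds the hardware platform reported by `uname -i`, so the build host and the clients must agree on that string. A small helper can keep the naming in one place. This is a sketch under assumptions: the function name tarball_name is hypothetical, and the fallback to `uname -m` is an addition for systems where `uname -i` is unsupported or may report "unknown"; it is not part of the original procedure.

```shell
# tarball_name VERSION
# Prints the tarball file name for this machine, e.g. ganglia_nrao_x86_64-3.2.0.tgz
tarball_name() {
    platform=$(uname -i 2>/dev/null)
    # Assumption: fall back to the machine type when -i gives nothing useful.
    case "$platform" in
        ''|unknown) platform=$(uname -m) ;;
    esac
    echo "ganglia_nrao_${platform}-$1.tgz"
}

tarball_name 3.2.0
```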


Install Ganglia Client

cd /opt/services ; tar xfvz /home/src/ganglia/ganglia_nrao_`uname -i`-3.2.0.tgz
ln -s /opt/services/ganglia-local/init.d/nrao-gmond /etc/init.d
chkconfig --add nrao-gmond
/etc/init.d/nrao-gmond start


Install Infiniband (optional)

Download the InfiniBand network performance script from http://ganglia.info/gmetric/
Save it as /opt/services/ganglia-local/bin/infin.py and create a startup script to run it.

Edit /opt/services/ganglia-local/bin/infin.py
GMETRIC = '/opt/services/ganglia/bin/gmetric'
GMOND_CONF="/opt/services/ganglia-local/etc/gmond.conf"
Because we install Ganglia in a non-standard location, we had to edit infin.py to include GMOND_CONF. I will attach our version to this page.
ln -s /opt/services/ganglia-local/init.d/nrao-infin /etc/init.d
chkconfig --add nrao-infin
/etc/init.d/nrao-infin start


Install Disk Metrics (optional)

Download diskstats.py from https://github.com/ganglia/gmond_python_modules/pull/1/files
Save it as /opt/services/ganglia-local/lib64/ganglia/python_modules/diskstats.py
Download disk_gmetric.sh from http://ben.hartshorne.net/ganglia/
Save it as /opt/services/ganglia-local/bin/disk_gmetric.sh. Then write an init script, /opt/services/ganglia-local/init.d/nrao-disk_gmetric, that runs /opt/services/ganglia-local/bin/disk_gmetric.sh every 30 seconds, and link it into /etc/init.d:
ln -s /opt/services/ganglia-local/init.d/nrao-disk_gmetric /etc/init.d
chkconfig --level 345 nrao-disk_gmetric on
/etc/init.d/nrao-disk_gmetric start
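
The nrao-disk_gmetric init script itself is not shown on this page. One way to get the "every 30 seconds" behaviour is a small loop that the script's start action launches in the background. The following is a sketch, not the script we actually use; the function name run_every and its COUNT argument are hypothetical.

```shell
# run_every INTERVAL COUNT CMD [ARGS...]
# Runs CMD every INTERVAL seconds. COUNT limits the number of runs;
# a COUNT of 0 means loop until killed.
run_every() {
    interval=$1
    count=$2
    shift 2
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        # Keep looping even if one run fails, but report it on stderr.
        "$@" || echo "warning: $1 exited with status $?" >&2
        n=$((n + 1))
        sleep "$interval"
    done
}

# In the start action of an init script this could be launched as
# (not executed here):
#   run_every 30 0 /opt/services/ganglia-local/bin/disk_gmetric.sh &
```

The stop action would then need to kill that background process, for example by recording its PID in a file when it is started.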


Installing Ganglia Server

Install the tarball made in the previous section
cd /opt/services
tar xfvz /home/src/ganglia/ganglia_nrao_`uname -i`-3.2.0.tgz

Edit gmetad.conf and set the following
rrd_rootdir "/opt/services/ganglia-local/var/rrds"
Then make that directory
mkdir -p /opt/services/ganglia-local/var/rrds
chown nobody /opt/services/ganglia-local/var/rrds

Install the Apache web server (a task left to the reader) and configure a virtual host for Ganglia. Then
mkdir /opt/services/ganglia/www
cp -R ganglia-3.2.0/web/* /opt/services/ganglia/www
Edit /opt/services/ganglia/www/conf.php and modify the following
$gmetad_root = "/opt/services/ganglia-local/var";
define("RRDTOOL", "/opt/services/ganglia/bin/rrdtool");
$time_ranges = array(
   'halfhour'=>1800,
   'hour'=>3600,
   '2hour'=>7200,
   '4hour'=>14400,
   '8hour'=>28800,
   'day'=>86400,
   'week'=>604800,
   'month'=>2419200,
   'year'=>31449600
);
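
The $time_ranges values are durations in seconds. The arithmetic below reproduces them and shows that 'month' is four weeks (28 days) and 'year' is 52 weeks (364 days), which helps when adding a custom range. The variable names are only for illustration.

```shell
# Derive the $time_ranges values (in seconds) from first principles.
MIN=60
HOUR=$((60 * MIN))
DAY=$((24 * HOUR))
WEEK=$((7 * DAY))

echo "halfhour $((30 * MIN))"    # 1800
echo "hour     $HOUR"            # 3600
echo "2hour    $((2 * HOUR))"    # 7200
echo "4hour    $((4 * HOUR))"    # 14400
echo "8hour    $((8 * HOUR))"    # 28800
echo "day      $DAY"             # 86400
echo "week     $WEEK"            # 604800
echo "month    $((4 * WEEK))"    # 2419200 (28 days)
echo "year     $((52 * WEEK))"   # 31449600 (364 days)
```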

Finally
mkdir -p /opt/services/ganglia-local/var/dwoo/
chown apache /opt/services/ganglia-local/var/dwoo/


Nagios


Installing a Client

This is only necessary if you need to monitor something that can only be checked locally on the client (such as a 3ware card or disk usage):

echo "nagios:x:1103:1103:nagios:/var/log/nagios:/bin/sh" >> /etc/passwd
echo 'nagios:!!:15280::::::' >> /etc/shadow
echo "nagios:x:1103:" >> /etc/group
echo "nagios ALL = NOPASSWD: /opt/services/nagios-local/plugins/check_3ware.sh" >> /etc/sudoers
sed -i -e 's/^Defaults.*requiretty/#Defaults    requiretty/' /etc/sudoers
cd /opt/services ; tar xfvz /home/src/nagios/client/nagios-1.4.15-x86_64-nrao.tgz
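
The echo commands above append unconditionally, so running them a second time on the same client would duplicate the nagios entries. A guard that appends a line only when it is not already present makes the step safe to repeat. This is a minimal sketch; the function name append_once is hypothetical.

```shell
# append_once LINE FILE
# Appends LINE to FILE unless an identical line is already there.
append_once() {
    line=$1
    file=$2
    # -x matches whole lines, -F treats LINE as a fixed string, not a pattern.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example calls (not executed here):
#   append_once "nagios:x:1103:1103:nagios:/var/log/nagios:/bin/sh" /etc/passwd
#   append_once "nagios:x:1103:" /etc/group
```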

Edit /opt/services/nagios-local/plugins/check_3ware.sh and set TWCLI to the full path of the tw_cli program, e.g. /opt/services/3ware/CLI/tw_cli

ln -s /opt/services/nrpe-local/init.d/nrao-nrpe /etc/init.d
chkconfig --add nrao-nrpe
chkconfig nrao-nrpe on
/etc/init.d/nrao-nrpe start


Nagios Front End (website) Administration

http://nagios.aoc.nrao.edu/
Login: admin
password: the admin passwd


Acknowledge Nagios Alerts

  • Click on Tactical Overview in the left side menu. Any alerts or issues will show up as red boxes.
  • Click on the red box to see the details of the alert. If the problem is a service (e.g. http), the service will be highlighted.
  • Click on the problem. A list of options appears on the left.
  • Click on the icon of the man shoveling, labeled Acknowledge Problem.
  • Fill in the dialog box. This will prevent any further messages from being sent about this problem.
  • Once the problem has been cleared, the system or service will automatically go back to its normal state.


Administrating our Nagios Server

Nagios lives on the server hugin in /opt/services/nagios. All configurations specific to aoc/nrao are contained in /opt/services/nagios/etc/nrao. Before any kind of monitoring, including services, can be done on a system you first must define the host.


Adding a host definition

Edit the file /opt/services/nagios/etc/nrao/nraohosts.cfg and add a new entry like this

define host{
        use             linux-server
        host_name       hugin
        alias           nagios
        address         10.64.1.32
        }
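
When several hosts need to be added at once, the entries can be generated rather than typed. The following sketch prints a host definition in the same shape as the example entry; the function name host_definition is hypothetical, and the output should be reviewed before it is appended to nraohosts.cfg.

```shell
# host_definition HOST_NAME ALIAS ADDRESS
# Prints a Nagios host definition that inherits from the linux-server template.
host_definition() {
    cat <<EOF
define host{
        use             linux-server
        host_name       $1
        alias           $2
        address         $3
        }
EOF
}

host_definition hugin nagios 10.64.1.32
```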

If you want to specify a group for a server, add it to the appropriate hostgroup definition found near the bottom of the nraohosts.cfg file.


Adding monitoring for new service via nagios

If the service, like http, is already being monitored on existing servers and you just need to monitor it on a new server then add the name to the existing definition in the nraoservices.cfg file
define service{
        use                     generic-service ; for monitoring http services
        host_name               hugin, vivaldi, penn, gila, acorn, magnolia, occam, smrti, whatever
        service_description     http
        check_command           check_http
        }

If this is a new service, you may first need to define the nagios command that you will be using. A list of prebuilt commands can be found in /opt/services/nagios/libexec

To define a new command, edit the file nraocommands.cfg and add something like this:
define command{
        command_name    check_cups
        command_line    $USER1$/check_http -H $HOSTADDRESS$ -p 631
        }

In this example we are using the check_http plugin to check the status of CUPS (port 631). Once the command is defined, you can add an entry for it in nraoservices.cfg.
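
Before adding a new command definition it helps to see the exact command line Nagios will run, so that it can be tried by hand first. The sketch below substitutes the two macros used in the check_cups example. The function name expand_command is hypothetical, and the value given for $USER1$ assumes it points at the plugin directory /opt/services/nagios/libexec mentioned above; the real value is whatever the server's resource file sets.

```shell
# expand_command COMMAND_LINE USER1_VALUE HOSTADDRESS_VALUE
# Prints COMMAND_LINE with $USER1$ and $HOSTADDRESS$ replaced.
# Assumes the values contain no '|', '&' or '\' characters (special to sed).
expand_command() {
    printf '%s\n' "$1" |
        sed -e 's|\$USER1\$|'"$2"'|g' \
            -e 's|\$HOSTADDRESS\$|'"$3"'|g'
}

expand_command '$USER1$/check_http -H $HOSTADDRESS$ -p 631' \
    /opt/services/nagios/libexec 10.64.1.32
# prints: /opt/services/nagios/libexec/check_http -H 10.64.1.32 -p 631
```

Running the printed command as the nagios user shows the plugin's status line and exit code, which is what Nagios will record once the service entry is added.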


Restarting/Reloading nagios definitions

Once additions are made, nagios configs need to be reloaded.
/etc/init.d/nrao-nagios reload
Nagios will check for configuration errors and will not reload if problems exist.
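
To see any configuration errors yourself before reloading, the check can be run by hand and the reload made conditional on it. This is a sketch only: the function name reload_if_valid is hypothetical, and the nagios binary and main configuration paths in the example call are assumptions based on the /opt/services/nagios location described above. `nagios -v` followed by the main configuration file is the standard verification invocation.

```shell
# reload_if_valid VERIFY_CMD RELOAD_CMD
# Runs VERIFY_CMD; runs RELOAD_CMD only if verification succeeds.
# Both arguments are simple command strings (split on whitespace).
reload_if_valid() {
    verify_cmd=$1
    reload_cmd=$2
    if $verify_cmd >/dev/null 2>&1; then
        $reload_cmd
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Example call (not executed here; the first two paths are assumptions):
#   reload_if_valid "/opt/services/nagios/bin/nagios -v /opt/services/nagios/etc/nagios.cfg" \
#                   "/etc/init.d/nrao-nagios reload"
```

When the check fails, rerunning the verify command on its own prints the errors, including the offending file and line.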


Resources

Additional information about nagios can be found at http://www.nagios.com/products/nagioscore