Nagios
Nagios Monitoring
- This page collects all information on the Nagios instance used to monitor the T2_BE_IIHE site.
- The Nagios homepage can be found here.
- The freely available edition of Nagios is called Nagios Core; its full documentation can be found here.
Quick link section
- The web interface of our Nagios can be found on http://egon.iihe.ac.be/nagios.
- This page shows the current status of the site as well as the event logs.
- A Nagios check can be rescheduled from this web interface: HOWTO restart a nagios test manually
username: nagiosadmin passwd: ****
- The Dutch grid infrastructure provides us with tests of various grid aspects:
- https://sam-ngi.grid.sara.nl/nagios/
- More info about the tests that are run against our site:
- https://tomtools.cern.ch/confluence/display/SAMDOC/Grid+probes
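Rescheduling a check from the web interface works by submitting an external command to Nagios. The same can be done from a shell on egon by writing to the external command file. A minimal sketch, assuming the command file is /var/spool/nagios/cmd/nagios.cmd and that external commands are enabled (check_external_commands=1); the value of command_file in /etc/nagios/nagios.cfg is authoritative:

```shell
#!/bin/sh
# Force an immediate re-check of one service via the Nagios external command file.
# ASSUMPTION: the default path below; check command_file in /etc/nagios/nagios.cfg.
NAGIOS_CMD_FILE=${NAGIOS_CMD_FILE:-/var/spool/nagios/cmd/nagios.cmd}

# force_service_check <host_name> <service_description>
force_service_check() {
    host=$1 service=$2
    now=$(date +%s)
    # Format: [submit time] SCHEDULE_FORCED_SVC_CHECK;<host>;<service>;<check time>
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' \
        "$now" "$host" "$service" "$now" > "$NAGIOS_CMD_FILE"
}
```

For example, `force_service_check m4.iihe.ac.be Memory` would force the check of a (hypothetical) service named Memory on m4; the new result shows up in the web interface once the check has run.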
Installation
Installation by Quattor
Nagios server
- Nagios is installed on egon.iihe.ac.be
- It is based on a Quattor base machine
- It is a virtual machine running on dom05.wn.iihe.ac.be
- A template, nagios.tpl, contains the RPMs needed to deploy a Nagios server:
cfg/sites/iihe-production/config/nagios.tpl
- Some of these RPMs were missing and have been added to the begrid repository. The procedure for doing so is described here:
https://mon.iihe.ac.be/trac/t2b/wiki/GridAdminSurvivalGuide
- This template has never been used to install a new server from scratch. In case of trouble, refer to the installation by hand, which was used when egon was first installed.
Nagios client
- The same template as for the Nagios server installs the RPMs, but excludes packages only the server needs, e.g. apache, Nagios Core, ...
- This template also creates the nrpe_commands.cfg file with filecopy. The file defines the commands used to execute remote scripts, so it has to be distributed to all machines.
Installation by hand
- How to install a nagios server by hand on egon
- How to install a nagios server by hand on an Ubuntu desktop. This procedure was tested on a laptop and works.
- Tutorial for installation and configuration: http://wiki.centos.org/HowTos/Nagios
- Tutorial for installation: http://docs.cslabs.clarkson.edu/wiki/Install_Nagios_on_CentOS_5
Configuration
- On egon the Nagios configuration is done in /etc/nagios.
- A single main config file, /etc/nagios/nagios.cfg, steers the whole Nagios instance.
- It includes the object configuration files that define the hosts, services, contacts, etc.
- After changing the configuration files, the Nagios daemon must be restarted. Verify first that the configuration contains no errors:
nagios -v /etc/nagios/nagios.cfg
/etc/init.d/nagios restart
- Introduction to Nagios: Tutorial with a detailed explanation of config files
- Tutorial for configuration: Tutorial for configuration
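The verify-then-restart step described above is easy to wrap so that a broken configuration never takes the running daemon down. A minimal sketch, assuming the paths used on egon; the variables only exist so the function can be tested with stand-in commands:

```shell
#!/bin/sh
# Verify the Nagios configuration and restart the daemon only if the check passes.
NAGIOS_BIN=${NAGIOS_BIN:-nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios}

safe_restart() {
    log=$(mktemp) || return 1
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > "$log" 2>&1; then
        rm -f "$log"
        "$NAGIOS_INIT" restart
    else
        # nagios -v exits non-zero when it finds errors: report them and
        # leave the running daemon (and its old configuration) untouched.
        echo "Configuration check failed, Nagios NOT restarted:" >&2
        grep -iE 'error|warning' "$log" >&2
        rm -f "$log"
        return 1
    fi
}
```

Running `safe_restart` after editing files under /etc/nagios either restarts Nagios or prints the errors reported by `nagios -v` and keeps the current instance running.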
Hosts
- The hosts.cfg file is generated from databases.tpl by a small script.
- The script can be found at:
/etc/nagios/scripts/makedb.sh
- Whenever new machines are added to databases.tpl, this file needs to be copied from ccq to egon.
- When run, the script creates a temporary file, serverdb, containing a list of hostnames and IP addresses.
- The configuration file makedb.cfg determines which hosts are excluded from hosts.cfg.
- Some servers, as well as the hostgroups, are hardcoded in the script.
- When the script has finished, don't forget to copy hosts.cfg to the objects directory (and make a backup of the previous file so you can see the changes).
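The steps above can be sketched as two shell functions. This is a hypothetical illustration, not the content of makedb.sh: it assumes serverdb holds one "hostname ip-address" pair per line and makedb.cfg one hostname to exclude per line (with '#' comments), uses the linux-server host template from the Nagios sample configuration, leaves out the hardcoded servers and hostgroups, and assumes the objects directory is /etc/nagios/objects.

```shell
#!/bin/sh
# Hypothetical sketch of the makedb.sh idea (NOT the real script).
# Assumed formats: serverdb = "hostname ip-address" per line;
# makedb.cfg = one hostname to exclude per line, '#' starts a comment.

# make_hosts_cfg <serverdb> <makedb.cfg>  -> host definitions on stdout
make_hosts_cfg() {
    serverdb=$1 excludes=$2
    awk '
        FILENAME == ARGV[1] {            # exclusion list: remember hostnames
            sub(/#.*/, "")
            if (NF) skip[$1] = 1
            next
        }
        NF >= 2 && !($1 in skip) {       # serverdb: emit one host block
            printf "define host{\n"
            printf "    use        linux-server\n"
            printf "    host_name  %s\n", $1
            printf "    alias      %s\n", $1
            printf "    address    %s\n", $2
            printf "}\n\n"
        }
    ' "$excludes" "$serverdb"
}

# install_hosts_cfg <new file> [target]: back up the old file, show the diff, copy.
# ASSUMPTION: default target path in the objects directory.
install_hosts_cfg() {
    new=$1 target=${2:-/etc/nagios/objects/hosts.cfg}
    if [ -f "$target" ]; then
        backup=$target.$(date +%Y%m%d-%H%M%S)
        cp -p "$target" "$backup" || return 1
        diff -u "$backup" "$new" || true   # diff exits 1 when files differ
    fi
    cp "$new" "$target"
}
```

A run could look like `make_hosts_cfg serverdb makedb.cfg > hosts.cfg && install_hosts_cfg hosts.cfg`, followed by the configuration check and restart described under Configuration.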
Services
- The services.cfg file contains the definitions of all checks (services) executed by Nagios.
- Each service definition refers to a plugin (script) that actually performs the check, e.g.
check_command check_tcp1500225
check_command check_nrpe!check_mem
- The first one is a script that will run on egon. The definition of check_tcp can be found in commands.cfg:
# 'check_tcp' command definition
define command{
    command_name check_tcp
    command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
- Here $USER1$ on egon is set to:
/usr/lib64/nagios/plugins
- The second one is a more complex case: the check executes a script on the client host via NRPE.
- Nagios is using NRPE (Nagios Remote Plugin Executor) to actively monitor services on client hosts.
- A list of plugins can be found here and here.
- More info on NRPE: http://nagios.sourceforge.net/docs/3_0/addons.html
- More info on NRPE: http://www.thegeekstuff.com/2008/06/how-to-monitor-remote-linux-host-using-nagios-30/
- More info on NRPE: http://markmail.org/message/uwzrxhugtusi7i5w#query:+page:1+mid:hzqla7ciu5xvxo4q+state:results
- The check_nrpe plugin takes as argument the name of a command (here check_mem) that runs a script available on the client host.
- The definition of check_mem can be found in /etc/nagios/nrpe_commands.cfg
command[check_mem]=/usr/lib64/nagios/plugins/check_mem 85 95
- IMPORTANT: the /etc/nagios/nrpe_commands.cfg file is managed by Quattor since it has to be available on all hosts
- Since NRPE essentially executes scripts, a check can also be run on the command line for testing purposes, e.g.
/usr/local/nagios/libexec/check_nrpe -H m4.iihe.ac.be -c check_nfsstat_access
- Depending on the plugin result (OK/Warning/Critical/Unknown), a mail can be sent or other actions can be performed.
- Some of the Nagios plugins are home-brew and are distributed to all machines as an RPM using Quattor.
- Specific info on the Nagios plugins can be found here.
- How to package the RPM is explained here.
- Since the RPMs are distributed via Quattor and some scripts need sudo privileges, Quattor also takes care of the sudo configuration.
- Nagios can perform an action when a state changes; this is called an event handler. More info on the local implementation can be found here.
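The check_tcp definition in commands.cfg is a template: when Nagios runs a check, it replaces $USER1$ with the plugin directory, $HOSTADDRESS$ with the host's address, and $ARG1$, $ARG2$ with the fields that follow the command name in check_command, separated by '!'. The sketch below mimics that substitution for this one definition to show what actually gets executed on egon. It is a simplification (Nagios supports many more macros), and the address 192.0.2.10 and the arguments 22 and "-w 2 -c 5" are hypothetical.

```shell
#!/bin/sh
# Simplified Nagios macro substitution for the check_tcp definition only.
# Handles $USER1$, $HOSTADDRESS$, $ARG1$ and $ARG2$; Nagios knows many more.
USER1=${USER1:-/usr/lib64/nagios/plugins}   # $USER1$ on egon

# expand_check <command_line template> <host address> <check_command value>
expand_check() {
    awk -v tpl="$1" -v host="$2" -v cc="$3" -v user1="$USER1" '
        # Replace every literal occurrence of macro in s by val.
        function repl(s, macro, val,    i, out) {
            out = ""
            while ((i = index(s, macro)) > 0) {
                out = out substr(s, 1, i - 1) val
                s = substr(s, i + length(macro))
            }
            return out s
        }
        BEGIN {
            # check_command is "name!ARG1!ARG2": split on the "!" separator.
            split(cc, a, "!")
            s = repl(tpl, "$USER1$", user1)
            s = repl(s, "$HOSTADDRESS$", host)
            s = repl(s, "$ARG1$", a[2])
            s = repl(s, "$ARG2$", a[3])
            print s
        }'
}
```

With the hypothetical values, `expand_check '$USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$' 192.0.2.10 'check_tcp!22!-w 2 -c 5'` prints `/usr/lib64/nagios/plugins/check_tcp -H 192.0.2.10 -p 22 -w 2 -c 5`, a command line that can also be run by hand on egon to reproduce the check.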
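The home-brew check_mem plugin itself is not shown on this page. The hypothetical stand-in below assumes that the two numbers in `check_mem 85 95` are warning and critical thresholds in percent of memory used, and illustrates the contract every Nagios plugin follows: one line of status text, an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN (this is what triggers the mails mentioned above), and optional performance data after a '|' in the form label=value;warn;crit;min;max, which is what PNP4Nagios graphs. The memory formula (total minus free, buffers and page cache) is also an assumption.

```shell
#!/bin/sh
# Hypothetical memory plugin: check_mem <warn%> <crit%> (NOT the deployed one).
MEMINFO=${MEMINFO:-/proc/meminfo}   # overridable for testing

check_mem() {
    warn=$1 crit=$2
    if [ -z "$warn" ] || [ -z "$crit" ] || [ ! -r "$MEMINFO" ]; then
        echo "MEM UNKNOWN - usage: check_mem <warn%> <crit%>"
        return 3
    fi
    # Used memory in percent = (total - free - buffers - cached) / total.
    used=$(awk '
        /^MemTotal:/ { t = $2 }
        /^MemFree:/  { f = $2 }
        /^Buffers:/  { b = $2 }
        /^Cached:/   { c = $2 }
        END { if (t > 0) printf "%d", (t - f - b - c) * 100 / t }
    ' "$MEMINFO")
    if [ -z "$used" ]; then
        echo "MEM UNKNOWN - cannot parse $MEMINFO"
        return 3
    fi
    if [ "$used" -ge "$crit" ]; then
        state=CRITICAL exit_code=2
    elif [ "$used" -ge "$warn" ]; then
        state=WARNING exit_code=1
    else
        state=OK exit_code=0
    fi
    # Status text, then performance data after "|": label=value;warn;crit;min;max
    echo "MEM $state - ${used}% used | mem_used=${used}%;$warn;$crit;0;100"
    return $exit_code
}
```

On a client, `check_mem 85 95; echo $?` shows the status line and the exit code; over NRPE, check_nrpe passes both back to Nagios on egon.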
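The local event-handler implementation is documented elsewhere. For orientation only, the sketch below follows the pattern from the Nagios Core documentation: Nagios passes $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ to the handler, which acts on the third SOFT CRITICAL result or on a HARD CRITICAL state. The corrective action (restarting httpd) is a hypothetical placeholder, and acting on attempt 3 only makes sense if max_check_attempts for the service is at least 4.

```shell
#!/bin/sh
# Hypothetical event handler, called by Nagios with:
#   $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
# ASSUMPTION: HANDLER_ACTION is a placeholder corrective action.
HANDLER_ACTION=${HANDLER_ACTION:-"/etc/init.d/httpd restart"}

handle_event() {
    state=$1 statetype=$2 attempt=$3
    case "$state" in
        CRITICAL)
            # Act early on the 3rd SOFT failure, or once the state is HARD.
            if { [ "$statetype" = SOFT ] && [ "$attempt" -eq 3 ]; } ||
               [ "$statetype" = HARD ]; then
                $HANDLER_ACTION
            fi
            ;;
        *)
            # OK, WARNING, UNKNOWN: nothing to do.
            ;;
    esac
}
```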
Contacts
- contacts.cfg, together with contactgroups.cfg, defines the contact persons and when and for what they will be contacted.
- Only a few contact persons are defined.
- Two groups exist:
- One to send mails to all admins
- One for testing newly added checks
Extensions
- Nagios and Ganglia: Tutorial explaining how to interface ganglia in Nagios
Installation of PNP4NAGIOS by hand on egon
- The full documentation is here http://docs.pnp4nagios.org/pnp-0.4/start
- I downloaded version 0.6.11, the latest release at the time (17/03/2011).
- RRDtool is needed; get it from the RPMforge repository:
wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm
rpm -Uhv rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm
yum install rrdtool
- The configuration was done by executing
- There is a misconfiguration in /etc/httpd/conf.d/pnp4nagios.conf; make sure this line is set:
AuthUserFile /etc/nagios/htpasswd.users
- Once the web page http://egon.iihe.ac.be/pnp4nagios/ was accessible, I still needed to install php-gd:
yum install php-gd
- In the commands.cfg file, the processing of the perfdata must be set correctly:
/etc/nagios/libexec/process_perfdata.pl
- The first installation was done in /etc, but this is not the appropriate place. The configuration is now done with the option
./configure --with-perfdata-dir=/var/nagios/perfdata
resulting in
*** Configuration summary for pnp4nagios-0.6.11 01-15-2011 ***

General Options:
-------------------------         -------------------
Nagios user/group:                nagios nagios
Install directory:                /usr/local/pnp4nagios
HTML Dir:                         /usr/local/pnp4nagios/share
Config Dir:                       /usr/local/pnp4nagios/etc
Location of rrdtool binary:       /usr/bin/rrdtool Version 1.4.4
RRDs Perl Modules:                FOUND (Version 1.4004)
RRD Files stored in:              /var/nagios/perfdata
process_perfdata.pl Logfile:      /usr/local/pnp4nagios/var/perfdata.log
Perfdata files (NPCD) stored in:  /usr/local/pnp4nagios/var/spool

Web Interface Options:
-------------------------         -------------------
HTML URL:                         http://localhost/pnp4nagios
Apache Config File:               /etc/httpd/conf.d/pnp4nagios.conf
Backup
- A daily backup of the Nagios configuration is made by
/etc/cron.daily/backup_nagios.sh
- The tarball is copied to jefke:
/userbackup/backup_egon_nagios/
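The contents of backup_nagios.sh are not reproduced on this page. As a rough, hypothetical sketch, a daily job of this kind could archive /etc/nagios into a date-stamped tarball and push it to jefke. The archived directory, the file name pattern and the use of scp with key-based SSH login to jefke are all assumptions; the variables only exist so the function can be tested locally.

```shell
#!/bin/sh
# Hypothetical daily backup of the Nagios configuration (NOT the deployed script).
# ASSUMPTIONS: only /etc/nagios is archived; the copy uses scp to jefke.
NAGIOS_DIR=${NAGIOS_DIR:-/etc/nagios}
BACKUP_DEST=${BACKUP_DEST:-jefke:/userbackup/backup_egon_nagios/}
COPY_CMD=${COPY_CMD:-scp}

backup_nagios() {
    stamp=$(date +%Y%m%d)
    tarball=${TMPDIR:-/tmp}/nagios-config-$stamp.tar.gz
    # Archive relative to the parent directory so the tarball unpacks to "nagios/".
    tar -czf "$tarball" -C "$(dirname "$NAGIOS_DIR")" "$(basename "$NAGIOS_DIR")" || return 1
    $COPY_CMD "$tarball" "$BACKUP_DEST" || return 1
    rm -f "$tarball"
}
```

Dropped into /etc/cron.daily as an executable script that calls `backup_nagios`, this would produce one tarball per day on jefke; restoring is a matter of unpacking it in /etc.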