Nagios


Nagios Monitoring


  • This page collects all information on the Nagios instance that monitors T2_BE_IIHE.
  • The webpage of Nagios can be found here.
  • The freely available version of Nagios is called Nagios Core; its full documentation can be found here.

Quick link section

username: nagiosadmin
passwd: ****



Installation

Installation by Quattor

Nagios server

  • Nagios is installed on egon.iihe.ac.be
    • It is based on a Quattor base machine
    • It is a virtual machine running on dom05.wn.iihe.ac.be
  • A template, nagios.tpl, is available that includes the RPMs needed to deploy a Nagios server.
cfg/sites/iihe-production/config/nagios.tpl
  • Some of these RPMs were missing and have been added to the begrid repository. The procedure for doing this is described here:
https://mon.iihe.ac.be/trac/t2b/wiki/GridAdminSurvivalGuide
  • This template has never been used to install a brand new server. In case of trouble, have a look at the installation by hand, which was used the first time egon was installed.

Nagios client

  • The same template as for the Nagios server takes care of the RPMs, but excludes e.g. Apache and Nagios Core.
  • In this template the nrpe_commands.cfg file is created with filecopy. This ensures that the command definitions needed to execute scripts remotely are distributed to all machines.

Installation by hand



Configuration

  • On egon, the Nagios configuration is kept in /etc/nagios.
  • A single config file steers the whole Nagios instance: /etc/nagios/nagios.cfg
    • It includes the object configuration files that define the hosts, services, contacts, etc.
  • After changing the configuration files, the Nagios daemon needs to be restarted. It is useful to first check that the configuration does not contain errors:
nagios -v /etc/nagios/nagios.cfg 
/etc/init.d/nagios restart
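
The check-then-restart sequence above can be wrapped so that the restart only happens when the check passes. A minimal sketch, assuming the nagios binary is on the PATH; the function name safe_restart and the NAGIOS_BIN / NAGIOS_INIT override variables are hypothetical and only exist to allow a dry run:

```shell
# Restart Nagios only when the configuration check succeeds.
# NAGIOS_BIN and NAGIOS_INIT are hypothetical overrides for dry runs.
safe_restart() {
    cfg="${1:-/etc/nagios/nagios.cfg}"
    if "${NAGIOS_BIN:-nagios}" -v "$cfg" >/dev/null 2>&1; then
        "${NAGIOS_INIT:-/etc/init.d/nagios}" restart
    else
        echo "configuration check failed for $cfg, not restarting" >&2
        return 1
    fi
}
```

Called without arguments, safe_restart checks /etc/nagios/nagios.cfg. If it refuses to restart, rerun nagios -v by hand to see the error messages.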

Hosts

  • The hosts.cfg file is generated from databases.tpl by a small script.
  • The script can be found here:
/etc/nagios/scripts/makedb.sh
    • Whenever new machines are added to databases.tpl, that file needs to be copied from ccq to egon.
    • When the script runs, it creates a temporary file serverdb that contains a list of hostnames and IP addresses.
    • Based on the configuration file makedb.cfg, a number of hosts are excluded from hosts.cfg.
    • The script contains some hardcoded servers, and the hostgroups are hardcoded as well.
    • When the script has finished, don't forget to copy hosts.cfg to the objects directory (and make a backup of the previous file so you can see the changes).
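
The contents of makedb.sh are not reproduced on this page, so the following is only a sketch of the kind of processing described above, not the actual script. It assumes that serverdb holds one "hostname ip-address" pair per line and that the exclusion list is a plain file with one hostname per line; the real makedb.cfg format may differ. The function names and the generic-host template name are assumptions as well.

```shell
# Hypothetical sketch, not the real makedb.sh.
# Reads "hostname ip" lines on stdin and prints Nagios host definitions,
# skipping every hostname listed (one per line) in the exclusion file $1.
gen_hosts() {
    awk -v excl="$1" '
        BEGIN { while ((getline h < excl) > 0) skip[h] = 1 }
        NF >= 2 && !($1 in skip) {
            print "define host{"
            print "\tuse\t\tgeneric-host"    # assumed template name
            print "\thost_name\t" $1
            print "\taddress\t\t" $2
            print "\t}"
        }'
}

# Keep the previous file as <dest>.bak, show what changed, then install.
install_with_backup() {
    new="$1"; dest="$2"
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.bak"
        diff -u "$dest.bak" "$new" || true   # diff exits 1 when files differ
    fi
    cp "$new" "$dest"
}
```

A run could then look like gen_hosts exclude.txt < serverdb > hosts.cfg.new followed by install_with_backup hosts.cfg.new /etc/nagios/objects/hosts.cfg, where exclude.txt and the exact path of the objects directory are assumptions.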


Services

  • The services.cfg file contains a definition of all checks (services) executed by nagios.
    • Each service definition refers to a plugin (script) that actually performs the check, e.g.
check_command			check_tcp1500225
check_command			check_nrpe!check_mem
    • The first one is a script that will run on egon. The definition of check_tcp can be found in commands.cfg:
# 'check_tcp' command definition
define command{
	command_name	check_tcp
	command_line	$USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
	}
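    • Where $USER1$ on egon is
/usr/lib64/nagios/plugins
    • The second one is executed on the remote host through NRPE. The check_mem command is defined in nrpe_commands.cfg:
command[check_mem]=/usr/lib64/nagios/plugins/check_mem 85 95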
    • IMPORTANT: the /etc/nagios/nrpe_commands.cfg file is managed by Quattor since it has to be available on all hosts
    • Since NRPE basically just executes scripts, a check can also be run from the command line for testing purposes, e.g.
/usr/local/nagios/libexec/check_nrpe -H m4.iihe.ac.be -c check_nfsstat_access
    • Based on the result of the plugin (OK/Warning/Critical/Unknown), a mail can be sent or other actions can be performed.
    • Some of the Nagios plugins are home-brewed and are distributed to all machines as an RPM using Quattor.
    • Specific info on the Nagios plugins can be found here
    • How to build the RPM is explained here
    • Since the RPMs are distributed via Quattor and some scripts need sudo privileges, Quattor also takes care of the sudo configuration.
    • Nagios can perform an action when a state changes; these actions are called event handlers. More info on the local implementation can be found here
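
Nagios derives the state of a service from the plugin's exit code: 0 is OK, 1 is Warning, 2 is Critical and 3 is Unknown. The sketch below illustrates that convention for a percentage check with warning and critical thresholds, in the spirit of the "check_mem 85 95" call above. It is not the site's check_mem plugin; the function name check_threshold and the output wording are made up for illustration. The text after the "|" is performance data in the label=value;warn;crit form that graphing add-ons such as PNP4Nagios read.

```shell
# Illustrative only: map a usage percentage onto Nagios plugin exit codes.
# Usage: check_threshold <used-percent> <warn> <crit>
check_threshold() {
    used="$1"; warn="$2"; crit="$3"
    # Anything that is not a non-negative integer cannot be evaluated.
    for v in "$used" "$warn" "$crit"; do
        case "$v" in
            ''|*[!0-9]*) echo "UNKNOWN - invalid argument '$v'"; return 3 ;;
        esac
    done
    perf="used=${used}%;${warn};${crit}"
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL - ${used}% used|$perf"; return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING - ${used}% used|$perf"; return 1
    fi
    echo "OK - ${used}% used|$perf"; return 0
}
```

A real plugin would end with exit instead of return, so that the NRPE daemon hands the code back to Nagios.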

Contacts

  • contacts.cfg, together with contactgroups.cfg, defines the contact persons and when and for what they are contacted.
  • Only a few contact persons are created.
  • Two groups exist:
    • One to send mails to all admins
    • One for testing newly added checks

Extensions

Installation of PNP4NAGIOS by hand on egon

wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm
rpm -Uhv rpmforge-release-0.5.2-2.el5.rf.x86_64.rpm
yum install rrdtool
  • The configuration was done by executing

  • There is a misconfiguration in /etc/httpd/conf.d/pnp4nagios.conf; make sure this line is set:
AuthUserFile /etc/nagios/htpasswd.users
yum install php-gd
  • In the commands.cfg file, the processing of the perfdata needs to be set correctly
/etc/nagios/libexec/process_perfdata.pl
  • The first installation was done in /etc, but this is not the appropriate place. The configuration is now done with the option
./configure --with-perfdata-dir=/var/nagios/perfdata
  resulting in
*** Configuration summary for pnp4nagios-0.6.11 01-15-2011 ***

  General Options:
  -------------------------         -------------------
  Nagios user/group:                nagios nagios
  Install directory:                /usr/local/pnp4nagios
  HTML Dir:                         /usr/local/pnp4nagios/share
  Config Dir:                       /usr/local/pnp4nagios/etc
  Location of rrdtool binary:       /usr/bin/rrdtool Version 1.4.4
  RRDs Perl Modules:                FOUND (Version 1.4004)
  RRD Files stored in:              /var/nagios/perfdata
  process_perfdata.pl Logfile:      /usr/local/pnp4nagios/var/perfdata.log
  Perfdata files (NPCD) stored in:  /usr/local/pnp4nagios/var/spool

  Web Interface Options:
  -------------------------         -------------------
  HTML URL:                         http://localhost/pnp4nagios
  Apache Config File:               /etc/httpd/conf.d/pnp4nagios.conf



Backup

  • A daily backup of the nagios configuration is made
/etc/cron.daily/backup_nagios.sh
  • The tarball is copied to jefke
/userbackup/backup_egon_nagios/
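
The contents of backup_nagios.sh are not shown on this page. As an illustration only, a daily backup of this kind could look like the sketch below, which packs a configuration directory into a date-stamped tarball in a destination directory. The function name, the tarball naming scheme, and the assumption that the destination is reachable as a local path are all hypothetical; how the tarball actually reaches jefke is not documented here.

```shell
# Hypothetical sketch, not the real backup_nagios.sh.
# Usage: backup_config <config-dir> <destination-dir>
# Prints the path of the tarball it created.
backup_config() {
    src="$1"; dest="$2"
    tarball="$dest/nagios-config-$(date +%Y%m%d).tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps the archive paths relative to the parent of the config dir.
    tar -czf "$tarball" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$tarball"
}
```

For example, backup_config /etc/nagios /userbackup/backup_egon_nagios would only apply if that directory is mounted on egon.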


