SetupMonitoringControlerSunfireV20z

From T2B Wiki

Procedure to enable LSI controller monitoring on a Sunfire v20z

We had a Sunfire v20z server that was not managed by Quattor and on which we wanted to enable monitoring of the disk controllers.

  • Get the brand and model of the controller:
cat /proc/scsi/scsi
which gave us: LSILOGIC Model: 1030 IM.
  • Get the version of the operating system:
cat /proc/version
which in our case indicated RHEL4.
  • Go to the manufacturer's website:
http://www.lsi.com/cm/DownloadSearch.do?locale=EN
and download the drivers for your model and operating system.
  • Uncompress the downloaded file. Inside, you'll find a directory ../message/fusion. Copy it to /usr/src/linux/drivers/message/fusion.
At this stage, the directory /usr/src/linux/drivers/message/fusion should contain the following files:
	lsi/mpi_type.h
	lsi/mpi.h
	lsi/mpi_ioc.h
	lsi/mpi_cnfg.h
	lsi/mpi_raid.h
	mptctl.h
These files are needed to compile mpt-status.
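Before going further, the presence of these headers can be checked with a few lines of shell. This is only a sketch: the function name check_fusion_headers is ours, and the default path is the one used in this procedure.

```shell
#!/bin/sh
# Sketch only: check_fusion_headers is a hypothetical helper name.
# It checks that the header files listed above exist under the given
# directory (default: the location used in this procedure).
check_fusion_headers() {
    dir="${1:-/usr/src/linux/drivers/message/fusion}"
    missing=0
    for f in lsi/mpi_type.h lsi/mpi.h lsi/mpi_ioc.h \
             lsi/mpi_cnfg.h lsi/mpi_raid.h mptctl.h
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Return 0 when every file is present, 1 otherwise
    return $missing
}
```

Calling check_fusion_headers without argument checks the default directory; it prints one line per missing file and returns a non-zero status if any is absent.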
  • Download the tarball containing the source of the mpt-status tool from the following address:
http://www.drugphish.ch/~ratz/mpt-status/
  • Uncompress the tarball. Inside you'll find the mpt-status source file, which you must edit to comment out the following line:
#include <linux/compiler.h>
and then run:
make
Once make has finished successfully, you should find a binary file mpt-status.
  • Before testing the new mpt-status command, load the kernel module mptctl:
modprobe mptctl
  • Test the mpt-status command with:
mpt-status -s
In our case, we got:
log_id 0 OPTIMAL 
phys_id 0 ONLINE 
phys_id 1 ONLINE
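As a rough illustration, and assuming the output always has the three-column form shown above (identifier type, number, state), a small filter can print only the entries that are not in the healthy state we observed. The function name report_unhealthy is ours and is not part of mpt-status.

```shell
# Sketch only: report_unhealthy is a hypothetical helper, not part of mpt-status.
# It assumes the "mpt-status -s" output format shown above:
#   log_id <n> <state>   or   phys_id <n> <state>
# and treats OPTIMAL (logical volume) and ONLINE (physical disk)
# as the only healthy states.
report_unhealthy() {
    awk '($1 == "log_id"  && $3 != "OPTIMAL") ||
         ($1 == "phys_id" && $3 != "ONLINE") { print }'
}
```

It reads from standard input, so it would be used as mpt-status -s | report_unhealthy, which prints nothing when every entry is healthy.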
  • As everything works, you can install the files:
make install
cp man/mpt-status.8 /usr/share/man/man8
  • To make sure that the kernel module mptctl is loaded at each startup of the server:
echo "alias char-major-10-220 mptctl" >> /etc/modprobe.conf
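Running the echo command above twice appends the alias twice. A possible way to avoid this is sketched below; the function name add_modprobe_alias is ours, and the configuration file can be passed as an argument so that the function can be tried on a copy first.

```shell
# Sketch only: add_modprobe_alias is a hypothetical helper name.
# It appends the mptctl alias to the given file (default: /etc/modprobe.conf)
# only if that exact line is not already there.
add_modprobe_alias() {
    conf="${1:-/etc/modprobe.conf}"
    line="alias char-major-10-220 mptctl"
    # -x: match the whole line, -F: fixed string, -q: quiet
    if grep -qxF "$line" "$conf" 2>/dev/null; then
        return 0
    fi
    echo "$line" >> "$conf"
}
```

After it has run, lsmod | grep mptctl can be used following a reboot to confirm that the module was loaded.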
  • To make the check on the disk controller fully automatic, you can create the following script in /etc/cron.hourly (named checklsi.sh here):
#!/bin/sh 

# Check of controller state


ADMIN="grid_admin@listserv.vub.ac.be" 

# The return code from mpt-status is a bit mask, and can be interpreted
# according to the following table (current as of 1.2.0): 
#   Bit   Value   Meaning 
#   ----------------------------------------------------------------- 
#     0       1   Abnormal condition / unknown error 
#     1       2   A logical volume has failed 
#     2       4   A logical volume is degraded 
#     3       8   A logical volume is resyncing 
#     4      16   At least one physical disk failed 
#     5      32   At least one physical disk is in warning condition 

/usr/sbin/mpt-status -s >/dev/null 

if [ $? -ne 0 ] 
then 
    # Send an email to let us know that a disk failed
    echo "" | mail -s "Disk drive failure on $HOSTNAME" ${ADMIN} -- -r gridstorage@ulb.ac.be 

    # Write a message to syslog so big brother can notify operations 
    logger -p daemon.info "STORAGE ERROR: A failure was detected with the LSI Logic RAID controller or one of the disk drives" 
    logger -p daemon.info "STORAGE ERROR: Run /usr/sbin/mpt-status to view the status of the storage subsystem" 
fi 

exit 0 
And finally, to make the script executable:
chmod 755 /etc/cron.hourly/checklsi.sh
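The script above only tests whether the return code is non-zero. Since the return code is a bit mask, it can also be decoded to say what exactly went wrong. The following is a minimal sketch: the function name decode_mpt_status is ours, and the bit meanings are copied from the comment table in the script (current as of mpt-status 1.2.0).

```shell
# Sketch only: decode_mpt_status is a hypothetical helper name.
# Bit meanings are taken from the comment table in the cron script
# (stated there as current as of mpt-status 1.2.0).
decode_mpt_status() {
    rc=$1
    if [ "$rc" -eq 0 ]; then
        echo "no problem reported"
        return 0
    fi
    # Test each bit of the mask and print one line per bit that is set
    if [ $((rc & 1))  -ne 0 ]; then echo "abnormal condition / unknown error"; fi
    if [ $((rc & 2))  -ne 0 ]; then echo "a logical volume has failed"; fi
    if [ $((rc & 4))  -ne 0 ]; then echo "a logical volume is degraded"; fi
    if [ $((rc & 8))  -ne 0 ]; then echo "a logical volume is resyncing"; fi
    if [ $((rc & 16)) -ne 0 ]; then echo "at least one physical disk failed"; fi
    if [ $((rc & 32)) -ne 0 ]; then echo "at least one physical disk is in warning condition"; fi
    return 0
}
```

In the cron script, the return code could be saved right after the mpt-status call (for instance with rc=$?) and the output of decode_mpt_status "$rc" used as the mail body instead of the empty string.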

