From T2B Wiki

Upgrade of Workernodes in CB6 to SL5.5

  • Currently all nodes in CB6 run SL5.3; the goal is to upgrade them to SL5.5.
  • The testing node is node16-5; the first step is to put it offline so it can drain:
pbsnodes -o node16-5.wn.iihe.ac.be
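Before re-installing, the node must have finished draining. A minimal sketch of such a check, assuming the usual pbsnodes record format where a `jobs = ...` line is present only while jobs are still assigned (the helper name `drained` is ours, not part of any site script):

```shell
# Decide from a pbsnodes-style node record (read on stdin) whether the
# node has drained, i.e. its record lists no running jobs.
drained() {
    # A "jobs = 0/123.server, ..." line means jobs are still attached;
    # grep -q succeeds on a match, so we negate it.
    ! grep -q 'jobs = .*/'
}
```

Used as `pbsnodes node16-5.wn.iihe.ac.be | drained && echo "safe to reinstall"`.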
  • In the template for the OS version, the entry for node16-5 is updated to sl550-x86_64.
  • This file is then compiled in the CDB and committed.
  • To deploy the changes, do a runcheck on ccq3:
ssh ccq3
cd /opt/CB6/svncheck/
  • Then apply the changes:
aii-shellfe --remove node16-5.wn.iihe.ac.be    # remove the current configuration for this machine
aii-shellfe --configure node16-5.wn.iihe.ac.be # create the new kickstart file for this machine
aii-shellfe --install node16-5.wn.iihe.ac.be   # update DHCP and PXE so the machine is installed at next boot
  • To check the status, issue something like:
for line in $(cat wn_list); do aii-shellfe --status $line; done

REMARK: it is better to do the remove before the runcheck, so that the XML file does not clash with the existing configuration.
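The per-node sequence, in the order the remark requires, can be sketched as a dry-run helper that only prints the aii-shellfe commands (the function name `plan_reinstall` is ours; the runcheck/compile/commit step happens in the CDB, outside aii-shellfe):

```shell
# Print the aii-shellfe command sequence for the given node FQDNs:
# all removes first, then (after the runcheck) configure + install per node.
plan_reinstall() {
    for node in "$@"; do
        printf 'aii-shellfe --remove %s\n' "$node"    # drop current configuration
    done
    # ... runcheck on ccq3, compile and commit in the CDB happen here ...
    for node in "$@"; do
        printf 'aii-shellfe --configure %s\n' "$node" # generate new kickstart file
        printf 'aii-shellfe --install %s\n' "$node"   # update DHCP/PXE for next boot
    done
}
```

Piping the output to `sh` (or replacing `printf` with the real commands) would execute the plan; printing first makes it easy to review before touching production nodes.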

  • Reboot the machine:
ssh node16-5.wn
reboot -n
  • Once the machine is back, you will see the following error message in the SPMA log file:
error code: cpio: open failed - Too many open files
  • To fix this, move the kickstart log file and then reboot the machine again:
mv ks-post-install.log ks-post-install.log.old
  • Once the Quattor installation has finished successfully, you still have to reboot one last time.

Re-installation campaign

  • The re-installation campaign is executed in the following steps.
  • Adapt the file os_version_db.tpl in the Quattor database, compile, commit and do a runcheck.
  • First of all, all nodes need to be reconfigured with the aii-shellfe command. A small script to do this can be found at:
[root@ccq3] /root/joris/nodeReinstallation.sh
  • The script takes as an argument a file listing all the workernodes to be removed. The file was generated with a script found here:
[root@ccq] /root/stephane/generate_list_of_online_nodes_CB6.sh
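A hedged sketch of what such a list generator presumably does, assuming `pbsnodes -a`-style output where each node name starts at column one and is followed by an indented `state = ...` line (the function name `online_nodes` is ours, not the actual script):

```shell
# From pbsnodes-style output on stdin, print the names of nodes whose
# state is neither offline nor down, i.e. the nodes still in production.
online_nodes() {
    awk '/^[^ ]/  { name = $1 }
         /state =/ { if ($0 !~ /offline|down/) print name }'
}
```

Used as `pbsnodes -a | online_nodes > wn_list` to feed the re-installation script.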
  • Once the script is executed on ccq3, it is time to move to ccq to perform the reboot of all the nodes. For this, Stephane has written a nice script; more details about it can be found here.
  • The script has been adapted by Joris to allow the re-installation of a workernode: a few more reboot cycles have been added, with the appropriate checks to verify that the installation completed correctly. The script can be found here and is executed with the same options as the original script:
cd /root/stephane/UserCode/T2B_IIHE/reboot_wns
./reboot_wns.pl --init wnfile_list_sl55campaign
./reboot_wns.pl --start
  • In this directory there are some small scripts to test various things, e.g.:
    • ./print_wn_job_number.pl wn_list: shows the number of jobs running on this node
    • ./clear_after_test.pl: removes all lists of workernodes
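A minimal sketch of what print_wn_job_number presumably computes, again assuming the usual pbsnodes record format with a `jobs = 0/id, 1/id, ...` line (the helper name `job_count` is ours, not the actual Perl script):

```shell
# Count the jobs listed in a pbsnodes node record read on stdin.
job_count() {
    # The "jobs" line holds a comma-separated list; split() returns
    # the number of entries. "+ 0" prints 0 when no jobs line exists.
    awk -F' = ' '/ jobs = / { n = split($2, a, ","); count += n }
                 END { print count + 0 }'
}
```

Used as `pbsnodes node16-5.wn.iihe.ac.be | job_count`.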

List of workernodes under the update procedure

workernode status done
node15-1.wn.iihe.ac.be y + on
node15-2.wn.iihe.ac.be y + on
node15-3.wn.iihe.ac.be y + on
node15-4.wn.iihe.ac.be not in production
node15-5.wn.iihe.ac.be not in production
node15-6.wn.iihe.ac.be not in production
node15-7.wn.iihe.ac.be not in production
node15-8.wn.iihe.ac.be y + on
node16-1.wn.iihe.ac.be y + on
node16-2.wn.iihe.ac.be reserved y
node16-3.wn.iihe.ac.be y + on
node16-4.wn.iihe.ac.be y + on
node16-5.wn.iihe.ac.be y + on
node16-6.wn.iihe.ac.be removed
node16-7.wn.iihe.ac.be y + on
node16-8.wn.iihe.ac.be reserved
node16-9.wn.iihe.ac.be y + on
node16-10.wn.iihe.ac.be reserved y
node17-1.wn.iihe.ac.be reserved y
node17-2.wn.iihe.ac.be n/a
node17-3.wn.iihe.ac.be y + on
node17-4.wn.iihe.ac.be y + on
node17-5.wn.iihe.ac.be reserved y
node17-6.wn.iihe.ac.be y + on
node17-7.wn.iihe.ac.be y + on
node17-8.wn.iihe.ac.be y + on
node17-9.wn.iihe.ac.be y + on
node17-10.wn.iihe.ac.be y + on
node17-11.wn.iihe.ac.be y + on
node17-12.wn.iihe.ac.be y + on
node17-13.wn.iihe.ac.be y + on
node17-14.wn.iihe.ac.be down
node18-1.wn.iihe.ac.be y + on
node18-2.wn.iihe.ac.be y + on
node18-3.wn.iihe.ac.be y + on
node18-4.wn.iihe.ac.be y + on
node18-5.wn.iihe.ac.be y + on
node18-6.wn.iihe.ac.be y + on
node18-7.wn.iihe.ac.be y + on
node18-8.wn.iihe.ac.be y + on
node19-1.wn.iihe.ac.be y + on
node19-2.wn.iihe.ac.be y + on
node19-3.wn.iihe.ac.be y + on
node19-4.wn.iihe.ac.be y + on
node19-5.wn.iihe.ac.be y + on
node19-6.wn.iihe.ac.be y + on
node19-7.wn.iihe.ac.be y + on
node19-8.wn.iihe.ac.be y + on
node19-9.wn.iihe.ac.be y + on
node19-10.wn.iihe.ac.be y + on
node19-11.wn.iihe.ac.be y + on
node19-12.wn.iihe.ac.be y + on
node19-13.wn.iihe.ac.be y + on
node19-14.wn.iihe.ac.be y + on
node19-15.wn.iihe.ac.be y + on
node19-16.wn.iihe.ac.be y + on
node19-17.wn.iihe.ac.be y + on
node19-18.wn.iihe.ac.be y + on
node19-19.wn.iihe.ac.be y + on
node19-20.wn.iihe.ac.be y + on
node19-21.wn.iihe.ac.be y + on
node19-22.wn.iihe.ac.be y + on
node19-23.wn.iihe.ac.be y + on
node19-24.wn.iihe.ac.be y + on
node19-25.wn.iihe.ac.be y + on
node19-26.wn.iihe.ac.be y + on
node19-27.wn.iihe.ac.be y + on
node19-28.wn.iihe.ac.be y + on
node19-29.wn.iihe.ac.be y + on
node19-30.wn.iihe.ac.be y + on
node19-31.wn.iihe.ac.be y + on
node19-32.wn.iihe.ac.be y + on
node20-1.wn.iihe.ac.be y + on
node20-2.wn.iihe.ac.be y + on
node20-3.wn.iihe.ac.be y + on
node20-4.wn.iihe.ac.be y + on
node20-5.wn.iihe.ac.be y + on
node20-6.wn.iihe.ac.be y + on
node20-7.wn.iihe.ac.be y + on
node20-8.wn.iihe.ac.be y + on
node20-9.wn.iihe.ac.be y + on
node20-10.wn.iihe.ac.be y + on
node20-11.wn.iihe.ac.be y + on
node20-12.wn.iihe.ac.be y + on
node20-13.wn.iihe.ac.be y + on
node20-14.wn.iihe.ac.be y + on
node20-15.wn.iihe.ac.be y + on
node20-16.wn.iihe.ac.be y + on
node20-17.wn.iihe.ac.be y + on
node20-18.wn.iihe.ac.be y + on
node20-19.wn.iihe.ac.be y + on
node20-20.wn.iihe.ac.be y + on
node20-21.wn.iihe.ac.be y + on
node20-22.wn.iihe.ac.be y + on
node20-23.wn.iihe.ac.be need to be included in maui
node20-24.wn.iihe.ac.be need to be included in maui