NewMethodUpdateKernelWorkernodes

From T2B Wiki
Revision as of 12:28, 26 August 2015 by Maintenance script

Short description of the reboot script

On the workernodes, there is now a cron script, written in Perl, that checks whether a reboot is needed and whether all the conditions for a safe reboot are met. Here is the checklist the script goes through:

  • node must be offline;
  • the kernel release reported by "uname -r" must differ from the default kernel in /etc/grub.conf;
  • node must be drained.

If all these conditions are fulfilled, a "reboot" command is issued.
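To make the checklist concrete, here is a minimal sketch of the decision logic in Python. It is not the actual Perl script: the function names are made up for illustration, "drained" is assumed to mean "no running jobs", and the parser assumes a legacy GRUB file in which every boot entry has exactly one "kernel /vmlinuz-<release>" line.

```python
import re


def default_kernel_release(grub_conf_text):
    """Return the kernel release of the default boot entry, or None.

    Assumption: legacy GRUB syntax, one "kernel /vmlinuz-<release> ..."
    line per boot entry, entries numbered from 0 in file order.
    """
    default_index = 0  # GRUB legacy boots entry 0 when no "default" line is set
    releases = []
    for raw_line in grub_conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        default_match = re.match(r"default[\s=]+(\d+)$", line)
        if default_match:
            default_index = int(default_match.group(1))
            continue
        kernel_match = re.match(r"kernel\s+\S*vmlinuz-(\S+)", line)
        if kernel_match:
            releases.append(kernel_match.group(1))
    if default_index < len(releases):
        return releases[default_index]
    return None


def should_reboot(node_is_offline, running_jobs, running_release, grub_conf_text):
    """Apply the three conditions of the checklist.

    running_jobs == 0 stands in for "drained" (an assumption of this sketch).
    """
    target = default_kernel_release(grub_conf_text)
    kernel_changed = target is not None and target != running_release
    return bool(node_is_offline and running_jobs == 0 and kernel_changed)


# Hypothetical grub.conf content, used only to exercise the functions.
SAMPLE_GRUB_CONF = """\
default=0
timeout=5
title Scientific Linux (2.6.32-358.el6.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-358.el6.x86_64 ro root=/dev/sda1
    initrd /initramfs-2.6.32-358.el6.x86_64.img
title Scientific Linux (2.6.32-279.el6.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/sda1
    initrd /initramfs-2.6.32-279.el6.x86_64.img
"""
```

On a real node, running_release would come from platform.release() (the same value as "uname -r") and grub_conf_text from reading /etc/grub.conf.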

The new recipe to update kernel on the workernodes

Normally, in CB9, SPMA-YUM should update the kernel automatically through the usual repository update mechanism, so it should not be necessary to specify a kernel release explicitly in the templates. However, due to a bug in the ncm-grub component (versions 14.2 to 14.5), we are in fact obliged to do so, by setting the variable KERNEL_VERSION_NUM in site/global_variables. (Note that this is a nasty trick: it only works because all the machines in CB9 run SL63.)

In brief, here is the recipe:

  1. Update the errata repository for SL.
  2. Check in the updated repo what the new kernel release is, and set KERNEL_VERSION_NUM to that value.
  3. Adapt the nlist REPO_YUM_SNAPSHOT_DATE in site/global_variables.
  4. Commit and runcheck.
  5. Put the nodes you want to update offline.
  6. Wait for the nodes to drain; the reboot script then reboots them.
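For step 2, finding the newest kernel in the repo by eye is error-prone, so here is a rough Python sketch that picks it out of a list of RPM file names. The function names and file names are hypothetical, and the ordering is a simplified numeric comparison, not RPM's real version comparison algorithm; it should be good enough for kernels of one SL release, but double-check the result. The exact form expected by KERNEL_VERSION_NUM is not covered here.

```python
import re

# Matches "kernel-<release>.<arch>.rpm" but not kernel-devel, kernel-headers, etc.,
# because the release must start with a digit.
KERNEL_RPM = re.compile(r"^kernel-(\d\S*)\.(?:x86_64|i686)\.rpm$")


def release_sort_key(release):
    """Turn '2.6.32-358.18.1.el6' into (2, 6, 32, 358, 18, 1).

    Simplification: the dist tag (".el6" and anything after it) is dropped
    and only the numeric fields are compared.
    """
    base = re.split(r"\.el\d", release)[0]
    return tuple(int(number) for number in re.findall(r"\d+", base))


def newest_kernel_release(rpm_filenames):
    """Return the highest kernel release found in the file names, or None."""
    releases = []
    for name in rpm_filenames:
        match = KERNEL_RPM.match(name)
        if match:
            releases.append(match.group(1))
    if not releases:
        return None
    return max(releases, key=release_sort_key)


# Hypothetical repository listing, for illustration only.
EXAMPLE_FILES = [
    "kernel-2.6.32-279.el6.x86_64.rpm",
    "kernel-2.6.32-358.2.1.el6.x86_64.rpm",
    "kernel-2.6.32-358.18.1.el6.x86_64.rpm",
    "kernel-devel-2.6.32-431.el6.x86_64.rpm",
    "kernel-2.6.32-358.el6.x86_64.rpm",
]
```

The file names could come from os.listdir() on your local copy of the errata repository.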

Software environment of the reboot script

The reboot script is deployed through Quattor, via the template:

config/reboot_wn

For the script to work, the CREAM server must publish a list of offline workernodes on a webserver. This is done through the template:

config/cream_clusterinfo

If you look at the code of this template, you will see that the list of offline workernodes is copied to the webserver like this:

scp offline_wns.txt quatcli:/var/www/html/clusterinfo/

So, make sure that the public ssh key of the CREAM server is in the .ssh/authorized_keys of quatcli, and make at least one ssh connection from the CREAM server to the webserver, so that the webserver gets added to the known_hosts file.
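On the workernode side, the published list then has to be fetched and searched for the local hostname. The Python sketch below shows one way this could look; it is not taken from the reboot script. It assumes the file holds one hostname per line, and the URL is only a guess derived from the scp target above (it supposes that /var/www/html is the DocumentRoot of quatcli). The function names are hypothetical.

```python
import urllib.request

# Assumed URL: derived from the scp destination, not confirmed by the templates.
OFFLINE_LIST_URL = "http://quatcli/clusterinfo/offline_wns.txt"


def fetch_offline_list(url=OFFLINE_LIST_URL, timeout=10):
    """Download the published list and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_offline_list(list_text):
    """Return the set of hostnames in the list.

    Assumption: one hostname per line; blank lines and '#' comments are ignored.
    """
    hosts = set()
    for raw_line in list_text.splitlines():
        line = raw_line.split("#", 1)[0].strip().lower()
        if line:
            hosts.add(line)
    return hosts


def node_is_offline(hostname, list_text):
    """True if hostname appears in the list (case-insensitive exact match)."""
    return hostname.strip().lower() in parse_offline_list(list_text)
```

A node would call node_is_offline(socket.getfqdn(), fetch_offline_list()); keeping the parsing separate from the download makes it easy to test with a local copy of the file.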

As you don't want the whole web to access your webserver, you may want to secure it a bit by adding something like this at the end of httpd.conf:

<Location /clusterinfo/>
    Order deny,allow
    Deny from all
    Allow from wn.iihe.ac.be
    Allow from iihe.ac.be
</Location>

