Shutdown

From T2B Wiki
Revision as of 12:29, 26 August 2015 by Maintenance script


Shutting down a cluster

1. Announce the downtime

Announcing the downtime as far in advance as possible is a crucial part of scheduling it. Advance notice allows VOs, for example, to copy their data from your SE to another site.

2. Set "Scheduled downtime" in GOC DB

  • Declare the downtime period in the GOC DB.
  • The Grid Operations Centre will take your downtime into account immediately.
  • Send an email to our "local" users:
To: "belgian-t2-users" <belgian-t2-users@cern.ch>
CC: grid_user@listserv.vub.ac.be
Subject: Downtime at IIHE Tier2 site on 30/03/2009

Dear Users,

We will put the IIHE Tier2 site in downtime on 30/03/2009 from 14:00 to 16:00 due to maintenance of the storage system.
Access to the m0 and m1 machines will remain possible, but the storage system will be partially offline.


Please note that we will put the job queues in draining mode starting today, 27/03/2009, at 18:00.
This means that no new jobs will be accepted from that moment on. All jobs already in the queues will still be executed.

Sorry for the inconvenience.

Best regards,
The IIHE Grid Team
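The announcement above can be generated from a few parameters, so that the dates stay consistent between the subject, the body and the draining notice. The sketch below is a hypothetical POSIX shell helper (announcement is not an existing tool); it only reproduces the fixed parts of the template, so site-specific remarks such as the m0/m1 access line still have to be added by hand.

```shell
# announcement: print a downtime e-mail following the IIHE template.
# Hypothetical helper; all arguments are supplied by the caller:
#   $1 downtime date   $2 start time   $3 end time
#   $4 reason          $5 start of draining (date and time)
announcement() {
    cat <<EOF
To: "belgian-t2-users" <belgian-t2-users@cern.ch>
CC: grid_user@listserv.vub.ac.be
Subject: Downtime at IIHE Tier2 site on $1

Dear Users,

We will put the IIHE Tier2 site in downtime on $1 from $2 to $3 due to $4.

Please note that the job queues will be put in draining mode from $5.
From that moment on, no new jobs will be accepted; jobs already in the queues will still be executed.

Sorry for the inconvenience.

Best regards,
The IIHE Grid Team
EOF
}
```

Assuming the machine runs a local MTA that provides sendmail, the message could then be sent with: announcement 30/03/2009 14:00 16:00 "maintenance of the storage system" "27/03/2009 at 18:00" | /usr/sbin/sendmail -t (the -t flag makes sendmail take the recipients from the To: and CC: headers).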

3. Disable the queues

  • In a standard installation (PBS server on the CE), drain the queues by running the qmgr command on your CE:
qmgr -c "set queue queuename enabled=false"
  • Then wait until all jobs have drained from the queues, and stop them with:
qmgr -c "set queue queuename started=false"
  • After a while, MDS will pick up these values and no further jobs will be submitted to these queues. You may consider leaving the "dteam" and "ops" queues enabled: keeping them open during the Scheduled Downtime lets SAM continue testing the site, so you can check that it works correctly immediately after the upgrade.
  • The queues and all other server attributes can be listed with the following command:
qmgr -c "print server"
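The queue steps above can be scripted. The following is a minimal POSIX shell sketch, not a tested site tool: it assumes that qstat -Q prints two header lines followed by one line per queue with the total job count ("Tot") in the third column, as Torque does, and it keeps the dteam and ops queues open as suggested above. The function names and the polling interval are illustrative choices.

```shell
#!/bin/sh
# Sketch: drain, then stop, every PBS/Torque queue except those kept open for SAM.
# Assumption: "qstat -Q" output has 2 header lines, queue name in column 1
# and the total number of jobs ("Tot") in column 3.

KEEP_OPEN="dteam ops"   # queues left enabled for SAM/ops tests
POLL_SECONDS=300        # illustrative polling interval

# list_queues: print every queue name known to the PBS server
list_queues() {
    qstat -Q | awk 'NR > 2 { print $1 }'
}

# is_kept_open QUEUE: succeed if QUEUE is listed in $KEEP_OPEN
is_kept_open() {
    for k in $KEEP_OPEN; do
        [ "$1" = "$k" ] && return 0
    done
    return 1
}

# total_jobs QUEUE: print the "Tot" column for one queue
total_jobs() {
    qstat -Q "$1" | awk 'NR > 2 { print $3 }'
}

drain_queues() {
    # Step 1: stop accepting new jobs (enabled=false)
    for q in $(list_queues); do
        is_kept_open "$q" && continue
        qmgr -c "set queue $q enabled=false"
    done
    # Step 2: wait until each queue is empty, then stop it (started=false)
    for q in $(list_queues); do
        is_kept_open "$q" && continue
        while [ "$(total_jobs "$q")" != "0" ]; do
            sleep "$POLL_SECONDS"
        done
        qmgr -c "set queue $q started=false"
    done
}
```

The file only defines functions; to use it, load it into a root shell on the CE and call drain_queues. Because it waits for running jobs to finish, it can block for as long as the longest job in the queues.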


4. Check the information in your MDS

MDS will pick up the information about disabled queues and automatically start reporting "GlueCEStateStatus: Closed" for each of them. You can check the queues' status with the following command:

ldapsearch -x -H ldap://gridce.iihe.ac.be:2170 -b mds-vo-name=BEgrid-ULB-VUB,o=grid | grep -B 10 GlueCEStateStatus
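Instead of scanning the grep output by eye, the LDIF returned by ldapsearch can be reduced to one line per CE. The sketch below assumes the Glue 1.x attribute names used above (GlueCEUniqueID and GlueCEStateStatus); ce_states and not_closed are hypothetical helper names. Queues you deliberately left enabled, such as dteam and ops, are expected to appear in the not_closed output.

```shell
# ce_states: read LDIF on stdin and print "<GlueCEUniqueID> <GlueCEStateStatus>"
# for every entry that carries both attributes (entries start with "dn:"
# and are separated by blank lines).
ce_states() {
    awk '
        function flush() {
            if (id != "" && st != "") print id, st
            id = ""; st = ""
        }
        /^dn:/                { flush() }
        /^GlueCEUniqueID:/    { id = $2 }
        /^GlueCEStateStatus:/ { st = $2 }
        /^$/                  { flush() }
        END                   { flush() }
    '
}

# not_closed: list only the CEs whose state is not "Closed"
not_closed() {
    ce_states | awk '$2 != "Closed"'
}
```

A possible invocation is: ldapsearch -x -LLL -o ldif-wrap=no -H ldap://gridce.iihe.ac.be:2170 -b mds-vo-name=BEgrid-ULB-VUB,o=grid '(objectClass=GlueCE)' GlueCEUniqueID GlueCEStateStatus | not_closed. Here -LLL drops the LDIF comments, and -o ldif-wrap=no (an OpenLDAP client option) keeps long values on a single line, which the awk parser relies on.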



Now, your site is ready for the upgrade.


Once the upgrade is done, if the whole site has been powered down, restart it as described in:

To_Restart_The_T2_After_Power_Down


