PutClusterOn

From T2B Wiki
Revision as of 12:28, 26 August 2015 by Maintenance script

How to put the cluster back on after a shutdown

Start all storage

Machines to switch on:
All behars

  Careful: for all HP machines, first switch on the JBOD, wait 2 minutes, and only then switch on the servers.
NOTE: at the last startup, after switching on the JBODs, I found that the servers were already on too, but the RAIDs were mounted, so there was no problem.

freenas (aka behar062)
nexenta
jefke.wn
fs.wn
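
Before moving on, it can help to confirm that the storage machines actually answer. The following is only a minimal sketch, not an existing site script: the function name `wait_for_hosts`, the `PROBE` hook, and the retry values are all assumptions made for illustration.

```shell
# Sketch only: poll a list of storage hosts until each one answers.
# PROBE is a hypothetical hook; by default it is a single ping with a 2 s timeout.
PROBE=${PROBE:-"ping -c 1 -W 2"}
RETRY_DELAY=${RETRY_DELAY:-30}   # seconds between rounds (assumed value)
MAX_ROUNDS=${MAX_ROUNDS:-20}     # give up after this many rounds (assumed value)

wait_for_hosts() {
    local round host pending still_down
    pending="$*"
    round=1
    while [ "$round" -le "$MAX_ROUNDS" ]; do
        still_down=""
        for host in $pending; do
            # $PROBE is left unquoted on purpose so it splits into command + options
            if ! $PROBE "$host" >/dev/null 2>&1; then
                still_down="$still_down $host"
            fi
        done
        if [ -z "$still_down" ]; then
            echo "all hosts up"
            return 0
        fi
        echo "round $round: still waiting for:$still_down" >&2
        pending="$still_down"
        # no point in sleeping after the final round
        if [ "$round" -lt "$MAX_ROUNDS" ]; then
            sleep "$RETRY_DELAY"
        fi
        round=$((round + 1))
    done
    return 1
}
```

It could be called as `wait_for_hosts freenas nexenta jefke.wn fs.wn`, with the behar host names added to the argument list. A ping reply only shows that the machine is up, not that its RAIDs are mounted, so a manual check is still needed.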

Start some servers

First, wait for all storage to be back.
Then start putting some servers back online:

  • the dom's
  • ccq
  • the M machines

Doms and VMs

Some of the servers that were switched off are virtual, so you need to log in to the corresponding dom and restart them in virt-manager.
From one dom machine, you should normally be able to connect to all the other doms through virt-manager.
As for the order in which the VMs are restarted, start qproxy and quattorrepository first. The CEs (cream02) should be restarted last.
Note that other machines may also still be switched off, so have a look around.
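
The restart order described above can be sketched as a small sorting helper. This is only an illustration: the function names `in_list` and `order_vms` are hypothetical, and the helper only prints names. It does not start anything.

```shell
# Sketch only: print VM names in the restart order described in the text.
# FIRST_VMS and LAST_VMS come from the text; all other names keep their input order.
FIRST_VMS="qproxy quattorrepository"
LAST_VMS="cream02"

in_list() {
    # in_list NAME ITEM... : succeed if NAME equals one of the ITEM arguments
    local name item
    name=$1
    shift
    for item in "$@"; do
        [ "$item" = "$name" ] && return 0
    done
    return 1
}

order_vms() {
    local vm
    # 1. the VMs that must come up first, in the listed order
    for vm in $FIRST_VMS; do
        in_list "$vm" "$@" && echo "$vm"
    done
    # 2. everything that is neither first nor last
    for vm in "$@"; do
        in_list "$vm" $FIRST_VMS $LAST_VMS || echo "$vm"
    done
    # 3. the VMs that must come up last
    for vm in $LAST_VMS; do
        in_list "$vm" "$@" && echo "$vm"
    done
    return 0
}
```

Each printed name could then be started by hand in virt-manager, or passed to `virsh start` on the relevant dom, assuming the libvirt domain names match these host names.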

On the machines that came back on automatically, the time will be wrong. This can easily be fixed by restarting the ntpd service on all of them:

./distrib_exec_list list_virtual_machines_ntpd "service ntpd restart"
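
If the distrib_exec_list helper is not at hand, a rough stand-in could look like the sketch below. It is not the site script and makes several assumptions: the list file holds one host name per line, root ssh keys are already in place, and the names `run_on_list` and `RUN` are hypothetical.

```shell
# Sketch only: a stand-in for a distrib_exec_list-style helper, NOT the site script.
# Assumes one host name per line in the list file and working root ssh keys.
# RUN is a hypothetical hook: set RUN=echo to print the commands instead of running them.
RUN=${RUN:-}

run_on_list() {
    local list_file cmd host
    list_file=$1
    cmd=$2
    while IFS= read -r host; do
        # skip blank lines and comment lines
        case $host in
            ''|'#'*) continue ;;
        esac
        # -n keeps ssh from consuming the rest of the list on stdin
        $RUN ssh -n -o ConnectTimeout=10 "root@$host" "$cmd" \
            || echo "FAILED: $host" >&2
    done < "$list_file"
}
```

A dry run such as `RUN=echo; run_on_list list_virtual_machines_ntpd "service ntpd restart"` prints one ssh command per host, which makes it easy to check the list before touching any machine.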

Workernodes

You can now slowly start switching on all the WN's.
Do not switch them all on at once; switch on a few at a time.
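
If the worker nodes are switched on remotely, the "few at a time" rule can be sketched as a batching loop. Everything here is an assumption for illustration: the `POWER_ON` hook, the batch size, and the pause length are not taken from the site setup. By default the hook is `echo`, so the sketch only prints node names.

```shell
# Sketch only: walk a list of worker nodes in small batches with a pause in between.
# POWER_ON is a hypothetical hook; it defaults to echo, so nothing is switched on
# until it is replaced by the site's real power-on command.
POWER_ON=${POWER_ON:-echo}
BATCH_SIZE=${BATCH_SIZE:-5}      # nodes per batch (assumed value)
BATCH_PAUSE=${BATCH_PAUSE:-120}  # seconds between batches (assumed value)

power_on_in_batches() {
    local count node
    count=0
    for node in "$@"; do
        # pause before starting a new batch, but not before the first one
        if [ "$count" -gt 0 ] && [ $((count % BATCH_SIZE)) -eq 0 ]; then
            echo "--- pausing ${BATCH_PAUSE}s ---"
            sleep "$BATCH_PAUSE"
        fi
        $POWER_ON "$node"
        count=$((count + 1))
    done
}
```

The pause gives each batch time to boot before the next one is switched on.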

Start dCache

Go to maite:

/etc/init.d/pnfs start
/opt/d-cache/bin/dcache start

If some behars need to be put read-only in dCache, do so now. On ccq:

./distrib_exec_list list_behars_all "/opt/d-cache/bin/dcache start"

Start phedex

Go to frontier:

su - phedex
./masterProd start
./masterDebug start

Reopen the queues

Go to cream02:

./enable_queues

pnfs on the WN's

Sometimes pnfs is not mounted correctly on all the WN's. It is best to run mount -a on all of them.
On ccq:

./distrib_exec_list node_list_all_wns "mount -a"
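
To see whether a given WN actually needs the remount, the mounts table can be checked first. This is a minimal sketch: the function name `is_mounted` is hypothetical, and `/pnfs` as the mount point is an assumption to be adjusted to the local setup.

```shell
# Sketch only: report whether a mount point appears in a mounts table.
# /proc/mounts is the Linux default table; a different file can be passed for testing.
is_mounted() {
    local mount_point mounts_file
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # field 2 of each line in the mounts table is the mount point
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}
```

On a WN this could be used as `is_mounted /pnfs || mount -a`, so that the remount only runs where the mount is missing.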

