How to bring the cluster back up after a shutdown
Start all storage
Machines to power on:
All behars
Careful: for all HP machines, power on the JBOD first, wait 2 minutes, and only then power on the servers.
NOTE: at the last startup, after powering on the JBODs, I found the servers already on as well, but the RAIDs were mounted, so there was no problem.
freenas (aka behar062)
Start some servers
First, wait for all storage to be back
Then start putting some servers back online
- the doms
- ccq
- the M machines
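The "wait for all storage to be back" step can be checked with a small polling loop instead of by hand. The following is only a sketch: the function name wait_for_hosts, the PROBE_CMD hook and the host list file (one hostname per line) are assumptions made for illustration and are not part of the existing cluster scripts. The default probe is a single Linux ping, which only shows that a host answers on the network, not that its RAIDs are mounted.

```shell
#!/bin/sh
# Sketch: block until every storage host in a list answers a probe.
# PROBE_CMD is a hypothetical hook; by default it sends one ping (Linux iputils flags).
PROBE_CMD=${PROBE_CMD:-"ping -c 1 -W 2"}

# wait_for_hosts FILE [INTERVAL_SECONDS]
# FILE holds one hostname per line; blank lines and "#" comments are skipped.
wait_for_hosts() {
    list=$1
    interval=${2:-30}
    while :; do
        pending=""
        while read -r host; do
            case $host in ''|'#'*) continue ;; esac
            # Record every host whose probe fails in this round.
            $PROBE_CMD "$host" </dev/null >/dev/null 2>&1 || pending="$pending $host"
        done < "$list"
        [ -z "$pending" ] && break
        echo "still waiting for:$pending"
        sleep "$interval"
    done
    echo "all storage hosts reachable"
}

# Example (hypothetical list file):
#   wait_for_hosts ./storage_hosts.txt 30
```

Once the function reports that all hosts are reachable, it is still worth confirming on the storage machines themselves that the RAIDs are mounted before starting the servers.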
Doms and VMs
Some of the servers that were shut down are virtual machines. To restart one, log in to the corresponding dom and start it in virt-manager.
From any one dom, you should normally be able to connect to all the other doms through virt-manager.
As for the order in which to restart the VMs: start qproxy and quattorrepository first. The CE (cream02) should be the last machine to be restarted.
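If you have the VM names in a plain list, the ordering rule above can be applied mechanically. This is a minimal sketch, not an existing tool: the function name order_vms is made up for illustration, and it only sorts names. Starting the VMs is still done from virt-manager on the corresponding dom.

```shell
#!/bin/sh
# order_vms: read VM names on stdin (one per line) and print them in start order:
# qproxy and quattorrepository first, cream02 last, everything else in between
# in the order it was given.
order_vms() {
    awk '
        $0 == "qproxy"            { early[1] = $0; next }
        $0 == "quattorrepository" { early[2] = $0; next }
        $0 == "cream02"           { ce = $0; next }
        NF                        { rest[++n] = $0 }
        END {
            for (i = 1; i <= 2; i++) if (i in early) print early[i]
            for (i = 1; i <= n; i++) print rest[i]
            if (ce != "") print ce
        }'
}

# Example (hypothetical list file):
#   order_vms < ./list_of_vms.txt
```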
Note that other machines may also not have been switched on. Have a look around.
On the machines that were powered on automatically, the time will be wrong. This is easily fixed by restarting the ntpd service on all of them:
./distrib_exec_list list_virtual_machines_ntpd "service ntpd restart"
You can now gradually start powering on all the WNs.
Do not switch them all on at once. Power on a few at a time.
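The "a few at a time" rule can be scripted as a batched loop. The sketch below is an illustration under assumptions: the function start_in_batches, the POWER_ON_CMD hook, the batch size and the pause length are all hypothetical. How a WN is actually powered on (remote management, power switch, by hand) is site-specific and not described here, so the default command only prints what it would do.

```shell
#!/bin/sh
# Sketch: power on worker nodes a few at a time.
# POWER_ON_CMD is a placeholder; the default only prints the node name (dry run).
POWER_ON_CMD=${POWER_ON_CMD:-"echo would power on"}

# start_in_batches FILE [BATCH_SIZE] [PAUSE_SECONDS]
# FILE holds one node name per line; blank lines and "#" comments are skipped.
start_in_batches() {
    list=$1
    batch=${2:-5}
    pause=${3:-300}
    count=0
    while read -r node; do
        case $node in ''|'#'*) continue ;; esac
        # Pause before each new batch, but not before the very first node.
        if [ "$count" -gt 0 ] && [ $((count % batch)) -eq 0 ]; then
            echo "pausing ${pause}s before next batch"
            sleep "$pause"
        fi
        $POWER_ON_CMD "$node" </dev/null
        count=$((count + 1))
    done < "$list"
}

# Example (hypothetical list file): 5 nodes per batch, 5 minutes between batches
#   start_in_batches ./wn_list.txt 5 300
```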
Start dCache
Go to maite:
/etc/init.d/pnfs start
/opt/d-cache/bin/dcache start
If any behars need to be set read-only in dCache, do so now.
On ccq:
./distrib_exec_list list_behars_all "/opt/d-cache/bin/dcache start"
Start phedex
go to frontier
su - phedex
./masterProd start
./masterDebug start
Reopen the queues
go to cream02
pnfs on the WNs
Sometimes pnfs is not mounted correctly on all the WNs. It is best to run mount -a on all of them:
on ccq:
./distrib_exec_list node_list_all_wns "mount -a"
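To see whether a given WN actually needs the remount, you can check its mount table. This is a minimal sketch: the function name is_mounted is made up for illustration, and the mount point /pnfs used in the example is an assumption, so adjust it to wherever pnfs is mounted on the WNs.

```shell
#!/bin/sh
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT appears as a mount target in MOUNTS_FILE
# (default /proc/mounts, whose second field is the mount point).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example (the /pnfs path is an assumption):
#   is_mounted /pnfs || mount -a
```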