To Restart The T2 After Power Down
When all the systems have been powered off, a clear procedure needs to be followed to restart them.
Restarting the servers in the wrong order is useless, and many of the servers will not restart automatically after an unscheduled power down. Manual intervention is always needed in this case.
1
Put bridge back online. The name of this PC is bridge3, but it is not in the DNS.
Its IP address is 193.190.247.129
There are a lot of interfaces on this PC:
ifconfig gives: bond0 br0 -> 193.190.247.129
br1
eth[1-5]
2
Put ccq back online. You need to start it manually from the front.
Log in and check the virtual hosts:
xm list -> Domain 0 ccq3 -b----
Check whether you can ssh to ccq3.
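The "xm list" check can be scripted. The sketch below is only an illustration: domain_present is a hypothetical helper name, and it assumes the usual "xm list" layout of one header line followed by one line per domain with the domain name in the first column. A state of -b---- (blocked) is normal for an idle domain.

```shell
# domain_present NAME
# Reads "xm list" output on stdin and succeeds if a domain called NAME
# is listed. The first line is skipped because it is the column header.
# (domain_present is a hypothetical helper, not an existing script.)
domain_present() {
    awk -v d="$1" 'NR > 1 && $1 == d { found = 1 } END { exit !found }'
}
```

On ccq this could be used as: xm list | domain_present ccq3 || echo "ccq3 is not running"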
3
Start up fs. See if the virtual host mon.iihe.ac.be is running (via xm list)
4
Restart gridce and gliterb, which are virtual hosts on dom01.
If gridce or gliterb is not present, execute the following:
xm create gridce.iihe.ac.be
xm create gliterb.iihe.ac.be
IMPORTANT: on gridce do:
cp /etc/hosts.equiv2 /etc/hosts.equiv
Log into these virtual machines via ssh and check that /grid_mnt/pooluser is mounted.
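A quick way to test for the mount without reading the whole "mount" output is sketched below. is_mounted is a hypothetical helper name. It assumes a Linux host where /proc/mounts lists the mount point in the second column. The optional second argument exists only so the check can be tried against a saved copy of the mount table.

```shell
# is_mounted PATH [MOUNTS_FILE]
# Succeeds if PATH is listed as a mount point in MOUNTS_FILE
# (default: /proc/mounts, where the mount point is the second field).
# (is_mounted is a hypothetical helper, not an existing script.)
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

On gridce or gliterb: is_mounted /grid_mnt/pooluser || echo "/grid_mnt/pooluser is NOT mounted"
The same helper works for /pnfs/iihe on the worker nodes.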
5
Check that you can see maite via the web (http://maite.iihe.ac.be:2288).
Is pnfs mounted here?
6
Start up frontier from the front panel.
7
Also log into the pools. Check that they are all up and seen by the network (ssh to each of the pools). For the moment, these are available:
behar behar2 behar3 behar4 behar6 behar020 behar021 behar022 behar023 behar024
In case of doubt, they are labeled.
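Checking the pools one by one can be wrapped in a small loop. This is a sketch, not an existing script: check_hosts and ssh_probe are hypothetical names, the pool list is copied from the list above, and it assumes key-based ssh login works from the machine you run it on (BatchMode makes ssh fail instead of asking for a password).

```shell
# Pool names as listed above; edit when pools are added or retired.
POOLS="behar behar2 behar3 behar4 behar6 behar020 behar021 behar022 behar023 behar024"

# check_hosts PROBE HOST...
# Runs PROBE once per host and prints "up" or "DOWN" next to each name.
# Returns non-zero if at least one host failed the probe.
check_hosts() {
    probe=$1
    shift
    failed=0
    for host in "$@"; do
        if "$probe" "$host" >/dev/null 2>&1; then
            echo "$host up"
        else
            echo "$host DOWN"
            failed=1
        fi
    done
    return $failed
}

# ssh_probe HOST
# Succeeds if a non-interactive ssh login to HOST works within 5 seconds.
ssh_probe() {
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

Usage: check_hosts ssh_probe $POOLS
($POOLS is left unquoted on purpose so that the shell splits it into one argument per pool.)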
8
Due to a wrong reboot order that is still present (wn and m0, m1), it is possible that /pnfs/iihe is not correctly mounted on these machines.
You can mount it easily via "mount /pnfs/iihe", but since there are a lot of wn's, doing this by hand is impractical. Therefore, there are scripts that log in to the different machines and execute a command on each of them.
cd distrib_scripts
./distrib_exec_list <list_file> "<command>"
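If distrib_scripts is not at hand, a loop along the following lines does a similar job. This is NOT the content of distrib_exec_list, only a sketch under assumptions: run_on_list, ssh_run and RUN_REMOTE are hypothetical names, the list file is assumed to hold one host name per line, and key-based ssh login is assumed to work.

```shell
# RUN_REMOTE names the function or program used to reach a host.
# It is called as: RUN_REMOTE HOST COMMAND
RUN_REMOTE=${RUN_REMOTE:-ssh_run}

# ssh_run HOST COMMAND
# -n keeps ssh from swallowing the rest of the host list on stdin.
ssh_run() {
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" "$2"
}

# run_on_list LIST_FILE COMMAND
# Runs COMMAND on every host in LIST_FILE (one host per line).
# Blank lines and lines starting with # are skipped.
# A failure on one host is reported on stderr and the loop continues.
run_on_list() {
    while read -r host rest || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        echo "== $host"
        "$RUN_REMOTE" "$host" "$2" || echo "$host: command failed" >&2
    done < "$1"
}
```

Usage: run_on_list <list_file> "mount /pnfs/iihe"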
9
It is also important to check whether m0 and m1 are reachable from the lab. Access goes via tuncc.iihe.ac.be.
On this machine, the routing should be fine. The way back goes via ccq. There, look at /etc/rc.local; you might need to execute
/root/route.sh
Remark
Sometimes you will have to rerun '/root/tun.sh cc' on ccq to make sure the tunnel is correctly set up.