To Restart The T2 After Power Down

From T2B Wiki
Revision as of 12:29, 26 August 2015 by Maintenance script

When all the systems have been powered off, a clear procedure needs to be followed to restart them.
Since restarting the servers in an improper order would be useless, many of the servers will not restart automatically after an unscheduled power down. Manual intervention is always needed in this case.

Step 1

Put the bridge back online. The name of this PC is bridge3, but it is not in the DNS.
Its IP address is 193.190.247.129.
There are a lot of interfaces on this PC; ifconfig gives:

 bond0 br0 -> 193.190.247.129
 br1
 eth[1-5]
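Once the machine is up, the expected address on br0 can be verified from the listing above. A minimal sketch of such a check; the helper name is made up here, and IFCONFIG is overridable only so the sketch stays self-contained:

```shell
#!/bin/sh
# Check that a network interface carries the expected IPv4 address.
# IFCONFIG defaults to the real ifconfig; it can be overridden for testing.
IFCONFIG=${IFCONFIG:-ifconfig}

check_addr() {
    iface=$1
    addr=$2
    if $IFCONFIG "$iface" | grep -q "$addr"; then
        echo "$iface: $addr present"
    else
        echo "$iface: $addr MISSING" >&2
        return 1
    fi
}

# Usage: check_addr br0 193.190.247.129
```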

Step 2

Put ccq back online. You need to start it manually from the front.
Log in and check the virtual hosts:

 xm list -> Domain-0
            ccq3        -b----

Check whether you can ssh to ccq3.
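The two checks above (guest listed by xm, guest answering over ssh) can be wrapped in one helper. A minimal sketch, assuming `xm list` prints one domain name per line in the first column; the helper name is made up, and XM/SSH are overridable only so the sketch stays self-contained:

```shell
#!/bin/sh
# Verify a Xen guest appears in `xm list` and answers over ssh.
# XM and SSH default to the real tools; they can be overridden for testing.
XM=${XM:-xm}
SSH=${SSH:-ssh}

check_guest() {
    guest=$1
    if ! $XM list | awk '{print $1}' | grep -qx "$guest"; then
        echo "$guest: not in xm list" >&2
        return 1
    fi
    echo "$guest: domain running"
    if $SSH "$guest" true 2>/dev/null; then
        echo "$guest: ssh OK"
    else
        echo "$guest: ssh FAILED" >&2
        return 1
    fi
}

# Usage: check_guest ccq3
```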


Step 3

Start up fs. Check whether the virtual host mon.iihe.ac.be is running (via xm list).

Step 4

Restart gridce and gliterb, which are virtual hosts on dom01.
If gridce or gliterb is not present, execute the following:

 xm create gridce.iihe.ac.be
 xm create gliterb.iihe.ac.be

IMPORTANT: on gridce, do:

 cp /etc/hosts.equiv2 /etc/hosts.equiv

Log into these virtual machines via ssh and check whether /grid_mnt/pooluser is mounted.
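Checking the mount on both guests can be scripted instead of done by hand. A minimal sketch, assuming the mount point shows up in /proc/mounts on the guest; the helper name is made up, and SSH is overridable only to keep the sketch testable:

```shell
#!/bin/sh
# Check over ssh that a path is a mounted filesystem on a host.
# SSH defaults to the real ssh; it can be overridden for testing.
SSH=${SSH:-ssh}

check_mount() {
    host=$1
    mnt=$2
    if $SSH "$host" grep -q " $mnt " /proc/mounts; then
        echo "$host: $mnt mounted"
    else
        echo "$host: $mnt NOT mounted" >&2
        return 1
    fi
}

# Usage:
#   for h in gridce gliterb; do check_mount "$h" /grid_mnt/pooluser; done
```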

Step 5

Check whether you can see maite via the web (http://maite.iihe.ac.be:2288).
Is pnfs mounted here?

Step 6

Start up frontier from the front panel.

Step 7

Also log into the pools. Check whether they are all up and seen by the network (ssh to the different pools). For the moment, these are available:

behar
behar2
behar3
behar4
behar6
behar020
behar021
behar022
behar023
behar024

In case of doubt, they are labeled.
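Rather than ssh-ing to each pool by hand, the list above can be walked in a loop. A minimal sketch; the helper name is made up, and SSH is overridable only so the sketch stays self-contained:

```shell
#!/bin/sh
# Try each pool node over ssh and report which ones answer.
# SSH defaults to the real ssh; it can be overridden for testing.
SSH=${SSH:-ssh}
POOLS="behar behar2 behar3 behar4 behar6 behar020 behar021 behar022 behar023 behar024"

check_pools() {
    failed=0
    for p in $POOLS; do
        if $SSH "$p" true 2>/dev/null; then
            echo "$p: up"
        else
            echo "$p: DOWN" >&2
            failed=1
        fi
    done
    return $failed
}
```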

Step 8

Due to a wrong reboot order still being possible (wn and m0, m1), /pnfs/iihe may not be correctly mounted on these machines.
You can mount it easily via "mount /pnfs/iihe", but since there are a lot of wn's, this is impractical. Therefore, there are scripts that log in to the different machines and execute a command on them:

 cd distrib_scripts
 ./distrib_exec_list <list_file> "<command>"
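The real script lives in distrib_scripts; the idea behind it is a simple loop over hosts. A minimal sketch of that pattern, assuming the list file holds one hostname per line (the actual script may differ); SSH is overridable only to keep the sketch testable:

```shell
#!/bin/sh
# Run one command on every host named in a list file, one host per line.
# Sketch of the distrib_exec_list idea; not the production script.
SSH=${SSH:-ssh}

distrib_exec_list() {
    list_file=$1
    cmd=$2
    while read -r host; do
        [ -n "$host" ] || continue
        echo "== $host =="
        $SSH "$host" "$cmd" || echo "$host: command failed" >&2
    done < "$list_file"
}

# Usage: distrib_exec_list wn_list "mount /pnfs/iihe"
```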

Step 9

It is also important to check whether m0 and m1 are reachable from the lab. Access is done via tuncc.iihe.ac.be.
On this machine, the routing should be fine. The way back is done via ccq; look at /etc/rc.local there, and you might need to execute:

 /root/route.sh
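A quick reachability test can be scripted; a minimal sketch, assuming m0 and m1 answer ping once the routing is in place (the helper name is made up, and PING is overridable only to keep the sketch testable):

```shell
#!/bin/sh
# Ping m0 and m1 once each; if either is unreachable, the routing
# likely needs fixing (see /etc/rc.local and /root/route.sh on ccq).
PING=${PING:-ping}

check_route() {
    failed=0
    for host in m0 m1; do
        if $PING -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host: reachable"
        else
            echo "$host: unreachable" >&2
            failed=1
        fi
    done
    return $failed
}
```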

Remark

Sometimes you will have to rerun '/root/tun.sh cc' on ccq to make sure the tunnel is set up correctly.

