Cluster Overview

Overview

The cluster is composed of 3 groups of machines:

  • The User Interfaces (UI)
This is the cluster front-end: to use the cluster, you need to log into one of these machines.
Servers : [ m0 , m1 , m2 , m3 ] , [ m5 , m6 , m7 , m8 , m9 ]


  • The Computing Machines:
    • The Computing Element (CE): This server is the brain of the batch system: it manages all submitted jobs and sends them to the worker nodes.
Servers : cream02
    • The Worker Nodes (WN): These are the power of the cluster: they run multiple jobs in parallel and send the results and status back to the CE.
Servers : nodeXX-YY


  • The Storage Machines:
    • The Storage Element: This is the brain of the cluster storage. It is grid accessible, knows where all the files are, and manages all the storage nodes.
Server : maite
    • The Storage Nodes: These are the memory of the cluster: they hold the large data files. In total, they provide ~2300 TB of grid-accessible storage.
Servers : beharXXX
    • The User Storage: It provides the home directories for the UIs. It is a highly efficient and redundant storage node of ~70 TB capacity.


How to Connect

To connect to the cluster, you must first have sent us your public SSH key. Then, in a terminal, type the following:

ssh -X -o ServerAliveInterval=100 username@m0.iihe.ac.be
Tip: the -o ServerAliveInterval=100 option makes your ssh client send a keep-alive message whenever 100 seconds pass without any data from the server, so that idle sessions are not dropped. You should be able to stay connected for a whole day of work.
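
If you connect to several UIs, a small shell function can save some typing. The sketch below is only an illustration: the function name t2b_ssh_cmd is hypothetical and not something provided by the cluster, and it simply prints the same command as above for the UI you name, so you can check it before running it.

```shell
# Hypothetical helper (not part of the cluster setup): print the ssh command
# for a given UI, using the same options as the command shown above.
# Usage: t2b_ssh_cmd USERNAME [UI]   (UI defaults to m0)
t2b_ssh_cmd() {
    user="$1"
    ui="${2:-m0}"
    printf 'ssh -X -o ServerAliveInterval=100 %s@%s.iihe.ac.be\n' "$user" "$ui"
}

# Example: build the command for the UI m5.
t2b_ssh_cmd username m5
# prints: ssh -X -o ServerAliveInterval=100 username@m5.iihe.ac.be
```

Replace username with your own login, then copy the printed line into your terminal to connect.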

After a successful login, you'll see this message :


         @@@@@@@@     @@@@             @@@@@     @@@@@@@
            @@       @    @            @@   @    @@
            @@            @    @@@@    @@@@@     @@@@
            @@         @@              @@    @   @@@@
            @@       @                 @@    @   @@
            @@       @@@@@@            @@@@@@    @@@@@@@
                              @ IIHE   

  Welcome to the t2b cluster ! You are on the following UI: m2 
You can find more info on our wiki page: http://t2bwiki.iihe.ac.be
To contact us: grid_admin@listserv.vub.ac.be
Please remember this machine will allow you only 600s (10 minutes) of cpu time per processes.
________________________________________________________________________
Your Quota on /user: 43% used (282G left)
There are 2 users here | Load: 7.51 /8 CPUs (2%) | Mem: 80% used


Please take note of all the information in this message:

  • The wiki link, which is the first place to look for information
  • The email address for cluster support (please use this one rather than personal mail, so that everyone on the support team can answer and track the progress)
  • The CPU time limit imposed per process. It depends on which of our 2 groups of UIs you are logged into:
    • The light-task UIs (max CPU time = 10 minutes): they are used for crab/local job submission, writing code, building, debugging ...
m0.iihe.ac.be, m1.iihe.ac.be, m2.iihe.ac.be, m3.iihe.ac.be
    • The CPU-intensive UIs (max CPU time = 5 hours): they are available for CPU-intensive and long tasks, although you should prefer local job submission for those ...
m5.iihe.ac.be, m6.iihe.ac.be, m7.iihe.ac.be, m8.iihe.ac.be and m9.iihe.ac.be
  • The quota you have left on /user
  • Information about how heavily this UI is used. If any of the values is shown in red (i.e. above optimal usage), please consider using another UI. Please be mindful of other users and don't start too many processes, especially if the UI is already under heavy load.
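
The load figure in the banner is only shown at login, so it can be useful to re-check it before starting something heavy. The sketch below is a rough illustration, not the rule the banner itself uses: the function name ui_load_status is hypothetical, and treating a UI as busy once its load reaches the number of CPUs is an assumption made here for the example.

```shell
# Hypothetical helper: compare a load average with a CPU count.
# Assumption: the UI counts as "busy" when load >= number of CPUs;
# the threshold behind the red colouring in the banner may differ.
# Usage: ui_load_status LOAD CPUS
ui_load_status() {
    # awk does the floating-point comparison that plain sh cannot do.
    awk -v l="$1" -v c="$2" \
        'BEGIN { if (l + 0 >= c + 0) print "busy"; else print "ok" }'
}

# With the values from the example banner above (Load: 7.51 / 8 CPUs):
ui_load_status 7.51 8
# prints: ok

# On a Linux machine, the current values can be read like this:
#   ui_load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

If the result is busy, consider logging into another UI of the same group before launching your task.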