Cluster Overview

== Overview ==

The cluster is composed of 3 groups of machines:
* The '''User Interfaces (UI)'''
::This is the cluster front-end; to use the cluster, you need to log into these machines.
::::Servers : mshort [ m2 , m3 ] , mlong [ m0 , m1 ]
:: The '''File Server''' provides the user home on the UIs. It is a highly efficient & redundant storage node of ~120 TB capacity with regular backups.


<br>
* The '''Computing Machines'''
** The '''Computing Element (CE):''' This is the gateway between the World and the T2B cluster: it receives all Grid jobs and submits them to the local batch system.
::::Servers : testumd-htcondorce (temporary)
 
:* The '''HTCondor Schedulers:''' These are the brains of the batch system: they manage all submitted jobs and send them to the worker nodes.
::::Servers : scheddXX


:* The '''Worker Nodes (WN):''' This is the power of the cluster: they run multiple jobs in parallel and send the results & status back to the CE.
::::Servers : nodeXX-YY

<br>
* The '''Mass Storage'''
** The '''Storage Element''': it is the brain of the cluster storage. Grid accessible, it knows where all the files are, and manages all the storage nodes.
::::Server : maite
:* The '''Storage Nodes''': This is the memory of the cluster : they contain big data files. In total, they provide ~8400 TB of grid-accessible storage.
::::Servers : beharXXX


== How to Connect ==


To connect to the cluster, you need to have sent us your public ssh key.
In a terminal, type the following (replace <MYLOGIN> with your own login, WITHOUT the brackets <>):
  ssh -X -o ServerAliveInterval=100 <MYLOGIN>@mshort.iihe.ac.be
:''Tip: the ''-o ServerAliveInterval=100'' option keeps your session alive for a long period of time! You should not be disconnected during a whole day of work.''
:''Tip: use aliases to connect easily! E.g. add the following to your ''~/.bashrc'' file: ''alias mshort='ssh -X -o ServerAliveInterval=100 <MYLOGIN>@mshort.iihe.ac.be' ''
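If you do not have an SSH key pair yet, you can generate one and print the public part to send to us. This is a generic sketch assuming OpenSSH; the key file name ''id_ed25519_t2b'' is just an example, pick your own.

```shell
# Generate an ed25519 key pair (assuming OpenSSH; the file name is an example).
# The PUBLIC key (.pub) is what you send to the admins; never share the private key.
keyfile="$HOME/.ssh/id_ed25519_t2b"
mkdir -p "$HOME/.ssh"
[ -f "$keyfile" ] || ssh-keygen -t ed25519 -f "$keyfile" -N '' -q
cat "${keyfile}.pub"
```

Note that ''-N <nowiki>''</nowiki>'' creates a key without a passphrase; consider protecting the key with one.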
If connecting does not work, please follow the help [[Faq_t2b#Debugging_SSH_connection_to_mX_machines:|here]]. After a successful login, you'll see this message:
<span style='color:green'> (: Welcome to the T2B Cluster :) </span>
<span style='color:green'> ________________________________</span><br>
<span style='color:green'> The cluster is working properly</span><br>
___________________________________________________________________________<br>
  Mail: <span style='color:blue'>  grid_admin@listserv.vub.be</span>  |  Chat: <span style='color:blue'>  https://chat.iihe.ac.be</span>
  Wiki: <span style='color:blue'>  https://t2bwiki.iihe.ac.be</span> |  Status: <span style='color:blue'>https://status.iihe.ac.be</span>
___________________________________________________________________________<br>
  <span style='color:cyan'>[/user]</span>  =>  224 / 500 GB  <span style='color:green'>[44%]</span>  --|--  <span style='color:cyan'>[/pnfs]</span>  =>  <span style='color:green'>101 GB</span>  [01/12/2023]
___________________________________________________________________________<br>
  <span style='color:blue'>Welcome on [m7]</span>! You have <span style='color:purple'>3600s (1 hour)</span> of cpu time per process.
  There are <span style='color:green'>2 users</span> here  |  Load: <span style='color:red'>7.56 /4 CPUs (189%)</span>  |  Mem: <span style='color:green'>16% used</span><br>


Please observe all the information in this message:
* The header, telling you the health of the cluster. When there is an issue, the header of the welcome message will transform to:
<span style='color:red'> :( Welcome to the T2B Cluster ): </span>
<span style='color:red'> ________________________________</span><br>
<span style='color:red'> THERE ARE ISSUES ON THE CLUSTER</span>
<span style='color:red'> More details at </span><span style='color:magenta'>status.iihe.ac.be</span>
<span style='color:red'> (Register to receive updates)</span>
<br>
* The email used for the cluster support (please use this one rather than a personal mail; this way everyone on the support team can answer and track the progress.)
* The wiki link, where you should go first to find information.
* The chat link, where you can easily contact us for fast exchanges. IIHE users can use their intranet account; others can just create an account.
* The status link, where you can see if the cluster has any problems reported. Please make sure you are registered to receive updates.
<br>
* The space used on the mass storage /pnfs, where storing a few TB is no problem. No hard limits are applied, but please contact us if you plan to go over 20 TB!
* The quota used on /user (and /group). Here a hard limit is applied, so if you are at 100%, you will have many problems. Clean your space, and if you really need more, contact us.<br><br>
* The CPU time limit imposed per process, as we divided our UIs into 2 groups. Please note '''processes will be killed''' if they go over their CPU-time limit!
:: '''The light task''' UIs <span style='color:red'>(max '''CPU''' time = 20 minutes)</span> : they are used for crab/local job submission, writing code, debugging ...
::<pre>mshort.iihe.ac.be :  m2.iihe.ac.be, m3.iihe.ac.be </pre>
:: '''The CPU-intensive''' UIs <span style='color:red'>(max '''CPU''' time = 5 hours)</span> : they are available for CPU-intensive and testing tasks/workflows, although you should prefer using local job submission ...
::<pre>mlong.iihe.ac.be : m0.iihe.ac.be, m1.iihe.ac.be</pre>
* Information about how heavily this UI is used. If any of the numbers is red (i.e. above optimal usage), please consider using another UI. Please be mindful of other users and don't start too many processes, especially if the UI is already under load.
<br>
* Sometimes announcements are printed at the end. Please make sure you read those.
<br>
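The per-process CPU-time limit is enforced through the standard POSIX rlimit, so you can check which limit is active on the UI you are logged into (a sketch assuming a bash shell):

```shell
# Print the CPU-time limit (in seconds) for processes started from this shell;
# on a light-task UI this should correspond to the advertised limit.
ulimit -t
```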


== Data Storage & Directory Structure ==


There are 2 main directories to store your work and data:
* '''/user [/$USER]''' : this is your home directory. You have an enforced quota there, as it is an expensive storage with redundancy and daily backups (see below).
* '''/pnfs [/iihe/MYEXP/store/user/$USER]''' : this is where you can store a large amount of data; it is also [[GridStorageAccess|grid-accessible]]. If you need more than a few TB, please contact us. There are no backups there, so be careful what you do!
<br>
There are other directories that you might want to take notice of:
* '''/group''' : same as /user , but if you need to share/produce in a group.
* '''/scratch''' : a temporary scratch space for your jobs. Use $TMPDIR on the WNs; it is cleaned after each job :)
* '''/cvmfs''' : Centralised CVMFS software repository. It should contain most of the software you will need for your experiment. Find [[OtherSoftware|here]] how to get a coherent environment for most tools you will need.
* '''/software''' : local area for shared software not in /cvmfs . You can use a [[OtherSoftware|nice tool]] to find the software and versions available.
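Since /user has a hard quota, it is worth checking periodically what is taking space there. A generic sketch using standard coreutils (not a cluster-specific tool):

```shell
# List the first-level entries of your home sorted by size (biggest last),
# to spot what is filling your /user quota. -s = summarize, -h = human-readable units.
du -sh "$HOME"/* 2>/dev/null | sort -h
```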


<br>
== Batch System ==


The cluster is based on HTCondor (also used at CERN or Wisconsin for instance).
Please follow [[HTCondor|this page]] for details on how to use it.
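As a minimal sketch of what using it looks like (hypothetical file names; see the HTCondor page above for the options recommended on this cluster), a job is described in a submit file and handed to a scheduler with ''condor_submit'':

```shell
# Minimal HTCondor job sketch (hypothetical file names).
cat > hello.sh <<'EOF'
#!/bin/bash
echo "Running on $(hostname)"
EOF
chmod +x hello.sh

cat > hello.sub <<'EOF'
executable     = hello.sh
output         = hello.out
error          = hello.err
log            = hello.log
request_memory = 4 GB
queue
EOF

# On a UI connected to the schedulers you would then run:
# condor_submit hello.sub
```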


{| width="1064" cellspacing="1" cellpadding="5" border="1" align="center"
|-
! scope="row" | Description
| nowrap="nowrap" align="center" | HTCondor batch resources
|-
! scope="row" | # CPUs (jobs)
| nowrap="nowrap" align="center" | 10700
|-
! scope="row" | Walltime limit
| nowrap="nowrap" align="center" | 168 hours = 1 week
|-
! scope="row" | Preferred memory per job
| nowrap="nowrap" align="center" | 4 GB
|-
! scope="row" | $TMPDIR/scratch max usable space
| nowrap="nowrap" align="center" | 10-20 GB
|-
! scope="row" | Max # jobs sent to the batch system / user
| nowrap="nowrap" align="center" | theoretically none (contact us if you plan on sending more than 10 000)
|}


<br>


== Backup ==
There are several areas that we regularly back up: '''/user''' , '''/group''' , '''/ice3'''.<br>
You can find more information on the backup frequency and how to access them [[Backup|here]].


== Useful links ==
[http://ganglia.iihe.ac.be/ganglia/  Ganglia Monitoring] : stats on all our servers.<br>
[http://status.iihe.ac.be Cluster Status] : current status of all T2B services. Check here before sending us an email. Please also consider registering to receive T2B issues and be informed when things are resolved.

Latest revision as of 11:07, 16 October 2024
