PBS: Difference between revisions

From T2B Wiki

Latest revision as of 11:23, 14 February 2022

UI to use

Only the UIs m0, m8 and m9 still run SL6 and can submit jobs to the old PBS/QSUB qcluster.

Job submission

To submit a job, use the qsub command:

qsub myjob.sh

OPTIONS

  • -q queueName : choose the queue you want [mandatory]
  • -N jobName : name of the job
  • -I : (capital i) run in interactive mode
  • -M mailaddress : set the mail address (use in conjunction with -m); it MUST be @ulb.ac.be or @vub.ac.be
  • -m [a|b|e] : send mail on job status change (a = aborted, b = begin, e = end)
  • -l : resource options
For instance, if you want to use 2 cores: -l nodes=1:ppn=2
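As an illustration of how these options combine, here is a minimal sketch that assembles a full qsub command line. The function name build_qsub_cmd, the queue name "myqueue", the job name and the mail address are placeholders chosen for this example, not values taken from the cluster configuration. The sketch only prints the command so that it can be checked before being run.

```shell
#!/bin/bash
# Sketch: assemble a qsub command line from the options listed above.
# All argument values used below are placeholders (assumptions).
build_qsub_cmd() {
    local queue="$1" name="$2" cores="$3" mail="$4" script="$5"
    # -M sets the mail address, -m e asks for a mail when the job ends
    echo "qsub -q $queue -N $name -l nodes=1:ppn=$cores -M $mail -m e $script"
}

# Print the command for inspection; copy-paste it once it looks right.
build_qsub_cmd myqueue mytest 2 someone@ulb.ac.be myjob.sh
```

Printing instead of submitting directly makes it easy to spot a missing queue or a wrong core count before the job reaches the scheduler.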


If you want to send more than 2500 jobs to the cluster, write all qsub commands in a text file, and use the script big-submission (more info here).
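The text file of qsub commands can be generated with a short loop. The sketch below is only an assumption about what such a file might look like (one qsub command per line). The function name write_qsub_list, the queue name "myqueue", the job names and the file name commands.txt are hypothetical. How the file is then passed to big-submission is described on the linked page and is not shown here.

```shell
#!/bin/bash
# Sketch: write one qsub command per line into a text file.
# Queue name, job names and output file are placeholders (assumptions).
write_qsub_list() {
    local out="$1" queue="$2" n="$3" i
    : > "$out"                      # start from an empty file
    for i in $(seq 1 "$n"); do
        echo "qsub -q $queue -N job_$i myjob.sh" >> "$out"
    done
}

# Example: prepare 3000 submissions in commands.txt
write_qsub_list commands.txt myqueue 3000
```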


If you use MadGraph, read this section first or you risk crashing the cluster.


If you want to use the GPUs, please read here.

Job management

To see all jobs (running / queued), you can use the qstat command, or go to the JobView page to have a summary of what's running.

qstat

OPTIONS

  • -u username : list only jobs submitted by username
  • -n : show nodes where jobs are running
  • -q : show how jobs are distributed across the queues
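For a quick overview of your own jobs, the output of qstat -u can be summarised per job state. The sketch below assumes the usual Torque layout for qstat -u, where job lines start with a numeric job ID and the state letter (R, Q, ...) is the 10th column; check this against the output on your UI before relying on it. The function name count_job_states is hypothetical.

```shell
#!/bin/bash
# Sketch: count jobs per state from "qstat -u" output read on stdin.
# Assumes job lines start with a digit and the state is column 10.
count_job_states() {
    awk '$1 ~ /^[0-9]/ { n[$10]++ } END { for (s in n) print s, n[s] }' | sort
}

# Only query the batch system when qstat is actually available.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "${USER:-$(id -un)}" | count_job_states
fi
```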


Job Statistics

All the log files from the batch system are synced every 30 minutes to:

/group/log/torque/

A simple script to analyze the logs and provide some statistics for the user is provided:

torque-user-info.py

Just execute it as is (it is in your $PATH, so executable from everywhere). It will print information like the following:

ID: 6077555  ExCode:   0 Mem:    0M cpuT:      0s wallT:      3s eff:  0.0%   STDIN
ID: 6077602  ExCode:   0 Mem:   50M cpuT:      0s wallT:      2s eff:  0.0%   STDIN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
   user[G]	# Jobs	 <MEM> +- RMS        #HiMem  MAX Mem      <CPU time>    <walltime>    <Eff>   % WT/WT_TOT    # Jobs with Error code (% of user job)
----------------------------------------------------------------------------------------------------------------------------------------------------------------
    rougny[l]	    12	    13 +- 22    MB      0      52 MB  |  00:00:00:00  00:00:00:24  ( 0.0%) (-1.00% of tot) | # EC: 
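The per-job lines of this output can be filtered with standard tools, for instance to list only the jobs that ended with a non-zero exit code. The sketch below relies on the field positions visible in the sample above (job ID as 2nd field, ExCode value as 4th field), which is an assumption about the format and may not hold for every line the script prints. The function name failed_jobs is hypothetical.

```shell
#!/bin/bash
# Sketch: print "jobID exitCode" for jobs whose ExCode is not 0.
# Field positions are taken from the sample output (assumption).
failed_jobs() {
    awk '$1 == "ID:" && $3 == "ExCode:" && $4 != 0 { print $2, $4 }'
}

# Only run when the script is actually in the PATH.
if command -v torque-user-info.py >/dev/null 2>&1; then
    torque-user-info.py | failed_jobs
fi
```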


If you want to test the batch system, you can follow the workbook here.

Job Deletion

Use the following command:

qdel <JOBID>

To delete all your jobs, use the following line and be patient, as each qdel call may wait up to 3 seconds before timing out:

for j in $(qselect -u $USER);do timeout 3 qdel -a $j;done
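Before deleting many jobs at once, it can be useful to preview what would be removed. The sketch below is a variant of the loop above with a dry-run switch. The function name delete_jobs and the DRY_RUN variable are hypothetical helpers, and the -s Q option of qselect restricts the selection to queued jobs.

```shell
#!/bin/bash
# Sketch: delete the job IDs read on stdin, one per line.
# With DRY_RUN=1 the qdel commands are printed instead of executed.
delete_jobs() {
    local j
    while read -r j; do
        [ -n "$j" ] || continue     # skip empty lines
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "qdel -a $j"
        else
            timeout 3 qdel -a "$j"
        fi
    done
}

# Preview the deletion of your queued jobs (nothing is deleted here).
if command -v qselect >/dev/null 2>&1; then
    qselect -u "${USER:-$(id -un)}" -s Q | DRY_RUN=1 delete_jobs
fi
```

Dropping DRY_RUN=1 from the last pipeline performs the actual deletion.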