PBS
Latest revision as of 11:23, 14 February 2022
UI to use
Currently, only m0, m8 & m9 still run SL6 and can submit jobs to the old PBS/QSUB qcluster.
Job submission
To submit a job, use the qsub command:
qsub myjob.sh
OPTIONS
- -q queueName : choose the queue you want [mandatory]
- -N jobName : name of the job
- -I : (capital i) run the job in interactive mode
- -M mailaddress : (capital m) set the mail address (use in conjunction with -m); it MUST be an @ulb.ac.be or @vub.ac.be address
- -m [a|b|e] : send mail on job status change (a = aborted, b = begin, e = end)
- -l : resources options
- For instance, if you want to use 2 cores: -lnodes=1:ppn=2
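The options above can also be written inside the job script itself as "#PBS" directive lines, which qsub reads at submission time. The following is a minimal sketch, not a site-approved template: the queue name myqueue, the job name myjob and the mail address are placeholders chosen for illustration and must be replaced with real values.

```shell
# Write a small job script. Each "#PBS" line carries one of the qsub options listed above.
# NOTE: "myqueue", "myjob" and the mail address are hypothetical placeholders.
cat > myjob.sh <<'EOF'
#!/bin/bash
#PBS -q myqueue
#PBS -N myjob
#PBS -l nodes=1:ppn=2
#PBS -m ae
#PBS -M firstname.lastname@ulb.ac.be

# PBS_O_WORKDIR is set by the batch system to the directory qsub was called from.
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
EOF

# Submit it from a UI that can reach the PBS cluster:
# qsub myjob.sh
```

Options given on the qsub command line take precedence over the directives in the script, so the same script can be reused with a different queue or job name.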
If you want to send more than 2500 jobs to the cluster, write all qsub commands in a text file, and use the script big-submission (more info here).
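As an illustration of the first step only, the text file of qsub commands could be generated with a loop like the one below. This is a sketch under assumptions: the file name submit_all.txt, the queue myqueue and the per-job scripts job_1.sh, job_2.sh, ... are hypothetical, and the exact way to pass the file to big-submission is described on its own page, not here.

```shell
# Build a text file containing one qsub command per line.
# NOTE: "submit_all.txt", "myqueue" and the job_<i>.sh scripts are hypothetical names.
: > submit_all.txt                      # create or empty the file
for i in $(seq 1 3000); do
    echo "qsub -q myqueue -N job_$i job_$i.sh" >> submit_all.txt
done

# Check how many commands were written.
wc -l < submit_all.txt
```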
If you use MadGraph, read this section first or you risk crashing the cluster.
If you want to use the GPUs, please read here.
Job management
To see all jobs (running / queued), you can use the qstat command, or go to the JobView page to have a summary of what's running.
qstat
OPTIONS
- -u username : list only jobs submitted by username
- -n : show nodes where jobs are running
- -q : show how jobs are distributed across the queues
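These options can be combined. As a convenience, a small shell function such as the one sketched below could be added to your shell startup file; the name myjobs is made up for this example and is not a command provided by the cluster.

```shell
# myjobs: list the jobs of one user together with the nodes they run on.
# With no argument it falls back to the current user.
# NOTE: "myjobs" is a hypothetical helper name, not part of PBS.
myjobs() {
    qstat -n -u "${1:-$USER}"
}
```

Calling myjobs shows your own jobs, and myjobs followed by a username shows the jobs of that user.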
Job Statistics
All the log files from the batch system are synced every 30 minutes in:
/group/log/torque/
A simple script to analyze the logs and provide some statistics for the user is provided:
torque-user-info.py
Run it as is (it is in your $PATH, so it can be called from any directory). It prints information like the following:
ID: 6077555 ExCode: 0 Mem: 0M cpuT: 0s wallT: 3s eff: 0.0% STDIN
ID: 6077602 ExCode: 0 Mem: 50M cpuT: 0s wallT: 2s eff: 0.0% STDIN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
user[G] # Jobs <MEM> +- RMS #HiMem MAX Mem <CPU time> <walltime> <Eff> % WT/WT_TOT # Jobs with Error code (% of user job)
----------------------------------------------------------------------------------------------------------------------------------------------------------------
rougny[l] 12 13 +- 22 MB 0 52 MB | 00:00:00:00 00:00:00:24 ( 0.0%) (-1.00% of tot) | # EC:
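To pick out failed jobs from this output, the per-job lines can be filtered on the ExCode field. The function below is a rough sketch that assumes each job is printed on its own line starting with "ID:" followed by the job number, then "ExCode:" and the exit code, as in the sample above; the name failed_jobs is hypothetical and the real output format may differ.

```shell
# failed_jobs: read torque-user-info.py output on stdin and print the IDs
# of jobs whose exit code is not 0.
# ASSUMPTION: per-job lines look like "ID: <jobid> ExCode: <code> ...".
failed_jobs() {
    awk '$1 == "ID:" && $3 == "ExCode:" && $4 != 0 { print $2 }'
}

# Usage on a machine where the script is available:
# torque-user-info.py | failed_jobs
```

Lines that do not match the assumed layout, such as the summary table, are simply ignored.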
If you want to test the batch system, you can follow the workbook here.
Job Deletion
Use the following command:
qdel <JOBID>
To delete all your jobs, use the following line and be patient while it runs:
for j in $(qselect -u "$USER"); do timeout 3 qdel -a "$j"; done
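Before deleting everything, it can be useful to see which jobs the loop would touch. The sketch below wraps the same loop in a function with a dry-run switch; the function name delete_my_jobs and the --dry-run argument are inventions of this example, not existing cluster tools.

```shell
# delete_my_jobs: delete all jobs of the current user, or only list them
# when called with --dry-run.
# NOTE: the function name and the --dry-run switch are hypothetical.
delete_my_jobs() {
    local dry_run="${1:-}"
    local j
    for j in $(qselect -u "${USER:-$(id -un)}"); do
        if [ "$dry_run" = "--dry-run" ]; then
            echo "would delete $j"
        else
            # Same call as the one-liner: give up on a job after 3 seconds.
            timeout 3 qdel -a "$j"
        fi
    done
}
```

Running delete_my_jobs --dry-run first prints the job IDs without removing anything; running it without arguments performs the actual deletion.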