PBS
UI to use
Only m0, m8 and m9 still run SL6 and can submit jobs to the old PBS/qsub cluster.
Job submission
To submit a job, use the qsub command:
qsub myjob.sh
OPTIONS
- -q queueName : choose the queue you want [mandatory]
- -N jobName : name of the job
- -I : (capital i) start an interactive job
- -M mailaddress : set the mail address (use in conjunction with -m); it MUST be @ulb.ac.be or @vub.ac.be
- -m [a|b|e] : send mail when the job status changes (a = aborted, b = begins, e = ends; letters can be combined, e.g. -m ae)
- -l : resource options
- For instance, to use 2 cores on one node: -l nodes=1:ppn=2
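The options above can also be written into the job script itself as #PBS directive lines, placed before the first command; qsub reads them as options while bash treats them as ordinary comments. A minimal sketch, assuming a bash job script: queueName is the same placeholder as above, and the job name and mail address are hypothetical values to replace with your own.

```shell
#!/bin/bash
# Sketch of a Torque/PBS job script (myjob.sh).
# Hypothetical placeholders below: queue name, job name and mail address.
#PBS -q queueName
#PBS -N myjob
#PBS -l nodes=1:ppn=2
#PBS -m ae
#PBS -M first.last@ulb.ac.be

# Jobs start in $HOME; go back to the directory qsub was run from
# (falls back to the current directory when run outside the batch system).
cd "${PBS_O_WORKDIR:-.}" || exit 1

# $PBS_NODEFILE lists one line per allocated core.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
  NCORES=$(( $(wc -l < "$PBS_NODEFILE") ))
else
  NCORES=1   # not running under Torque: assume a single core
fi

echo "Job ${PBS_JOBID:-local} on $(uname -n) using $NCORES core(s)"
```

Submit it with qsub myjob.sh; an option given on the qsub command line overrides the matching #PBS line. For an interactive session with similar resources, qsub -I -q queueName -l nodes=1:ppn=2 should give you a shell on a worker node.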
data:image/s3,"s3://crabby-images/0d515/0d515c9863f1ab1773ac62fbf2b038141942e7d3" alt=""
To submit more than 2500 jobs to the cluster, write all the qsub commands to a text file and use the big-submission script (more info here).
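For big submissions, the text file of qsub commands can be generated with a small loop. This sketch only writes the file: the file name qsub_commands.txt, the job names and the JOBINDEX variable (passed to each job through qsub -v) are hypothetical, and how the file is then handed to big-submission is described on its own page.

```shell
#!/bin/bash
# Write one qsub command per line into a text file (hypothetical names).
make_submission_list() {
  local njobs=$1 outfile=$2
  local i
  for (( i = 1; i <= njobs; i++ )); do
    # -v passes the job index to myjob.sh as the variable JOBINDEX
    printf 'qsub -q queueName -N job_%d -v JOBINDEX=%d myjob.sh\n' "$i" "$i"
  done > "$outfile"   # overwrite the list on each run
}

make_submission_list 3000 qsub_commands.txt
```

The job script can then read $JOBINDEX to select its own input.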
data:image/s3,"s3://crabby-images/0d515/0d515c9863f1ab1773ac62fbf2b038141942e7d3" alt=""
If you use MadGraph, read this section first or you risk crashing the cluster.
data:image/s3,"s3://crabby-images/0d515/0d515c9863f1ab1773ac62fbf2b038141942e7d3" alt=""
If you want to use the GPUs, please read here.
Job management
To list all jobs (running or queued), use the qstat command; the JobView page also gives a summary of what is running.
qstat
OPTIONS
- -u username : list only jobs submitted by username
- -n : show nodes where jobs are running
- -q : show how jobs are distributed across the queues
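These qstat options can be wrapped in small shell functions, for instance in your ~/.bashrc. A sketch: the function names myjobs and count_jobs_in_state are hypothetical, and the count relies on qselect -s, which selects jobs by state letter (Q = queued, R = running).

```shell
#!/bin/bash
# List a user's jobs (default: yours) together with the nodes they run on.
myjobs() {
  qstat -n -u "${1:-$USER}"
}

# Count your jobs in a given state by counting the job IDs qselect returns.
count_jobs_in_state() {
  qselect -u "$USER" -s "$1" | wc -l
}
```

For example, myjobs lists your jobs with their nodes, myjobs otheruser shows someone else's, and count_jobs_in_state Q tells you how many of your jobs are still waiting.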
Job Statistics
All the log files from the batch system are synced every 30 minutes in:
/group/log/torque/
A simple script that analyzes these logs and prints statistics for the user is available:
torque-user-info.py
Just run it as is (it is in your $PATH, so it can be run from any directory). Its output looks like this:
ID: 6077555 ExCode: 0 Mem: 0M cpuT: 0s wallT: 3s eff: 0.0% STDIN
ID: 6077602 ExCode: 0 Mem: 50M cpuT: 0s wallT: 2s eff: 0.0% STDIN
----------------------------------------------------------------------------------------------------
user[G] # Jobs <MEM> +- RMS #HiMem MAX Mem <CPU time> <walltime> <Eff> % WT/WT_TOT # Jobs with Error code (% of user job)
----------------------------------------------------------------------------------------------------
rougny[l] 12 13 +- 22 MB 0 52 MB | 00:00:00:00 00:00:00:24 ( 0.0%) (-1.00% of tot) | # EC:
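To spot failed jobs in this output, the per-job lines can be filtered on their ExCode field. A minimal sketch, assuming each job is printed on its own line in the "ID: ... ExCode: ..." format shown above; the function name failed_jobs is hypothetical.

```shell
#!/bin/bash
# Read torque-user-info.py output on stdin and print the ID and exit code
# of every per-job line whose ExCode is non-zero; other lines are ignored.
failed_jobs() {
  awk '$1 == "ID:" && $3 == "ExCode:" && $4 != 0 { print $2, "exit code", $4 }'
}
```

Pipe the script into it: torque-user-info.py | failed_jobs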
To test the batch system, you can follow the workbook here.
Job Deletion
Use the following command:
qdel <JOBID>
To delete all your jobs, use the following loop (be patient, it can take a while):
for j in $(qselect -u "$USER"); do timeout 3 qdel -a "$j"; done
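The same loop can be restricted to jobs in a given state, for instance to empty the queue while letting running jobs finish. A sketch using qselect -s to filter by state; the function name delete_my_jobs is hypothetical, and qdel -a and the 3-second timeout are kept from the command above.

```shell
#!/bin/bash
# Delete your jobs, optionally only those in one state (e.g. Q = queued).
delete_my_jobs() {
  local state=${1:-} ids j
  if [ -n "$state" ]; then
    ids=$(qselect -u "$USER" -s "$state")
  else
    ids=$(qselect -u "$USER")
  fi
  for j in $ids; do
    # give up on a single qdel after 3 seconds instead of hanging
    timeout 3 qdel -a "$j"
  done
}
```

delete_my_jobs Q removes only your queued jobs; delete_my_jobs with no argument behaves like the loop above.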