PBS TMPDIR
The PBS TMPDIR patch
When using shared NFS home directories on a large number of nodes, heavy load on the cluster overloads the central NFS server and degrades performance. Shared home directories are of course very easy to set up, and they are also necessary for MPI.
With thanks to David Groep:
Using the fact that TMPDIR points to a local (per-node) scratch directory, we have been applying a patch to the original PBS job manager that selects the current working directory of the job based on the number of nodes requested and the job type. The patch is shown below (it's just a few lines). What it does:
- If the job is of type "mpi", or if the type is "multiple" and the number of requested nodes is greater than 1, the behaviour of the PBS job manager is unaltered.
- If the job type is "single", or the type is "multiple" and the job requests 0 or 1 nodes, the following statement is inserted into the PBS job script, just before the user job is started:
[ x"$TMPDIR" != x"" ] && cd $TMPDIR
This patch is applied to the template for the pbs.pm job manager script in /opt/globus/setup/globus/pbs.in, which gets translated into /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm when the setup script is run (see below). It has so far worked fine for all LCG jobs, which at NIKHEF also go through the "old" PBS job manager. The jobs don't notice the difference, and we can use shared home directories for all VOs, provided we also have a per-node $TMPDIR location on local disk.
The patch
- On the nodes, make sure there is a sufficient amount of space in /var/spool/pbs/tmpdir (symlink it to /scratch or overmount it).
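For example, a minimal sketch of the node-side setup, assuming the local disk is mounted on /scratch (the exact paths are site-specific assumptions):

# run once per worker node, as root, while the node is not running jobs
mkdir -p /scratch/pbstmp
rm -rf /var/spool/pbs/tmpdir
ln -s /scratch/pbstmp /var/spool/pbs/tmpdir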
- Check that you have a PBS server version with the tmpdir patch (e.g., for LCG, torque-1.2.0p3-2 has it).
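One quick way to verify this is an interactive test job (the per-job path shown is only an example of what a patched server produces):

qsub -I
echo $TMPDIR   # should print a per-job directory, e.g. /var/spool/pbs/tmpdir/1234.myserver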
- Patch /opt/globus/setup/globus/pbs.in as follows (the patch might not apply cleanly; in that case make the change manually, and don't forget to make a copy of the original first!):
*** pbs.in.orig	2005-05-20 12:56:32.000000000 +0200
--- pbs.in	2005-05-20 12:52:05.000000000 +0200
***************
*** 321,327 ****
          }
          print JOB "wait\n";
      }
!     elsif($description->jobtype() eq 'multiple')
      {
          my $count = $description->count;
          my $cmd_script_url ;
--- 321,327 ----
          }
          print JOB "wait\n";
      }
!     elsif( ($description->jobtype() eq 'multiple') and ($description->count > 1 ) )
      {
          my $count = $description->count;
          my $cmd_script_url ;
***************
*** 374,379 ****
--- 374,393 ----
      }
      else
      {
+         # this is a simple single-node job that can use $TMPDIR
+         # unless the user has given one explicitly
+         # refer back to JobManager.pm, but currently it seems that
+         # $self->make_scratchdir uses "gram_scratch_" as a component
+         if ( ( $description->directory() =~ /.*gram_scratch_.*/ ) and
+              ( $description->host_count() <= 1 ) and
+              ( $description->count <= 1 )
+            ) {
+             print JOB '# user ended in a scratch directory, reset to TMPDIR'."\n";
+             print JOB '[ x"$TMPDIR" != x"" ] && cd $TMPDIR'."\n";
+         } else {
+             print JOB '# user requested this specific directory'."\n";
+         }
+
          print JOB $description->executable(), " $args <",
              $description->stdin(), "\n";
      }
- From the directory /opt/globus/setup/globus/, run ./setup-globus-job-manager-pbs (it will create a new pbs.pm).
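In shell terms, this step is roughly the following (the grep at the end is just one way to sanity-check that the regenerated pbs.pm carries the change):

cd /opt/globus/setup/globus
./setup-globus-job-manager-pbs
grep 'cd $TMPDIR' /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm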
- Next to the print JOB '[ x"$TMPDIR" != x"" ] && cd $TMPDIR'."\n"; line, you might also have the job script export PBS_O_WORKDIR=$TMPDIR and SCRATCH_DIRECTORY=$TMPDIR, as sketched below.
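A sketch of what those extra lines in pbs.in could look like; note that the single quotes keep $TMPDIR literal in the generated job script, so the shell expands it at run time rather than Perl at generation time:

print JOB '[ x"$TMPDIR" != x"" ] && cd $TMPDIR'."\n";
print JOB 'export PBS_O_WORKDIR=$TMPDIR'."\n";
print JOB 'export SCRATCH_DIRECTORY=$TMPDIR'."\n";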
- To test, submit a job that runs /bin/pwd.
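For example, with the Globus client tools (the gatekeeper contact string is hypothetical; substitute your own CE):

globus-job-run ce.example.org/jobmanager-pbs /bin/pwd
# for a single-node job this should now print the per-job scratch directory,
# e.g. /var/spool/pbs/tmpdir/1234.myserver, instead of the shared NFS home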