LocalSubmission

From T2B Wiki

Latest revision as of 09:30, 14 February 2022

Direct submission to local queue on the T2_BE_IIHE cluster

You first need to have read the Cluster_Presentation. Only then should you try to follow this page.

Aim

  • The aim of this page is to provide a brief introduction/workbook on how to submit to the local cluster.
  • The batch system allows you to send executable code to the T2B cluster.
  • This procedure can be used to run any code, including code that needs access to files on the Storage Element (SE) maite.iihe.ac.be or directly through /pnfs.
  • Using this procedure also helps avoid overloading the User Interfaces (UIs, i.e. the mX machines).

Procedure

  • Log in to a UI (m1, m8, m9)
ssh m1.iihe.ac.be
  • Make a directory and prepare an executable.
mkdir directsubmissiontest
cd directsubmissiontest/
emacs script.sh&
  • Paste the following code into script.sh (see the Attachments section below).
  • Execute the following command to submit the script to the local queue
qsub -o script.stdout -e script.stderr script.sh
  • Follow the progress of your job on the UI
qstat -u $USER
  • Your job is finished when you no longer see it with qstat. You should then be able to find your output files in the directory you created:
> ls /user/$USER/directsubmissiontest
    script.stdout script.stderr
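The steps above can be condensed into a short sequence; a minimal sketch (the directory name matches the example on this page, the job script body is a trivial stand-in, and qsub/qstat only work on the cluster UIs):

```shell
# Minimal sketch of the procedure above. The directory name follows
# the example on this page; the job script body is a trivial example.
mkdir -p "$HOME/directsubmissiontest" && cd "$HOME/directsubmissiontest"

# Create a trivial job script that reports where it runs.
cat > script.sh <<'EOF'
#!/bin/bash
echo "running on $(hostname)"
echo "working directory: $PWD"
EOF
chmod +x script.sh

# Then submit and monitor (these commands only work on the cluster UIs):
#   qsub -o script.stdout -e script.stderr script.sh
#   qstat -u $USER
```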

More details: some comments and FAQ

Comments

  • If you want to access a root file, copy it first to the $TMPDIR (= /scratch/jobid.cream02.ac.be/) space on the workernode, which is unique to each job.
    • /scratch is the native disk of the workernode and is several hundred GB in size.
    • Each job is allotted a working directory that is cleaned automatically at the end of the job. This directory is stored in the variable $TMPDIR.
    • Do not read root files from /user. This directory is not physically located on the workernode; it is mounted from the fileserver. Doing so puts a big load on the fileserver, potentially causing the UIs to be slow.
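The pattern described in the bullets above can be sketched as a small helper for a job script. This is only a sketch: the function name and file names are hypothetical, and the analysis step is passed in as a command.

```shell
#!/bin/bash
# Sketch of the recommended I/O pattern: stage the input into $TMPDIR,
# run the work there, then copy the result back before the job ends.
# The function name and the file names are hypothetical examples.
run_in_scratch() {
    local input=$1     # input file, e.g. a root file from /user
    local output=$2    # output file name the analysis produces
    local outdir=$3    # where to keep results, e.g. under /user/$USER
    shift 3            # remaining arguments: the analysis command

    cp "$input" "$TMPDIR/" || return 1   # 1. stage input onto local scratch
    cd "$TMPDIR" || return 1             # 2. work on the local disk
    "$@" || return 1                     #    run the analysis locally
    cp "$output" "$outdir/"              # 3. copy the result back
}                                        #    ($TMPDIR is wiped afterwards)

# Hypothetical call inside a job script:
# run_in_scratch /user/$USER/in.root out.root /user/$USER/results ./my_analysis
```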


****** IMPORTANT *******
If you use local submission, please note that you can potentially slow down our site. To avoid this, copy all the files you will use during the job to $TMPDIR, e.g.:

dccp dcap://maite.iihe.ac.be/pnfs/iihe/..../MYFILE $TMPDIR/

FAQ

How to set the CMSSW environment in a batch job

Add the following lines to your script:

pwd=$PWD
source $VO_CMS_SW_DIR/cmsset_default.sh                          # make scram available                                                                                                                                                             
cd /user/$USER/path/to/CMSSW_X_Y_Z/src/                          # your local CMSSW release                                                                                                                                                         
eval `scram runtime -sh`                                         # don't use cmsenv, won't work on batch                                                                                                                                            
cd $pwd
How to make your proxy available during batch jobs (for instance to write to /pnfs)
  • Create a proxy with long validity time:
voms-proxy-init --voms MYEXPERIMENT --valid 192:0
   MYEXPERIMENT is one of cms, icecube, solidexperiment.org, beapps, ...
  • Copy it to /user
cp $X509_USER_PROXY /user/$USER/
  • In your sh script you send to qsub, add the line:
export X509_USER_PROXY=/user/$USER/x509up_u$(id -u $USER)    # Or the name of the proxy you copied before if you changed the name
  • Then, to copy a file produced by your job in the /scratch area to /pnfs, you do:
gfal-copy file://$TMPDIR/MYFILE srm://maite.iihe.ac.be:8443/pnfs/iihe/MY/DIR/MYFILE
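Inside the job itself, these pieces can be combined; it is also prudent to check that the proxy is still valid before attempting the copy. A sketch, where the validity guard via voms-proxy-info is an addition of ours, and MYFILE / MY/DIR are the same placeholders as in the commands above:

```shell
#!/bin/bash
# Sketch: point the job at the proxy copied to /user beforehand and
# check that it is still valid before writing to /pnfs. MYFILE and
# MY/DIR are the same placeholders as in the commands above.
export X509_USER_PROXY=/user/$USER/x509up_u$(id -u $USER)

if voms-proxy-info --exists --valid 1:00 >/dev/null 2>&1; then
    # At least one hour of validity left: safe to attempt the copy.
    gfal-copy file://$TMPDIR/MYFILE srm://maite.iihe.ac.be:8443/pnfs/iihe/MY/DIR/MYFILE
else
    echo "proxy missing or about to expire, skipping the copy" >&2
fi
```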

Stop your jobs

If for some reason you want to stop your jobs on the server, first list them with: qstat -u $USER

This will give you a list of running jobs with their IDs, e.g.

394402.cream02            submit.sh        odevroed               0 R localgrid

Now, use the ID to kill the job with the qdel command: qdel 394402.cream02

Your job will now be removed.
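If you need to remove many jobs at once, you can extract every job ID qstat reports for your user and feed them to qdel; a sketch (the awk pattern assumes that job lines start with a numeric ID, as in the example output above):

```shell
# Remove all of your jobs in one go: qstat lines for actual jobs start
# with a numeric job ID (e.g. 394402.cream02); print the first column
# and hand each ID to qdel. xargs -r skips qdel when there are no jobs.
qstat -u $USER 2>/dev/null | awk '/^[0-9]/ {print $1}' | xargs -r qdel
```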

Attachments

  • script.sh
#!/bin/bash          

##Some general shell commands
STR="Hello World!"
echo $STR    
echo ">> script.sh is checking where it is"
pwd
echo ">> script.sh is checking how much disk space is still available"
df -h
echo ">> script.sh is listing files and directories in the current location"
ls -l
echo ">> script.sh is listing files and directories in userdir on storage element"
ls -l /pnfs/iihe/cms/store/user/$USER

echo ">> go to TMPDIR"
cd $TMPDIR
echo ">> ls of TMPDIR partition"
ls -l

##Create a small root macro

echo "{
  //TFile *MyFile = new TFile(\"testfile.root\",\"RECREATE\"); 
  //MyFile->ls();
  //MyFile->Close();
  TFile* f=TFile::Open(\"dcap://maite.iihe.ac.be/pnfs/iihe/cms/store/user/$USER/testfile.root\");
  f->ls();
  f->Close();
} 
" > rootScript.C

cat rootScript.C


echo ">> execute root macro"
root -q -l -b -n rootScript.C

echo ">> ls of TMPDIR"
ls -l