Getting started with CMSSW on the Grid at IIHE
THIS PAGE IS OBSOLETE
Grid jobs in CMS are managed by the Crab software.
Get access to UI
- valid UIs: master2.iihe.ac.be, master3.iihe.ac.be
- get a user account from IIHE (send an email to support-iiheATvub.ac.be explaining who you are and what you want to do)
- if this gives problems, contact the IIHE grid site administrators grid_admin@listserv.vub.ac.be
- direct password authentication is not allowed on the UIs for security reasons. You will need keypair authentication to gain access.
- a valid keypair can easily be generated with ssh-keygen. This program creates a public and a private key. Needless to say, you should protect your private key with a strong passphrase, never share it with others, and not reuse the same private key on different machines.
- from the machine you use to connect, run ssh-keygen -t rsa -b 2048
- it first prompts for the location of the key files. Keep the default values unless you know what you are doing.
- it then prompts for a passphrase. This passphrase is used to encrypt your private key. If you leave it empty and just press return, you will be able to log in to the machine without a password; this is not very secure.
- if you want to add, change, or remove the passphrase of your private key, use ssh-keygen -p (see man ssh-keygen)
- This generates 2 files ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub, of which ~/.ssh/id_rsa.pub is the public key.
- login on a machine which has the /user dir mounted (eg lxpub2) and copy the PUBLIC key to your home directory on that machine. Then do the following (copy & paste):
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 755 ~/.ssh
chmod 755 ~/.ssh/authorized_keys
- now try to connect from your machine to the UI (eg ssh master2)
- it's possible that you need to log in again and/or wait 10 seconds for the changes to take effect.
- it's possible that your ssh client doesn't use ssh protocol 2 by default. If not, try to connect with eg "ssh -2 master2". If this works, you can make it the default by adding to ~/.ssh/config the line
"Protocol 2"
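If you connect often, the settings above can be collected in a minimal ~/.ssh/config sketch (the hostnames are the UIs from this page; the IdentityFile path is the ssh-keygen default used earlier):

```
Host master2.iihe.ac.be master3.iihe.ac.be
    Protocol 2
    IdentityFile ~/.ssh/id_rsa
```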
Setup the Crab environment
- login to a machine where the Crab software is installed
e.g:
ssh -X `whoami`@master2.iihe.ac.be
- setup the environment
- source the CMS environment
setenv VO_CMS_SW_DIR /msa3/cmssoft
source $VO_CMS_SW_DIR/cmsset_default.csh  ## initialize the cms env
- source the Crab environment
source /msa3/crab/latest/crab.csh
- and prepare to run CMSSW (eg if you already have a project directory in ~/CMSSW_1_0_1)
cd CMSSW_1_0_1/src; eval `scramv1 runtime -csh`
Prepare a Grid storage directory for the output of your job
This is no longer needed since CRAB jobs now use srmcp to copy files. Using srmcp has multiple benefits, one of which is the automatic creation of directories.
Obsolete
edg-gridftp-mkdir gsiftp://maite.iihe.ac.be/pnfs/iihe/becms/`whoami`/tutorial
- check the presence of the directory
edg-gridftp-ls gsiftp://maite.iihe.ac.be/pnfs/iihe/becms/`whoami`
Create and submit a simulation job with Crab
- go to your working directory, where your CMSSW config file is
cd IOMC/GeneratorInterface/test
- get a working Crab config file: download this example config file and rename it as crab.cfg
This crab.cfg is prepared to run cmsRun pythiaHZZ4mu+analysis.cfg on one of the CMS Grid clusters. Edit crab.cfg and replace the storage_path with yours:
storage_path = ...
- create and submit a Crab job
crab -create -submit 1
A crab_0_<date>_<time> working directory will be created.
- check the status of the job
crab -status
- once the job is finished, retrieve the standard error and output to your local directory, to check whether the job completed fine
crab -getoutput
- You can now make an interactive analysis of the output. Open the output root file with the generated events:
root dcap://maite.iihe.ac.be/pnfs/iihe/becms/`whoami`/tutorial/pythiaHZZ4mu_1.root
- Enjoy !
- At this stage you should learn more about Crab
- How to handle data on Grid storage
- Once you know how to create and submit jobs, you might want to make a local Monte-Carlo data production. In order to make sure that different random seeds are used for each job you submit, your crab.cfg file should contain the following lines:
[CMSSW]
datasetpath=None
pythia_seed = 12345
vtx_seed = 98765
This will ensure that the random seeds used for the production will be changed for each job (the jobnr is added by crab).
- If you want to include pile-up, a similar flag is not yet available. The change of this random seed is needed however, as its value will determine which minimum bias events will be piled up to the signal event. A way out is to execute a little script just before the job starts running on the site.
Add the following line to your crab.cfg:
[USER]
adaptscript = <path to your crab directory>/changePUseed.csh
where the changePUseed.csh script contains the following lines:
#!/bin/tcsh
mv pset.cfg pset.cfg_back
set seed = 1234567
@ seed += $1
cat pset.cfg_back | sed -e "s/int32 seed = 1234567/int32 seed = $seed/" > pset.cfg
echo "Have changed the PU seed parameter to $seed"
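To see what the script does, here is a toy run of the same substitution in plain sh; the pset.cfg content and job number are illustrative, not a real CMSSW config:

```shell
# Simulate what changePUseed.csh does for one job.
printf 'untracked int32 seed = 1234567\n' > pset.cfg_back
jobnr=2                     # CRAB passes the job number as the first argument ($1)
seed=$((1234567 + jobnr))   # same arithmetic as "@ seed += $1" in the tcsh script
sed -e "s/int32 seed = 1234567/int32 seed = $seed/" pset.cfg_back > pset.cfg
cat pset.cfg                # prints: untracked int32 seed = 1234569
```

Because the job number is added to the base seed, every job ends up with a distinct pile-up seed.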
Run CMSSW interactively on a file stored on the IIHE Grid storage
From master2 it is possible to list files stored on the IIHE Grid storage, and to analyze them interactively. To list, just type as usual:
ls -lt /pnfs/iihe/becms/`whoami`
To run CMSSW interactively on a file stored on the IIHE Grid storage, modify the PoolSource parameters in the CMSSW config file as follows, where whoami has to be replaced by your username:
source = PoolSource {
  untracked vstring fileNames = { 'dcap://maite.iihe.ac.be/pnfs/iihe/becms/whoami/tutorial/pythiaHZZ4mu_1.root' }
  untracked int32 maxEvents = 10
}
Tip:
- the trivial file catalog (TFC) of IIHE defines 2 standard path prefixes, so the full path doesn't have to be typed for this to work.
- CMS: /store/... is used for CMS files transferred with Phedex. (It currently resolves to dcap://maite.iihe.ac.be/pnfs/iihe/cms/ph/sc4/store/...)
- BECMS: /becms/... is used for BECMS files (ie local things). (It currently resolves to dcap://maite.iihe.ac.be/pnfs/iihe/becms/...)
Using this, the previous example becomes
source = PoolSource {
  untracked vstring fileNames = { '/becms/whoami/tutorial/pythiaHZZ4mu_1.root' }
  untracked int32 maxEvents = 10
}
Analyse an unpublished, locally produced sample with crab
Suppose you produced some local MC sample with the method described above. A drawback of this method is that there is no possibility to publish this data in the DBS database. To run on these ROOT-files, configure your crab.cfg as follows:
Recent CMSSW versions
This is an update of the previous method, tested with CMSSW_2_1_9 and CRAB 2.3.0. To be able to run over your local sample, add the following lines to your crab.cfg:
[CMSSW]
datasetpath=None
events_per_job = 10000   # less than or equal to the number of events in one ROOT file
number_of_jobs = 111     # equal to the number of ROOT files you want to run over

[USER]
script_exe = changeInputROOTfile.sh   # script to change the input file before execution on the worker node

[EDG]
ce_white_list = iihe     # to make sure your jobs run on the local T2 nodes
Your changeInputROOTfile.sh script should contain the following lines: (you should not make any changes to this script)
# Change the input file (${1} = job number)
cat pset.py | sed -e "s/PATLayer1_Output.fromAOD_PFJets_full_1.root/PATLayer1_Output.fromAOD_PFJets_full_${1}.root/" > CMSSW.py
# Run your CMSSW config
cmsRun -e -j ${INITIAL_Working_Directory}/crab_fjr.xml CMSSW.py
In the CMSSW config which you put as pset in your crab.cfg, you should only add the first file of your sample in the PoolSource:
source = PoolSource {
  untracked vstring fileNames = { 'dcap://maite.iihe.ac.be/pnfs/iihe/cms/store/user/mmaes/CMSSW219/TauolaTTBAR_WithPFJets/PATLayer1_Output.fromAOD_PFJets_full_1.root' }
}
You should of course change PATLayer1_Output.fromAOD_PFJets_full to the name of your own files, both in the script and in your PoolSource.
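As a toy illustration of what changeInputROOTfile.sh does for a given job number (the pset.py content here is illustrative, not a real CMSSW python config):

```shell
# Simulate the per-job input-file substitution done by changeInputROOTfile.sh.
printf "fileNames = ['PATLayer1_Output.fromAOD_PFJets_full_1.root']\n" > pset.py
jobnr=3    # CRAB passes the job number to the script as ${1}
sed -e "s/PATLayer1_Output.fromAOD_PFJets_full_1.root/PATLayer1_Output.fromAOD_PFJets_full_${jobnr}.root/" pset.py > CMSSW.py
cat CMSSW.py   # prints: fileNames = ['PATLayer1_Output.fromAOD_PFJets_full_3.root']
```

Job number N thus runs over the file ending in _N.root, so the 111 jobs cover the 111 files without any job list being written out.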
Older CMSSW versions
[CMSSW]
datasetpath=None
### Number of events to be processed per job
events_per_job = 10000   # less than or equal to the number of events in one input ROOT file
### Number of jobs
number_of_jobs = 111     # the same (or less) as the number of input ROOT files

[USER]
adaptscript = <path to your crab directory>/changeInputROOTfile.csh
where the 'changeInputROOTfile.csh' script contains the following lines:
mv pset.cfg pset.cfg_back
cat pset.cfg_back | sed -e "s/TopObjectsChowder_1.root/TopObjectsChowder_${1}.root/" > pset.cfg
echo "Have changed the input ROOT-file to TopObjectsChowder_${1}.root"
Pass the first ROOT-file of the list you want to access through your CMSSW.cfg.
E.g.:
source = PoolSource {
  untracked vstring fileNames = { "dcap://maite.iihe.ac.be/pnfs/iihe/becms/pvmulder/CSA07/TopObjectsChowder/TopObjectsChowder_1.root" }
}
The 'changeInputROOTfile.csh' script replaces the input ROOT-file string in the PoolSource with the number of the crab job, so that crab job number 2 runs over the ROOT-file ending in TopObjectsChowder_2.root. Make sure the script is adapted to the name of the ROOT-file you gave as input in your CMSSW.cfg! Note that you don't have to provide a full list of input ROOT-files; you only need to know the total number of ROOT-files you want to run over and put this in your crab.cfg as number_of_jobs.
Analyse a remote, published sample
- Find out the logical file name (or path) of the sample that you want to analyse at the CMS data discovery page
- Download this file, rename it as crab.cfg.
- Replace the datasetpath= by the path which appears on the CMS data discovery page.
- Replace the pset= by the CMSSW config file which you wish to run.
- Change the total_number_of_events=-1, events_per_job = 10 and number_of_jobs = 1 as described in the Crab manual.
- Run your job(s) (here, 3 jobs)
crab -create -submit 3
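For reference, the crab.cfg edits described in the steps above could look like the following sketch; the dataset path and pset name are placeholders, and the splitting values are merely illustrative:

```
[CMSSW]
datasetpath = /SomeDataset/SomeProcessing/RECO   # placeholder: take the real path from the data discovery page
pset = yourAnalysis.cfg                          # placeholder: your CMSSW config file
total_number_of_events = 30
events_per_job = 10
number_of_jobs = 3
```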