Getting started with CMSSW on the Grid at IIHE

From T2B Wiki

THIS PAGE IS OBSOLETE

Grid jobs in CMS are managed by the Crab software.

Get access to UI

  • valid UIs: master2.iihe.ac.be, master3.iihe.ac.be
  • get a user account from IIHE (send an email to support-iiheATvub.ac.be explaining who you are and what you want to do)
    • if this gives problems, contact the IIHE grid site administrators grid_admin@listserv.vub.ac.be
  • direct password authentication is not allowed on the UIs for security reasons. You will need keypair authentication to gain access.
    • a valid keypair can easily be generated with ssh-keygen. This program creates a public and a private key. Needless to say, you should protect your private key with a strong passphrase, never share it with others, and not reuse the same private key on different machines.
    • from the machine you use to connect, run ssh-keygen -t rsa -b 2048
    • it first prompts for the location of the files; keep the default values unless you know what you are doing
    • then it prompts for a passphrase. This passphrase is used to encrypt your key. If you enter nothing and just press return, you will be able to log in to the machine without a password; this is not very secure.
      • if you later want to add, change or remove the passphrase of your private key, run ssh-keygen -p
    • This generates 2 files ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub, of which ~/.ssh/id_rsa.pub is the public key.
    • login on a machine which has the /user directory mounted (e.g. lxpub2) and copy the PUBLIC key to your home directory on that machine. Then do the following (copy & paste):
  mkdir -p ~/.ssh
  touch ~/.ssh/authorized_keys
  cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 700 ~/.ssh
  chmod 600 ~/.ssh/authorized_keys
  • now try to connect from your machine to the UI (eg ssh master2)
    • it's possible that you need to relogin and/or wait 10 seconds for changes to take effect.
    • it's possible that your ssh client doesn't use SSH protocol 2 by default. If not, try to connect with e.g. "ssh -2 master2". If this works, you can make it the default by adding the line "Protocol 2" to ~/.ssh/config.
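For convenience, the whole key-based login setup can be collected in ~/.ssh/config on your own machine. This is a sketch; the user name is an illustrative placeholder:

```
Host master2
    HostName master2.iihe.ac.be
    # replace 'yourusername' with your IIHE account name
    User yourusername
    Protocol 2
    IdentityFile ~/.ssh/id_rsa
```

With this in place, "ssh master2" picks up the host name, user and key automatically.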

Setup the Crab environment

  • login to a machine where the Crab software is installed
  e.g.:
  ssh -X `whoami`@master2.iihe.ac.be
  • setup the environment
    • source the CMS environment
  setenv VO_CMS_SW_DIR /msa3/cmssoft
  source $VO_CMS_SW_DIR/cmsset_default.csh ## initialize the cms env
    • source the Crab environment
  source /msa3/crab/latest/crab.csh
    • and prepare to run CMSSW (e.g. if you already have a project directory in ~/CMSSW_1_0_1)
  cd ~/CMSSW_1_0_1/src; eval `scramv1 runtime -csh`

Prepare a Grid storage directory for the output of your job

This is no longer needed since CRAB jobs now use srmcp to copy files. Using srmcp has multiple benefits; one is the automatic creation of directories.

Obsolete

  edg-gridftp-mkdir gsiftp://maite.iihe.ac.be/pnfs/iihe/becms/`whoami`/tutorial
  • check the presence of the directory
  edg-gridftp-ls gsiftp://maite.iihe.ac.be/pnfs/iihe/becms/`whoami`

Create and submit a simulation job with Crab

  • go to your working directory, where your CMSSW config file is
  cd IOMC/GeneratorInterface/test
This crab.cfg is prepared to run cmsRun pythiaHZZ4mu+analysis.cfg on one of the CMS Grid clusters. Edit crab.cfg and replace the storage_path with yours:
  storage_path = ...
  • create and submit a Crab job
  crab -create -submit 1
A crab_0_<date>_<time> working directory is created.
  • check the status of the job
  crab -status
  • once the job is finished, get the standard error and output to your local directory, to check whether the job completed fine
  crab -getoutput
  • You can now make an interactive analysis of the output. Open the output root file with the generated events:
  root dcap://maite.iihe.ac.be/pnfs/iihe/becms/`whoami`/tutorial/pythiaHZZ4mu_1.root

  • Once you know how to create and submit jobs, you might want to make a local Monte-Carlo data production. In order to make sure that different random seeds are used for each job you submit, your crab.cfg file should contain the following lines:
  [CMSSW]
  datasetpath=None
  pythia_seed = 12345
  vtx_seed = 98765
This will ensure that the random seeds used for the production will be changed for each job (the jobnr is added by crab).
  • If you want to include pile-up, no such flag is available yet. Changing this random seed is nevertheless needed, since its value determines which minimum-bias events are piled up on the signal event. A way out is to execute a little script just before the job starts running on the site.
Add the following line to your crab.cfg:
   [USER]
   adaptscript = <path to your crab directory>/changePUseed.csh
where the changePUseed.csh script contains the following lines:
  #!/bin/tcsh
  mv pset.cfg pset.cfg_back
  set seed = 1234567
  @ seed += $1  
  cat pset.cfg_back | sed -e "s/int32 seed = 1234567/int32 seed = $seed/"  > pset.cfg
  echo "Have changed the PU seed parameter to $seed"
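To see what the adaptscript does, the seed substitution can be tried out locally. The following sketch (in plain sh rather than tcsh, with a dummy one-line pset.cfg and the job number hard-coded to 3) is illustrative only:

```shell
#!/bin/sh
# Dummy pset.cfg containing the default seed, standing in for the real job config.
echo 'untracked int32 seed = 1234567' > pset.cfg

# CRAB passes the job number as $1 to the adaptscript; hard-coded here for the demo.
jobnr=3
seed=$((1234567 + jobnr))

# Same substitution as in changePUseed.csh.
mv pset.cfg pset.cfg_back
sed -e "s/int32 seed = 1234567/int32 seed = $seed/" pset.cfg_back > pset.cfg

cat pset.cfg   # untracked int32 seed = 1234570
```

Job number 3 thus gets seed 1234570, so every job piles up a different set of minimum-bias events.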

Run CMSSW interactively on a file stored on the IIHE Grid storage

From master2 it is possible to list files stored on the IIHE Grid storage, and to analyze them interactively. To list, just type as usual:

  ls -lt /pnfs/iihe/becms/`whoami`

To run CMSSW interactively on a file stored on the IIHE Grid storage, modify the PoolSource parameters in the CMSSW config file as follows, where <username> has to be replaced by your username:

  source = PoolSource {
    untracked vstring fileNames = {
      'dcap://maite.iihe.ac.be/pnfs/iihe/becms/<username>/tutorial/pythiaHZZ4mu_1.root'
    }
    untracked int32 maxEvents = 10
  }

Tip:

  • the trivial file catalog (TFC) of IIHE defines 2 standard path prefixes, so the full dcap:// path does not have to be typed for this to work.
    • CMS: /store/... is used for CMS files transferred with PhEDEx. (It currently resolves to dcap://maite.iihe.ac.be/pnfs/iihe/cms/ph/sc4/store/...)
    • BECMS: /becms/... is used for BECMS files (i.e. locally produced files). (It currently resolves to dcap://maite.iihe.ac.be/pnfs/iihe/becms/...)
Using this, the previous example becomes 
  source = PoolSource {
    untracked vstring fileNames = {
      '/becms/<username>/tutorial/pythiaHZZ4mu_1.root'
    }
    untracked int32 maxEvents = 10
  }

Analyse an unpublished, locally produced sample with crab

Suppose you produced some local MC sample with the method described above. A drawback of this method is that there is no possibility to publish this data in the DBS database. To run on these ROOT-files, configure your crab.cfg as follows:

Recent CMSSW versions

This is an update of the previous method, tested with CMSSW_2_1_9 and CRAB 2.3.0. To be able to run over your local sample, you should add the following lines to your crab.cfg:

  [CMSSW]
  datasetpath=None
  events_per_job = 10000 # less than or equal to the number of events in one ROOT-file
  number_of_jobs = 111 # equal to the number of ROOT-files you want to run over
  [USER]
  script_exe = changeInputROOTfile.sh # script to change the input file before execution on the worker node
  [EDG]
  ce_white_list = iihe # to make sure your jobs run on the local T2 nodes

Your changeInputROOTfile.sh script should contain the following lines (apart from the file-name pattern, you should not make any changes to this script):

#!/bin/bash
# Change the input file (${1} = job number)
cat pset.py | sed -e "s/PATLayer1_Output.fromAOD_PFJets_full_1.root/PATLayer1_Output.fromAOD_PFJets_full_${1}.root/"  > CMSSW.py

# Run your CMSSW config
cmsRun -e -j ${INITIAL_Working_Directory}/crab_fjr.xml CMSSW.py

In the CMSSW config which you pass as pset in your crab.cfg, you should add only the first file of your sample to the PoolSource:

  source = PoolSource {
    untracked vstring fileNames = 
      {
         'dcap://maite.iihe.ac.be/pnfs/iihe/cms/store/user/mmaes/CMSSW219/TauolaTTBAR_WithPFJets/PATLayer1_Output.fromAOD_PFJets_full_1.root'
      }        
  }

You should of course change PATLayer1_Output.fromAOD_PFJets_full to the name of your files, both in the script and in your PoolSource.
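The file-number substitution performed by changeInputROOTfile.sh can likewise be checked locally. This sketch uses a dummy one-line pset.py and hard-codes the job number to 2; the file names are the ones from the example above:

```shell
#!/bin/sh
# Dummy pset.py referencing the first file, as in the PoolSource above.
echo "fileNames = ['PATLayer1_Output.fromAOD_PFJets_full_1.root']" > pset.py

# CRAB passes the job number as ${1} to script_exe; hard-coded here for the demo.
jobnr=2

# Same sed substitution as in changeInputROOTfile.sh.
cat pset.py | sed -e "s/PATLayer1_Output.fromAOD_PFJets_full_1.root/PATLayer1_Output.fromAOD_PFJets_full_${jobnr}.root/" > CMSSW.py

cat CMSSW.py   # fileNames = ['PATLayer1_Output.fromAOD_PFJets_full_2.root']
```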


Older CMSSW versions

  [CMSSW]
  datasetpath=None
  ### Number of events to be processed per job
  events_per_job = 10000 # less than or equal to the number of events in one input ROOT-file
  ### Number of jobs
  number_of_jobs = 111 # the same (or less) as the number of input ROOT-files
  [USER]
  adaptscript = <path to your crab directory>/changeInputROOTfile.csh

where the 'changeInputROOTfile.csh' script contains the following lines:

 #!/bin/tcsh
 mv pset.cfg pset.cfg_back
 cat pset.cfg_back | sed -e "s/TopObjectsChowder_1.root/TopObjectsChowder_${1}.root/"  > pset.cfg

 echo "Have changed the input ROOT-file to TopObjectsChowder_${1}.root"

You should pass the first ROOT-file of the list you want to access through your CMSSW.cfg.

E.g.:

  source = PoolSource {
    untracked vstring fileNames = 
      {
        "dcap://maite.iihe.ac.be/pnfs/iihe/becms/pvmulder/CSA07/TopObjectsChowder/TopObjectsChowder_1.root"
      }
  }

The 'changeInputROOTfile.csh' script adapts the name of the input ROOT-file in the PoolSource with the number of the crab job, so that crab job number 2 runs over the ROOT-file ending in TopObjectsChowder_2.root. Make sure that the script is adapted to the name of the ROOT-file you gave as input in your CMSSW.cfg! Note that you don't have to provide a full list of input ROOT-files: you only need to know the total number of ROOT-files over which you want to run, and put this in your crab.cfg as number_of_jobs.


Analyse a remote, published sample

  • Find out the logical file name (or path) of the sample that you want to analyse at the CMS data discovery page
  • Download this file, rename it as crab.cfg.
    • Replace the datasetpath= by the path which appears on the CMS data discovery page.
    • Replace the pset= by the CMSSW config file which you wish to run.
    • Set total_number_of_events = -1, events_per_job = 10 and number_of_jobs = 1, as described in the Crab manual.
  • Run your job(s) (here, 3 jobs)
  crab -create -submit 3
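Putting the steps above together, a minimal crab.cfg for a published sample could look as follows. The datasetpath and pset values are illustrative placeholders, not a real sample:

```
[CMSSW]
# placeholder: copy the real path from the CMS data discovery page
datasetpath = /SomeDataset/SomeProcessing/RECO
# placeholder: your own CMSSW config file
pset = myAnalysis.cfg
total_number_of_events = -1
events_per_job = 10
number_of_jobs = 1
```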

