AutomaticTopTreeProduction




The TopDB wiki


This wiki page is intended to serve as a comprehensive resource for those wishing to use TopDB. As TopDB is an extremely useful but complex tool, all users are strongly encouraged to become skilled in its maintenance, debugging and development.

TopTreeTools

The TopDB

TopDB is an online interface designed primarily to serve as a bookkeeping system for the PATtuples and TopTrees (the customised data format of the IIHE Top physics group) that are produced by the TopBrussels group. TopDB also functions as an online interface to the AutomaticTopTreeProducer package, which controls the production, storage and skimming of TopTrees.

The benefits of this system are numerous but the main advantages are:
    • Ease of submission of large numbers of crab jobs: typically a single crab configuration file is created for a given production round and is used by TopDB to automatically submit large numbers of crab jobs.
    • Continuous, web-based monitoring of crab tasks: TopDB monitors the crab tasks and can send email alerts upon the commencement, completion, failure and stalling of these tasks.
    • Centralised data and MC production rounds: the bookkeeping and automatic job submission features of TopDB allow large production campaigns to be undertaken. The idea is that practically all the needs of the IIHE Top physics group can be met by a single set of data and MC samples for a given period. This greatly decreases the amount of work needed to process data and reduces the time taken to produce meaningful physics results.
    • Bookkeeping: the detailed bookkeeping and cross-referencing in TopDB enables the retrieval of the details of the original sample from which a given PATtuple or TopTree was produced.
    • Automated skimming: TopDB also provides an interface to the TopSkimmer package, which allows the production of skimmed TopTrees via the completion of a web form.

TopDB may be accessed here

This page is not open to the public. To obtain your personal login/password please send an email to michael.maes@SPAMNOTvub.ac.be.

The overall structure of TopDB

    • TopDB was built using the CakePHP development framework; more information on CakePHP is available here. This framework builds web interfaces from three components: models, views and controllers.
    • Model: every database table is modelled in the website code (e.g. dataset model, user model, ...)
    • View: every section of TopDB is built from "views", which define what is shown: index, view, add, edit, delete
    • Controller: code that implements the actions in the views: add toptree, remove dataset, ...
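A quick way to see this split on disk is to list the CakePHP application directories on mtop. This is a sketch assuming the standard CakePHP 1.x layout; the app/controllers and app/views paths appear elsewhere on this page, while app/models is an assumption:

ls /var/www/TopDB/app/models       # one model per database table (dataset, user, ...)
ls /var/www/TopDB/app/views        # one directory per section, with one .ctp file per view (index, view, add, ...)
ls /var/www/TopDB/app/controllers  # the action code, e.g. requests_controller.php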

Authentication is required in order to access TopDB:

    • Database driven authentication system.
    • Login/password required for entering the page.
    • Login/password is fed from the Users section of TopDB
    • The Username(+group) authentication now allows us to separate "Users" and "Admins".
    • We can assign permissions based on groups for each controller.
    • Our permission setup uses 3 groups: Users, Admins and Production.
      • Users: index and view permissions (i.e. they can look at the data), plus add permission for DataSets/Patuples/TopTrees
      • Admins: full access to Users/Groups/Datasets/Patuples/Toptrees/Requests
      • Production: may submit requests, but with more limited access than Admins
    • Cross-referencing: the toptrees, patuples and datasets tables are linked in the database, so they are cross-referenced within the website. This allows easy navigation of the available TopTrees for a given dataset and traceability of the parent dataset of a given TopTree (see the sketch below).
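As an illustration of these table links, a hedged sketch of a query one could run against the TopDB MySQL database. The database name "topdb" and the column names "patuple_id", "dataset_id" and "name" are assumptions; check the actual schema in phpMyAdmin before relying on this:

mysql -u <user> -p topdb -e "
  SELECT t.id
  FROM toptrees t
  JOIN patuples p ON t.patuple_id = p.id
  JOIN datasets d ON p.dataset_id = d.id
  WHERE d.name = '/TTJets/Some-Production/AODSIM';"
# lists the database IDs of all TopTrees descending from the given RECO dataset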

How to use it (For the user)

    • If you open the TopDB webpage, there are four important components concerning the datasets:
      • Datasets: this module is used to store information about the RECO datasets we process. These entries are for book-keeping only; it does not mean we store the samples at our local T2 site. If you open "List Datasets", you will get a table with all the samples present in the database. Behind each sample you'll find some 'actions', one of which is 'View'; this opens a detailed view of the particular sample, in which TopDB shows you all Patuples and TopTrees that are linked to this sample. (NOTE: the CMSSW version marked here is the general branch in which the RECO was performed; this might differ from the release used to produce the PAT/TopTree.)
      • Patuples: this module is used to store information about the produced PAT samples. If you open "List Patuples", you will get a table with all the PAT samples present in the database. Behind each sample you'll find some 'actions', one of which is 'View'; this opens a detailed view of the particular PAT sample, in which TopDB shows you all TopTrees that are linked to this PAT sample as well as the RECO sample it originates from. The samples in this table are all stored at T2_BE_IIHE and published to the local DBS (cms_dbs_ph_analysis_02) with the name provided in the detailed view. A search function for PAT samples is also available under "Search Patuples".
      • TopTrees: this module is used to store information about the produced TopTrees. If you open "List TopTrees", you will get a table with all the TopTrees present in the database. Behind each sample you'll find some 'actions', one of which is 'View'; this opens a detailed view of the particular TopTree, giving you, among other things, the version of the TopTreeProducer and the location of the TopTree. All these TopTrees are stored on the SE of T2_BE_IIHE (and the user disks) but are also available for download through the TopDB-TopSkimmer interface (see below), which you'll find under the 'Download' action.
      • Requests: in this section you will see the samples that are currently being processed and the ones that are in the queue. (Only the administrators can add new requests. To have a sample produced, send a request to top-brussels.)
    • The TopSkimmer interface in TopDB
      • To be able to access the TopTrees from outside our lab, we integrated the TopSkimmer into the download method of TopDB. As the name reveals, TopDB will skim the toptree in question according to your criteria and then provide you with a download link via email.
      • Go to the "List TopTrees" section and click on download next to the TopTree you want.
      • You will be redirected to a page where you need to fill in your email address and your skim preferences.
      • For more information about the Skimmer: http://w3.iihe.ac.be/Indico_IIHE/getFile.py/access?contribId=9&resId=0&materialId=slides&confId=134
      • If everything is set the way you want, click on the Skim TopTree button at the bottom of the page. Take care: you need to be sure that the sample you run over contains the branches you request in the Skimmer, e.g. only ask for GenEvent with a TTbar/TTJets sample, and never ask for GenJets in a data sample!
      • If everything goes well, you will receive an email with the download links. In general this will be more than one link, because the TopTree production produces the TopTrees in "blocks" of 100k events (~1.5 GB). Every block is skimmed and made available for download separately; you can hadd the skimmed files locally if you wish (see the example after this list). NOTE: the hadd command should not be used on data toptrees, because trigger information is affected (no trigger information is lost, but the way to retrieve the info would be different); instead the "standalone" skimmer should be used for merging.
    • How to change your password
      • Go to the "List users" section on the TopDB.
      • You will get a list of all the TopDB users.
      • Behind your user there should appear a link 'Change password' in the 'Actions' list
      • Click on this link and enter a new password
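For merging the downloaded skim blocks of an MC TopTree locally, a minimal sketch using ROOT's hadd utility (the block file names are hypothetical; remember that data toptrees must be merged with the standalone skimmer instead):

hadd TopTree_Skim_merged.root TopTree_Skim_block1.root TopTree_Skim_block2.root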

How to use it (For the production team)

(Note: you'll need to be in the Admins group)

    • Whenever you want a certain dataset to be TopTree'd centrally you can file a request in TopDB.
    • Go to TopDB (link below)
    • If not present, add the requested dataset in the datasets section.
    • Once this is done, it will be possible to request a TopTree Production from the Requests section.
    • Choose "Request from RECO" (production RECO->PAT->TopTree) or "Request from PAT" (production PAT->TopTree)

NOTE: in the latter case, make sure the PATtuple contains all information needed to run the TopTreeProducer!

    • Fill in all required information in the form and assign a priority.

NOTE: make sure you choose the proper GlobalTag. Choosing a wrong one might cause the production to fail! Check the SWGuideFrontierConditions wiki.

    • Your request will become green once the script is handling it.
    • An email will be sent to top-brussels-datasets@cern.ch upon completion.
    • A skim of a sample can take a few hours; count roughly 15 minutes per toptree block, e.g. 9 blocks ~= 135 minutes. Also note that only 1 skim runs at a time, so you might end up with a longer waiting time if another skim is still running.

Location of the scripts

For bookkeeping, and in case we ever want to move the scripts, this is a list of files that use the explicit location of AutoMaticTopTreeProducer(Git) (you can check for this using 'grep -r "AutoMaticTopTreeProducer" .').

    • TopDB
      • function getlog in /var/www/TopDB/app/controllers/requests_controller.php
      • function getlog in /var/www/TopDB/app/controllers/simrequests_controller.php
      • function getlog in /var/www/TopDB/app/controllers/removals_controller.php
      • functions getWorkFlowStatus and getNextPoll in /var/www/TopDB/app/app_controller.php
      • Symlink /var/www/TopDB/app/files -> /home/dhondt/TopSkim/Skimmed/
      • Symlink /home/dhondt/TopSkim/Skimmed/logs -> /home/dhondt/TopSkim/logs/
      • Symlink /home/dhondt/TopSkim/Skimmed/prodlogs -> /home/dhondt/AutoMaticTopTreeProducerGit/logs/
    • Scripts
      • AutoMaticTopTreeProducer(Git)/AutoMaticSIMProducer.py
      • AutoMaticTopTreeProducer(Git)/AutoMaticTopTreeProducer.py
      • AutoMaticTopTreeProducer(Git)/CleaningAgent_threaded.py
      • AutoMaticTopTreeProducer(Git)/PatProducer.py and TopTreeProducer.py -> are these used?
      • AutoMaticTopTreeProducer(git)/tools/checkCMSSWversions.py
      • AutoMaticTopTreeProducer(git)/tools/removetool.py

TopDB development

    • Location: /var/www/TopDB
    • TopDB is developed using CakePHP
    • TopDB specific implementations (not complete):
      • Sanity plots are made automatically when viewing a toptree on the webpage: as you can see in /var/www/TopDB/app/views/toptrees/view.ctp, viewing a toptree triggers the execution of the script /var/www/TopDB/app/scripts/generateSanityCheckPlots.sh and, for data (if a PU JSON file is attached), also CalcPU.sh. The resulting plots are stored in /var/www/TopDB/app/webroot/img/validation-plots. If you remove the toptree-id directory there (which you can only do completely as root on mtop; 'dhondt' permissions are not enough), the sanity plots and PU histograms (for data) are recalculated the next time you view the webpage of the toptree with this id (see the sketch after this list).
      • In the getlog function in requests_controller.php you can see that the log files that can be viewed via the webpage are obtained from a symlink in /var/www/TopDB/app/files/prodlogs, pointing to /home/dhondt/AutoMaticTopTreeProducer(Git)/logs/, not from the hardcoded AutoMaticTopTreeProducer path in the requests_controller.php file itself!
      • To adapt the available CMSSW versions in the dropdown box on the webpage for a toptree request (https://mtop.iihe.ac.be/TopDB/requests/add), you can either run the checkCMSSWversions.py script in AutoMaticTopTreeProducer(Git)/tools itself (but this may not work out of the box in the new Git setup), or you can insert a new release by hand via https://mtop.iihe.ac.be/phpmyadmin/ in the table 'cmsswversions'. For the sim production, https://mtop.iihe.ac.be/TopDB/simrequests/add, the available CMSSW versions in the dropdown box are currently hardcoded in the CMSSWver_sim entry of the simrequests table in the database (e.g. enum('CMSSW_5_3_8_patch3', 'CMSSW_5_3_13')), so you need to adapt this by hand via https://mtop.iihe.ac.be/phpmyadmin/.
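A minimal sketch of forcing the sanity plots of one toptree to be regenerated (the toptree id 385 is hypothetical; this requires root on mtop):

ssh <user>@mtop.iihe.ac.be
sudo su -
rm -rf /var/www/TopDB/app/webroot/img/validation-plots/385
# the next view of toptree 385 on the webpage recreates the plots (and PU histograms for data)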

TopDB operation with multiple grid certificates

The toptree production scripts can choose, in a uniform way, a grid certificate to use for crab from among multiple available certificates. If users want to share their certificate to aid in the production, these are the steps to follow.

    • Export your grid certificate from your browser, e.g. for Firefox: go to Preferences->Advanced->Encryption->View Certificates, then select your certificate and click backup (choose the .p12 file format).
    • Install it on mtop (please substitute <USERNAME> with your username :)):
scp mycertificate.p12 dhondt@mtop.iihe.ac.be:~/.globus/.altcert/cert_<USERNAME>.p12

ssh dhondt@mtop.iihe.ac.be

cd .globus/.altcert

sh convcert.sh cert_<USERNAME>.p12 <USERNAME>
 IMPORTANT: this last command will first prompt you twice for the "import password" (the one you chose when backing up from your browser). Next it will ask you to choose a "PEM passphrase". Please use our common password, which you can find by executing on dhondt@mtop:
grep GridPass /home/dhondt/AutoMaticTopTreeProducer/.config
    • Delete the obsolete p12 file:
 
rm -v cert_<USERNAME>.p12 

Now the new certificate is installed and will be put in production immediately.
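To verify the freshly installed certificate, a minimal sketch (the exact names of the PEM files written by convcert.sh are an assumption; check what appears in the directory):

cd ~/.globus/.altcert
ls -lt                                                # the PEM files written by convcert.sh should be the newest entries
openssl x509 -in <PEM_FILE> -noout -subject -enddate  # check the certificate owner and expiry date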

How to perform the TopDB operation

The following describes all the actions a TopDB operator should be able to take to ensure successful production of samples.

Run the AutoMaticTopTreeProducer from the production account

General information

    • Central user : dhondt
    • Machine: mtop.iihe.ac.be
ssh dhondt@mtop.iihe.ac.be //you need permission to login on this account
    • People responsible for the central production system (at this point): everyone
    • People doing TopDB operation: everyone

Installing a CMSSW and TopTreeProducer version for TopDB production (for experienced shifters)

    • First of all check if the release is installed on the SW area.
    • if not:
      • If the software area of the T2 is used, contact the CMSSW-deployment team
      • If the local SW area is used, you can install it yourself
ssh cmss@mtop.iihe.ac.be

source SourceMePlease_Really_SOURCE_ME.sh (if it's a release >= CMSSW_4_1_0)
otherwise:
source SourceMePlease_Really_SOURCE_ME_32BIT.sh

apt-get update

apt-get install cms+cmssw+CMSSW_1_2_3

*OR*

apt-get install cms+cmssw-patch+CMSSW_1_2_3_patch4

    • Now check out the proper release. RESPECT THE NAMING CONVENTION!!! -> e.g. CMSSW_4_1_3_TopTreeProducer_41X_v1 and NOT CMSSW_4_1_3
/* setup your cvs */

export CVS_RSH=/usr/bin/ssh
export CVSROOT=:ext:<CERNUSERNAME>@cmscvs.cern.ch:/cvs_server/repositories/CMSSW

cd $HOME/AutoMaticTopTreeProducer/

scram p -n CMSSW_4_1_3_TopTreeProducer_41X_v1 CMSSW CMSSW_4_1_3 // of course, change the versions ;-)

cd CMSSW_4_1_3_TopTreeProducer_41X_v1/src

cmsenv

addpkg PhysicsTools/PatAlgos (check the PAT recipes if a special tag should be checked out)

Check the recipes for additional tags

cvs co -r CMSSW_41X_v1 -d TopBrussels/TopTreeProducer UserCode/TopBrussels/TopTreeProducer // you might want to update the version

scramv1 b -j 8

IMPORTANT: Compile libToto!!!!!!!

cd TopBrussels/TopTreeProducer/src

make

cp -vf *.so ../tools

    • Now exit your ssh connection -> start from a clean environment! (IMPORTANT)
    • Synchronize the installed CMSSW versions list with TopDB
cd $HOME/AutoMaticTopTreeProducer/tools
python checkCMSSWversions.py

Installing and hacking a new CRAB release on mtop (for experienced shifters)

    • These instructions are valid for CRAB >= CRAB_2_8_4_patch2.
    • This explains how to install a new version of CRAB on mtop and remove the requirement that tasks with > 500 jobs must go to the CrabServer.
    • WARNING: python is sensitive to indentation, please take care of indenting properly when applying the hacks
    • First enter the software area on mtop and download the CRAB tarball
ssh cmss@mtop.iihe.ac.be

cd /jefmount_mnt/jefmount/cmss/CRAB/

wget --no-check-certificate http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/Docs/CRAB_X_Y_Z.tgz

mkdir tmp

cd tmp

tar xzvf ../CRAB_X_Y_Z.tgz

mv CRAB_x_y_z ../CRAB_x_y_z_mtop

cd ../CRAB_x_y_z_mtop

./configure

cd ../

rm -rfv tmp

    • OUTDATED: Now we have to configure the CRAB release and remove the crabserver-only limit for > 500 jobs

cd CRAB_X_Y_Z_mtop

** update python/Submitter.py **

change the lines:

if self.limitJobs and len(self.nj_list) > 500:
            msg = "The CRAB client will not submit task with more than 500 jobs.\n"

to:

if self.limitJobs and len(self.nj_list) > 5000:
            msg = "The CRAB client will not submit task with more than 5000 jobs.\n"

** update python/cms_cmssw.py **

change the lines:

       if not self.server and not self.local :
            if common.scheduler.name().upper() == 'REMOTEGLIDEIN' :
                if njobs > 5000 :
                    raise CrabException("too many jobs. remoteGlidein has a limit at 5000")
            else :
                if njobs > 500:
                    msg =  "Error: this task contains more than 500 jobs. \n"
                    msg += "     The CRAB SA does not submit more than 500 jobs.\n"
                    msg += "     Use the server mode. \n"
                    raise CrabException(msg)
        if self.server and njobs > 5000 :
            raise CrabException("too many jobs. CrabServer has a limit at 5000")

to :

       if not self.server and not self.local :
            if common.scheduler.name().upper() == 'REMOTEGLIDEIN' :
                if njobs > 5000 :
                    raise CrabException("too many jobs. remoteGlidein has a limit at 5000")
            else :
                if njobs > 5000:
                    msg =  "Error: this task contains more than 5000 jobs. \n"
                    msg += "     The CRAB SA does not submit more than 5000 jobs.\n"
                    msg += "     Use the server mode. \n"
                    raise CrabException(msg)
        if self.server and njobs > 5000 :
            raise CrabException("too many jobs. CrabServer has a limit at 5000")


    • OUTDATED: Now we have to configure the CRAB release to remove the minimal job duration (easier for test jobs)

** update python/Scheduler.py **

change: 

        txt += '    let "MIN_JOB_DURATION = 60*%d" \n'%self.minimal_job_duration

to:

        txt += '    let "MIN_JOB_DURATION = 0*%d" \n'%self.minimal_job_duration

    • OUTDATED: Now we have to configure CRAB to allow us to swap grid proxies

** update python/SchedulerGrid.py **

1) at the top of function wsCopyOutput

change:

        txt += '#\n'
        txt += '# COPY OUTPUT FILE TO SE\n'
        txt += '#\n\n'

        if int(self.copy_data) == 1:

to:

        txt += '#\n'
        txt += '# COPY OUTPUT FILE TO SE\n'
        txt += '#\n\n'

        txt += 'swappedproxy="0"\n'; 
        txt += 'if [ -f "$SOFTWARE_DIR/tmpfile" ];then\n'
        txt += '  echo "*************** SWAPPING PROXY FOR STAGEOUT!! ***************"\n'
        txt += '  voms-proxy-info --all\n'
        txt += '  vdir $X509_USER_PROXY\n'
        txt += '  chmod +w $X509_USER_PROXY \n'
        txt += '  cp -v $SOFTWARE_DIR/tmpfile $X509_USER_PROXY\n'
        txt += '  chmod 400 $X509_USER_PROXY \n'
        txt += '  vdir $X509_USER_PROXY\n'
        txt += '  voms-proxy-info --all\n'
        txt += '  swappedproxy="1"\n'

        txt += 'fi\n'
        txt += 'export swappedproxy\n'

        if int(self.copy_data) == 1:

2) at the bottom of function wsCopyOutput

change: 

        else:
            # set stageout timing to a fake value
            txt += 'export TIME_STAGEOUT=-1 \n'        
        return txt

to:

        else:
            # set stageout timing to a fake value
            txt += 'export TIME_STAGEOUT=-1 \n'

        txt += 'if [ $swappedproxy == "1" ]; then\n'
        txt += '   echo "--> Checking correctness of proxy after cmscp"\n'
        txt += '   voms-proxy-info --all\n'
        txt += 'fi\n'

        # the following is to make the jobs finish at T2_BE_IIHE, otherwise they would be constantly aborted.
        txt += 'vdir $X509_USER_PROXY\n'
        txt += 'chmod 600 $X509_USER_PROXY \n'
        txt += 'vdir $X509_USER_PROXY\n'

        
        return txt

    • Now you have to test this new crab release using a test task
  cd /home/dhondt/AutoMaticTopTreeProducer/CMSSW_5_3_5_TopTreeProducer_53X_v3/src/ConfigurationFiles/test 

  (or create a new one based on a toptree production working dir; make sure this task was submitted with an alternative proxy)

  - update crab_xxxxxx.cfg
    - change stageout folder name to TestCRABXYZ
  
  setDefaultProxy
  cp $X509_USER_PROXY tmpfile 

  source setGridProxyForManualCrab

  source /jefmount_mnt/jefmount/cmss/CRAB/CRAB_X_Y_Z_mtop/crab.sh

  setcms2

  cmsenv

  crab -create -cfg crab_xxxxxxx.cfg

  crab -submit

    • Now wait for the jobs to finish and check that they reach "Done"; retrieve the output and check that the files are on storage and contain the proper branches.
    • Once you have confirmed that the CRAB release works and has not been changed in a way that harms the TopTreeProduction workflow, tag it as "latest" so that the CrabHandler.py of the TopTreeProduction tools picks it up
cd /jefmount_mnt/jefmount/cmss/CRAB/

rm -v topdb-latest

ln -s CRAB_X_Y_Z_mtop topdb-latest

Updating the CA certificates

At some point the CA certificates on mtop might be outdated, resulting in problems while creating crab jobs (e.g. 'SSL3_GET_SERVER_CERTIFICATE: certificate verify failed'). You can check the versions of the certificates (e.g. 1.58-1) by logging in as root on mtop and running

rpm -qa | grep ca 

Go to /etc/yum.repos.d/ and do

yum clean all
yum update ca-policy-egi-core

to update the CA certificates.

Operating the production script

    • Make sure that StartProduction_threaded.py is running continuously on this account.
    • If it is not running, start it in the following environment (**ESSENTIAL for DBS3!**) ->
            • CMSSW_5_3_X setup
            • crab setup
            • proxy setup: voms-proxy-init
    • Explicit instructions:
exit (if previously logged in)

ssh dhondt@mtop.iihe.ac.be

cd _ANY_CMSSW_5_3_AREA

cmsenv

setcrab

voms-proxy-init --voms cms:/cms/becms --valid 190:00

cd $HOME/AutoMaticTopTreeProducerGit

python StartProduction_threaded.py >> logs/ProductionWorkflow-ERROR.txt 2>&1 &

exit (to make sure it keeps running in the background)

    • Production is started once a sample is added to the requests section of TopDB.
    • The script currently runs N threads; this can be trivially increased or decreased by setting the "nWorkers" variable in $HOME/AutoMaticTopTreeProducer/.config
    • Every hour, the script will reload this config.
    • Note: it is possible that StartProduction_threaded.py still seems to be running on mtop while the production workflow has actually crashed (for example, if the toptree request page on the TopDB says it has crashed and ProductionWorkflow.txt is no longer being updated). In this case, first kill StartProduction_threaded.py (kill -9 <processid>), then follow the steps above to restart the script (see also the sketch after this list).
    • Note: the proxy will expire after one week, and it is then possible that the communication with DBS no longer works (the toptree request fails when checking whether the dataset exists). The temporary solution is to kill and restart the StartProduction_threaded.py script according to the instructions above, thereby making a new proxy.
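A minimal sketch for checking whether the production script is still alive and for killing it before a restart (pgrep/kill are standard; the restart itself must follow the environment steps above):

ssh dhondt@mtop.iihe.ac.be
pgrep -fl StartProduction_threaded.py   # empty output means the script is not running
kill -9 <processid>                     # only if the workflow crashed but the process survived
# then restart following the explicit instructions above (CMSSW area, setcrab, fresh proxy)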

Injecting a batch of samples

    • If you have a long list of samples to be produced, you can add them using a small tool which will also add the production request.
    • Tool location: $HOME/AutoMaticTopTreeProducer/tools/addDataSets.py
usage: addDataSets.py [options]

options:
  -h, --help            show this help message and exit
  --dataset_file=FILE   File containing dataset info
  --search_dbs=SEARCHDBS Get Dataset info from dbs (query comes as argument)

    • This tool can read from a file which contains 1 sample per line with the following syntax (the fields match the example below):
SampleName:DataTier:Process:CrossSection:CMSSW_Sample:CMSSW_Prod:GlobalTag:Priority[:DBSInstance]

SampleName -> What you find in DBS

DataTier -> GEN-SIM-RECO/RECO/AODSIM/AOD

Process -> These are fixed names you can find in the add dataset section of the topdb

CrossSection -> just a value used to represent the integrated lumi of the sample (there is still no clear consensus on which XS to use)

CMSSW_Sample -> The CMSSW version that was used to do the RECO/AOD, not the version we are going to use per se. Should be something like CMSSW_311X, CMSSW_38X (no specific version!!!)

CMSSW_Prod -> The exact name of the CMSSW version from the TopDB which needs to be used to create the TopTree. e.g.: CMSSW_4_1_3_TopTreeProducer_41X_v1

Priority -> production priority; can range from 1 upwards :-) usually it is enough to let it range up to 10.

DBSInstance -> provide a dbs instance for this dataset (global, phys03, ...)

    • Example of what such a txt file would look like:
/TToBLNu_TuneZ2_tW-channel_7TeV-madgraph/Spring11-PU_S1_START311_V1G1-v1/AODSIM:AODSIM:NewPhysics:1:CMSSW_311X:CMSSW_4_1_4_patch4_TopTreeProd_41X_v4:START41_V0:4
/bprime350_tWtW_Fall10MG7TeV_AlexisLHE/kfjack-bprime350_tWtW_Fall10MG7TeV_AlexisLHE_GEN_SIM_RECO_v1-654d567efbe29d8192394cca9e5e6c3f/USER:GEN-SIM-RECO:NewPhysics:1:CMSSW_311X:CMSSW_4_1_4_patch4_TopTreeProd_41X_v4:START41_V0:7:cms_dbs_ph_analysis_01
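A minimal invocation sketch, assuming the two lines above were saved as samples.txt (the file name is hypothetical):

cd $HOME/AutoMaticTopTreeProducer/tools
python addDataSets.py --dataset_file=samples.txt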

Following production workflows and debugging

    • If you have submitted a number of workflows and realize you forgot something while many of the requests are still queued, you must flush the requests using phpMyAdmin
    • Requests can be followed on the TopDB website TopDB/requests
    • For the TopTree production jobs that are already running and something is going wrong, check the logs, they can be found in $HOME/AutoMaticTopTreeProducer/<CMSSW_a_production_tag_name>/src/ConfigurationFiles/<DataSetName>/<ProductionRound>/<TimeStamp>, e.g.
$HOME/AutoMaticTopTreeProducer/CMSSW_4_1_3_patch2_TopTreeProd_41X_v1_patch1/src/ConfigurationFiles/QCD_Pt-20_MuEnrichedPt-15_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/29032011_002845/
    • The logfiles in this directory start with "log_". Investigate these to figure out a possible problem.
    • The crab directory is also located here and is called something like TOPTREE_<ProductionRound>_<TimeStamp>. The stdout and stderr files of the jobs end up in this directory; investigate these if many jobs are failing
    • One can check if there are jobs failing by using the CMS dashboard, here
    • Crab jobs can be aborted by putting an empty file with name .abort inside the crab directory. e.g.
 touch $HOME/AutoMaticTopTreeProducer/CMSSW_4_1_3_patch2_TopTreeProd_41X_v1_patch1/src/ConfigurationFiles/QCD_Pt-20_MuEnrichedPt-15_TuneZ2_7TeV-pythia6/Spring11-PU_S1_START311_V1G1-v1/29032011_002845/TOPTREE_Spring11-PU_S1_START311_V1G1-v1_29032011_002845/.abort
    • If you see that submission is failing because the jobs are submitted to a site with problems, a .blacklist file can be added, containing a line with the name of the T2. For T2_ES_IFCA this would be for instance:
GridBlacklist T2_ES_IFCA
    • If there are unfinished production jobs that have been running for a long time or are stuck in the 'submitted' state, you may need to check TopDB and debug.
    • In the case where some jobs are stuck in the 'submitted' state, jobs may be killed and resubmitted using manual crab commands (see the sketch below). If the problem persists at a given site, consider blacklisting the site before resubmission.
      • One particular problem that can be encountered is bad behaviour of the crabserver. Check with myproxy-info -d that the current proxy works; if not, you probably have to change the GLOBUS_TCP_PORT_RANGE variable (something like export GLOBUS_TCP_PORT_RANGE=30000,35000 or GLOBUS_TCP_PORT_RANGE=20036,25000). When you find a range for which myproxy-info -d works, this range has to be changed in the $HOME/AutoMaticTopTreeProducer/CrabHandler.py script as well. This can solve failures of crab commands like forceResubmit.
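A minimal sketch of killing and resubmitting stuck jobs with manual CRAB2 commands (the job range 1-10 and the task directory name are hypothetical; source the crab environment first, as described above):

crab -status -c TOPTREE_<ProductionRound>_<TimeStamp>
crab -kill 1-10 -c TOPTREE_<ProductionRound>_<TimeStamp>
crab -resubmit 1-10 -c TOPTREE_<ProductionRound>_<TimeStamp>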





Removing samples (for experienced shifters)

    • Please be careful with this tool: removal can NOT be undone!!! (apart from reproducing the sample)
    • Tool located in $HOME/AutoMaticTopTreeProducer/tools/removetool.py -h
usage: removetool.py [options]

options:
  -h, --help            show this help message and exit
  --remove_dataset=RMDATASET
                        Use this flag to remove the given dataset (note:
                        associated patTuples and TopTrees will be removed
  --remove_pat=RMPAT    Use this flag to remove the given Pattuple (associated
                        toptrees WILL be removed). As an argument it takes DBS
                        dataset name
  --remove_toptree=RMTOPTREE
                        Use this flag to remove the given TopTree (the source
                        PAT dataset will NOT be removed). As an argument it
                        takes the DataBase ID of the TopTree
  --remove_pat_only=RMPATONLY
                        Use this flag to remove the given Pattuple (associated
                        toptrees will NOT be removed). As an argument it takes
                        DBS dataset name
  --clean-up            This flag makes the script cross-reference all folders
                        on PNFS with the TopDB database. Unmatched files will
                        be removed from PNFS
  --assume-yes          Answer yes to all questions
    • Examples:
cd $HOME/AutoMaticTopTreeProducer/tools/

python removetool.py --remove_pat=25032011_150909
-> remove this pat-tuple and ALL linked toptrees

python removetool.py --remove_toptree=385
-> remove a single toptree

python removetool.py --remove_dataset="/LM3_SUSY_sftsht_7TeV-pythia6/Fall10-START38_V12-v1/AODSIM" => this is in general used for deprecating production cycles
-> remove all pat-tuples that are linked to this dataset
  -> remove all toprees that are linked to each pat-tuple
-> remove the dataset entry

python removetool.py --clean-up
-> checks all the folders on PNFS and cross-checks them with the TopDB. If the TopDB does not contain a toptree, it is removed if it is older than 10 days (so we do not interfere with ongoing productions that are not yet entered in the DB)
    • If you would like to remove all pattuples/toptrees connected to a certain production cycle from TopDB and from /pnfs, this is the way to proceed:
cd $HOME/AutoMaticTopTreeProducer/tools/
nano list_patfiles_from_cycle_to_remove.sh
-> edit the list of production cycles you would like to remove

./list_patfiles_from_cycle_to_remove.sh > to_rm_all
-> create the list of pattuples and toptrees to remove

./removeall.sh 
-> remove the pattuples and toptrees listed in to_rm_all, confirmation will be asked for each pattuple+toptree removal

./removeall.sh --assume-yes 
-> remove the pattuples and toptrees listed in to_rm_all without confirmation prompts (dangerous!)


Update/add the JSON for data samples

    • IMPORTANT: please attach the proper JSON to all data toptrees (after the production of the toptree). This is used for lumi calculation and skimming
    • Use the following tool: $HOME/AutoMaticTopTreeProducer/tools/updateJSONFile.py
    • Syntax: python updateJSONFile.py <Search String> <JSON URL> <PUJSON URL>
    • Example:
ssh dhondt@mtop.iihe.ac.be
cd $HOME/AutoMaticTopTreeProducer/tools/
python updateJSONFile.py 22Jan2013-v1 https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Reprocessing/Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/PileUp/pileup_JSON_DCSONLY_190389-208686_All_2012_pixelcorr.txt
    • The example command will search the database for samples matching "22Jan2013-v1" and interactively ask the user whether the JSON for each matching sample should be updated. The PUJSON argument, needed to calculate the PU histograms for PU reweighting in an analysis, is optional.

How to update the LumiCalc recipe used for TopDB

ssh dhondt@mtop.iihe.ac.be

cd /var/www/TopDB/app/scripts

    • Check the CMSSW version there: is it the one you need? If not, continue:
    • Scram the release that you need according to the wiki e.g.:
scramv1 p CMSSW CMSSW_3_8_0
cd CMSSW_3_8_0/src
cvs co -r V02-01-03 RecoLuminosity/LumiDB
cd RecoLuminosity/LumiDB
scramv1 b
cmsenv
    • Then go back to /var/www/TopDB/app/scripts and point the CMSSW symlink to it (please do not remove the actual "old" cmssw_x_y_z):
cd /var/www/TopDB/app/scripts

rm CMSSW

ln -s CMSSW_3_8_0 CMSSW
    • Give it the proper permissions so the webserver can use it

chown dhondt:apache CMSSW_3_8_0 -Rv

chmod 775 CMSSW_3_8_0

    • Go test it in the TopDB
    • If all goes well you should see it start running by doing
ps U apache

Maintenance of the TopSkimmer interface in TopDB

Maintaining the skimmer interface on mTop

The TopDB-TopSkimmer interface scripts are located in $HOME/TopSkim on dhondt@mtop.

    • Make sure that SkimProduction_threaded.py is running continuously on this account.
    • If it is not running, start it from a CLEAN shell (important)
exit (if previously logged in)

ssh dhondt@mtop.iihe.ac.be

cd $HOME/AutoMaticTopTreeProducer

python SkimProduction_threaded.py >> logs/SKIMProductionWorkflow-ERROR.txt 2>&1 &

exit (to make sure it keeps running in the background)

    • Production is started once a sample is added to the skimrequests section of TopDB.
    • The script currently runs N LOCAL threads and M PBS threads; this can be trivially changed by setting the "nSkimWorkersLocal" and "nSkimWorkersPBS" variables in $HOME/AutoMaticTopTreeProducer/.config. Don't touch these if you don't know what you are doing.
    • Every 10 minutes, the script will reload this config.
    • What you can find there:
      • logs directory: contains the logs from the skimmer interface
      • Skimmed: this folder contains the skimmed files which are downloaded through TopDB.
      • Directories named after the TopTreeProducer tag, e.g. CMSSW_41X_v1
    • Add a new TopTreeProducer version to the interface: e.g. CMSSW_41X_v1 (for experienced shifters)
cd $HOME/TopSkim

/* setup your cvs */

export CVS_RSH=/usr/bin/ssh
export CVSROOT=:ext:<CERNUSERNAME>@cmscvs.cern.ch:/cvs_server/repositories/CMSSW

cvs co -r <TOPTREEPRODUCER_VERSION> -d <TOPTREEPRODUCER_VERSION>/TopBrussels/TopTreeProducer UserCode/TopBrussels/TopTreeProducer

for example: <TOPTREEPRODUCER_VERSION> == CMSSW_42X_v3

Central Skims

    • The idea is to produce useful skims centrally for the entire group (the proposal is to have an electron and a muon skim, since these are the main channels used in the group). Central skims should be produced for an entire production round with the same settings, and the name used for publishing the central skim should be consistent during one production round!
    • Electron skim has a name "ElectronSkim" and contains events with:
      • FOR SPRING 11: At least 1 electron with pt > 20 GeV/c, fabs(eta) < 2.5 and relIso < 0.1
    • Muon skim has a name "MuonSkim" and contains events with:
      • FOR SPRING 11: At least 1 muon with pt > 20 GeV/c, fabs(eta) < 2.1 and relIso < 0.1
    • Remarks:
    • Skims can be produced in the following way:
      • Just do as you would normally do for a skim via the TopDB.
      • Difference 1: choose username "dhondt" instead of your own when finishing the skim.
      • Difference 2: instead of providing your own grid proxy, provide the one of "dhondt". So follow the same instructions, but do it from the dhondt account and not your own.
      • Difference 3: instead of providing your own email address, provide the one of the TopTree announcement list for the Brussels TOP group: top-brussels-datasets@cern.ch
    • Warning: whenever you reuse the same xml file, keep in mind that one needs a different xml for data, for TTbar (containing GenEvent) and for simulated samples other than TTbar.

How to clean up skims (for experienced shifter)

    • In case you want to kill a skim that is running, use
 
ps -x // this shows all processes currently running
// find the relevant process ID and related proxy and kill both using
kill -9 process_ID_of_exporting_user_proxy
kill -9 process_ID_of_python_skim_script_aka_SkimTopTree.py
    • Cleaning STEP1:
    • For processes that were killed, one needs to check whether there is still something on /localgrid (only relevant if the skim was running on PBS)
ls /localgrid/dhondt
    • All directory names and file names containing *thread* should be removed. To find the things to be removed you can also use
ls /localgrid/dhondt/*thread*
    • Cleaning STEP2:
    • Cleaning in the skimmer on /home/dhondt
ls /home/dhondt/TopSkim/CMSSW_41X_v1_patch2/TopBrussels/TopTreeProducer/*skimmer*
    • Warning, read the following sentence completely before you proceed: all the directories containing *skimmer* should be removed EXCEPT THE SKIMMER ITSELF (that is located in the directory "skimmer").
    • Cleaning STEP3:
    • Removal of the skim on /pnfs:
rmsrmdir.sh /pnfs/iihe/cms/store/user/dhondt/Skimmed-TopTrees/<the_path_to_the_skim_to_be_removed>
    • Cleaning STEP4:
    • Remove the relevant entry from the database. IF YOU NEVER DID THIS BEFORE, ASK SOMEONE TO ASSIST YOU BEFORE YOU PROCEED!
    • Script to remove TopTrees produced on mTop:
    • The idea is to remove the skimmed toptree files once their age exceeds 20 hours; this is done in a cronjob. To clean up the Skimmed dir manually (should the cronjob fail), removing all files older than 20h, do the following (a sketch of the equivalent find command follows below):
 
ssh dhondt@mtop.iihe.ac.be
cd $HOME/TopSkim
python SkimRemover.py
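For reference, the cleanup that SkimRemover.py performs amounts to something like the following sketch (assuming age is judged by modification time; 1200 minutes = 20 hours):

find $HOME/TopSkim/Skimmed -type f -mmin +1200          # list skimmed files older than 20 hours
find $HOME/TopSkim/Skimmed -type f -mmin +1200 -delete  # remove them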

The backup system!!!

As a "shifter", please also check that the backup system still works.

    • The backup system is just a small script $HOME/backup.sh, which
      • Creates a dump of the SQL database
      • Creates a backup tarball with the TopDB website and SQL dump
      • Creates a backup tarball of the TopSkim interface in $HOME/TopSkim
    • The two tarballs are placed in $HOME/TopDB-Backups every day.
    • The script runs in cron just after midnight.
    • Check that the script still works by checking the timestamp of the last two files
[dhondt@mtop ~]$ ls -ltr $HOME/TopDB-Backups | tail -2
-rw-r--r-- 1 dhondt users  2150164 Mar 26 08:36 topdb-backup-26032011.tar.gz
-rw-r--r-- 1 dhondt users 25831146 Mar 26 08:36 topdb-TopSkimInterface-backup-26032011.tar.gz
[dhondt@mtop ~]$
    • This script also rsyncs the AutoMaticTopTreeProducer and TopDB-Backups dirs to the NFS /user area, i.e. /user/dhondt/
    • The script is executed by crontab (e.g. do crontab -l as dhondt and you will see)
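The crontab entry looks something like the following sketch (the exact schedule is an assumption; verify with crontab -l as dhondt):

# min hour day month weekday  command
5 0 * * * /home/dhondt/backup.sh   # run the backup script just after midnight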

How to administer mtop

      • To give a user access to the dhondt account on mtop, someone who already has access to that account should:
        • log in as dhondt on mtop
        • add the user's ssh RSA public key (ask the user to provide the content of their id_rsa.pub) to /home/dhondt/.ssh/authorized_keys
      • All the following commands require root privileges, you can make yourself root (if you are allowed to) by doing:
 
ssh <user>@mtop.iihe.ac.be 
sudo su -
      • Reboot mtop
reboot
      • Shut down mtop
shutdown -h now
      • Add normal user
          • First connect to one of the m-machines and do the following
[xx@m7 ~]$ cat /etc/passwd | grep <username_to_add_to_mtop>
<username_to_add_to_mtop>:x:20523:20500:local user:/user/<username_to_add_to_mtop>:/bin/bash
[xx@m7 ~]$ 
          • Then go to mtop and add the following line to /etc/passwd. After adding the line, the person should have immediate access to mtop!
-- make yourself root--

nano /etc/passwd

add the following line (i.e. copy the line you got on the m-machines but change the group_id from 20500 to 501):
<username_to_add_to_mtop>:x:20523:501:local user:/user/<username_to_add_to_mtop>:/bin/bash

control + x (save)

      • Add user with root access
          • this is the same procedure as above except you add a different group number (500 instead of 501)
<username_to_add_to_mtop>:x:20523:500:local user:/user/<username_to_add_to_mtop>:/bin/bash
          • instead of
<username_to_add_to_mtop>:x:20523:501:local user:/user/<username_to_add_to_mtop>:/bin/bash
          • to give/remove root-privileges to existing users: just open /etc/passwd, find the line for that user and edit the group_id number
      • To change the memory limits --> /etc/security/limits.conf
          • For a whole group (admins or users)
@admins hard as 6291456 (in kB)
          • Or for a specific user
pvmulder hard as 6291456
          • Changes take effect from the next login onwards


How to power cycle mTop (as admin user)

When the server crashes, it can be rebooted remotely. To access the remote access controller on the machine you have to make ssh tunnels (for the last tunnel you need to be root on your own machine!):

ssh -f -N -L 5901:192.168.9.81:5901 USER@m0.iihe.ac.be
ssh -f -N -L 5900:192.168.9.81:5900 USER@m0.iihe.ac.be
ssh -f -N -L 443:192.168.9.81:443 USER@m0.iihe.ac.be

For the last tunnel, you need to be root. Also make sure that your ssh keys are in /root/.ssh/. If needed, you can kill all tunnels with

killall ssh

and redo the first step. Once the tunnels are made, you can access the controller interface at https://localhost:443 (this is the so-called Dell DRAC GUI). The username is your TopDB database username, and the password is the main (not your personal) TopDB database password (at the moment only Gerrit and Petra can do this!). In this interface you can go to "Power" and power cycle the machine. Another way to do this (and to follow what is happening during the reboot):

      • Look at the black box (= screenshot of the terminal) and click the launch button for the virtual console.
      • You will be prompted to download a java file
      • Go to the downloaded file and rename it to viewer.jnlp
      • Execute with
javaws viewer.jnlp
    • Check what is going on on the screen. Below the black box there is a list of options, e.g. warm reboot (asking the server nicely to reboot) or power cycle (the equivalent of cutting the power and then switching the machine back on).

How to restart TopDB critical services on mTop (as admin user)

    • BE CAREFUL WITH ALL ACTIONS STARTING WITH:
ssh username@mtop.iihe.ac.be
sudo su -
      • When rebooting mtop, once the machine is online again, one should load the IP routes again. If this is not done, the mtop server won't be able to connect to the T2 storage system and the scripts will fail. To do this (only for admin users):
 
ssh user@mtop.iihe.ac.be 
sudo su -
cd /root
source routes 
      • The last command should not give output; it is possible that there are unknown hosts. This is not a real problem, but the corresponding lines could be removed from the routes file. It is also worth checking from time to time whether new routes are available. To do this, first list the routes on mtop:
route 
      • Then, go to one of the m-machines and do
/sbin/route
    • Compare the output of both commands and add if necessary the routes that are missing.
    • The webserver (apache)
      • Check if it's running
/etc/init.d/httpd status 
      • Restart the service if necessary:
/etc/init.d/httpd restart /* look for possible errors/warnings in the output */
      • To check if it's working, just navigate to the TopDB webpage.
      • LogFiles of the webserver are stored in /var/log/httpd
    • The DataBase server (MySQL)
      • Check if it's running
/etc/init.d/mysqld status 
      • Restart the service if necessary:
/etc/init.d/mysqld restart /* look for possible errors/warnings in the output */
      • To check if it's working:
mysql -u idontexist

This should give the following output (not necessarily verbatim):

ERROR 1045 (28000): Access denied for user 'idontexist'@'localhost' (using password: NO)
      • LogFile of the MySQL Server is stored in /var/log/mysqld.log
    • The SNMP monitor
service snmpd restart
    • Get the date right:
ntpdate ntp.telenet.be
    • Make sure pnfs directories are visible from mtop:
mount -r -t nfs4 maite.wn.iihe.ac.be:/ /mnt/dCache
    • For the production of toptrees that was already ongoing: put all requests on manual via direct access to the database: https://mtop.iihe.ac.be/phpmyadmin/
    • Finally, log out as root, log in as dhondt, and restart each script *separately from a clean shell* in /home/dhondt/AutoMaticTopTreeProducerGit (!), EXCEPT StartProduction_threaded.py, which you have to restart in the cms environment as mentioned above (see 'Operating the production script'):
python SkimProduction_threaded.py >> logs/SKIMProductionWorkflow-ERROR.txt 2>&1 &
python SIMProduction_threaded.py >> logs/SIMProductionWorkflow-ERROR.txt 2>&1 &
python CleaningAgent_threaded.py >> logs/CleaningAgentWorkFlow-ERROR.txt 2>&1 &
python StartProduction_threaded.py >> logs/ProductionWorkflow-ERROR.txt 2>&1 &
    (Starting the scripts as root may result in permission conflicts, as the scripts have 'dhondt' permissions!)

OLD INSTRUCTIONS

(OUTDATED!!!!) How to skim and download the produced TopTrees without using the TopDB.

Because the TopTree production runs on mtop as well as the TopDB-TopSkimmer, it is best not to use the Skimmer interface for multiple skims at once (>5) or for very large samples (e.g. large QCD). This is why the TopSkim script was ported to a "standalone" version. Here are some guidelines on how to use it:

      • First connect to the TopDB to retrieve the PNFS path for the TopTree of your interest.
      • Connect to an SLC5 m* machine (m6,7,8,9)
      • Use the following recipe:
/* Do NOT cmsenv in ANY CMSSW release */
/* GET THE SCRIPT */
cvs co -r stable_v2_p2 UserCode/mmaes/StandAloneTopSkim
cd UserCode/mmaes/StandAloneTopSkim
mkdir logs
/* INSERT THE TOPTREEPRODUCER IN THE SCRIPT DIR */
cvs co -r <YOUR_TOPTREEPRODUCER_VERSION> UserCode/TopBrussels/TopTreeProducer
mv UserCode/TopBrussels/TopTreeProducer . -vf
rm -rfv UserCode
Do not try to compile anything (the script will do that properly for a special standalone root version)
/* FOR TopTreeProducer CMSSW_38X_v1 do the following before running the skimmer! (Fix for 36X to come)*/
cd TopTreeProducer/skimmer
rm TopSkim.C
cvs update -r 1.1.2.7.2.1.2.3.2.4 TopSkim.C
cd ../../
This will remove a memory leak.
/* Create a valid grid proxy (if not created already) */
voms-proxy-init --voms cms:/cms/becms --valid 190:00
/* Run the Skimmer script */
python SkimTopTree.py -s XML/template.xml -l /pnfs/iihe/cms/store/user/dhondt/SOME/SAMPLE/SOMEDATE/TOPTREE/
/* Some new options in the Script*/
-t <string>: location of the toptreeproducer directory. By default "TopTreeProducer". Allows you to maintain different versions in one setup.
-n <number>: Instead of running TopSkim per 1 file, do it per <number> of files (e.g.: instant merging). -n -1 will give 1 file as output (beware of TTree size limit)
-p <string>: Instead of copying to a local directory, it will copy the output (+skim.xml) to your PNFS dir (you need to have a BEGrid Cert).
-a : When using -p, you can use -a to automatically announce the Skim production to the datasets mailinglist. 
-j <number>: Run the script in multiple threads. This is only useful when not using -n -1 (then there is only 1 TopSkim job that will run). E.g.: when you want to skim a sample of 100 files and you choose -n 50 and -j 2, the script will directly start 2 TopSkim instances each running 50 files. Without -j 2, it would just finish the first 50 and then do the next 50.
--use-pbs: rather than letting the threads run the TopSkim locally, the job is submitted to the localGrid through PBS.
--log-stdout: Rather than writing to a logfile, the main output is written to the terminal.
    • The package contains a directory XML which contains a template skim XML. CAUTION: inside the skim XML, do NOT change the inputdatasets, as the script will automatically change this parameter. The output file name can be changed to anything.
    • The output of the skimmer will be put inside Skimmed/timestamp/ (when not using -p)


Script Operation instructions (UNSUPPORTED)

Install the package

    • check out current cvs tag: (HEAD)
cvs co UserCode/mmaes/AutoMaticTopTreeProducer
    • Now set up your grid certificate to work with the producer.
cd UserCode/mmaes/AutoMaticTopTreeProducer
echo "GridPass fillInYourPass" >> .config
chmod -v 600 .config <- IMPORTANT!

Add a CMSSW version to the package

    • Add a CMSSW version to your setup and add the PatAlgos and TopTreeProducer packages. (CMSSW versions below 3_3_X are not supported.)
#get CMSSW

cd UserCode/mmaes/AutoMaticTopTreeProducer
scram p CMSSW CMSSW_3_5_6_patch1

# get the packages

cd CMSSW_3_5_6_patch1/src; cmsenv
addpkg PhysicsTools/PatAlgos
cvs co -r CMSSW_35X_v10 UserCode/TopBrussels/TopTreeProducer; mv -vf UserCode/TopBrussels .; rm -rfv UserCode

# compile

scramv1 b -r -j 8

Run the script outside of central production

NOTE: this is NOT recommended!

    • Before starting, make sure that you didn't run cmsenv or source any crab version; otherwise reconnect, and make sure neither is done from your bash_login.
    • Before submitting any task, make sure that you change the email lists in MailHandler.py to YOUR own email address!
    • The standard python2.3 is not recent enough to run the script. Normally cmsenv would provide python2.4, but since we must not run cmsenv, we fix this by running:
source /swmgrs/cmss/slc4_ia32_gcc345/external/python/2.4.2-cms6/etc/profile.d/init.sh 
    • The main script is called AutoMaticTopTreeProducer.py:
python AutoMaticTopTreeProducer.py -h (this output is produced using the production_v6 tag)

usage: AutoMaticTopTreeProducer.py [options]

options:
  -h, --help            show this help message and exit
  -c CMSSW_VER, --cmssw=CMSSW_VER
                        Specify which CMSSW you want to use (format
                        CMSSW_X_Y_Z)
  -d DATASET, --dataset=DATASET
                        Specify which dataset you want to process (format
                        /X/YZ)
  -g GLOBALTAG, --globaltag=GLOBALTAG
                        Specify the GlobalTag for PAT production
  -r RUNSELECTION, --runselection=RUNSELECTION
                        Specify the run selection for PAT production
  --dbsInst=DBSINST     Specify the DBSInstance to perform DBS queries
  --ce-blacklist=CEBLACKLIST
                        Specify a comma-separated list of CE's to blacklist IN
                        CRAB
  --start-from-pat      Use this flag to start from a previously published PAT
                        sample, provide the sample name through -d
  --start-from-pat-mc   Use this flag to start from a previously published PAT
                        MC sample, provide the sample name through -d
  --skip-pat            Use this flag to go directly from RECO to TopTree
                        (Warning: no JEC/cleaning will be applied)
  --pbs-submit          Submit CRAB jobs to localGrid (Your dataset should be
                        at T2_BE_IIHE)
  --dry-run             Perform a Dry Run (e.g.: no real submission)
  --log-stdout          Write output to stdout and not to logs/log-*.txt
  --run31Xcompat        Run PAT in CMSSW_31X mode (will become obsolete)
  --addGenEvent         Use this flag to add GenEvent into PAT and TopTree
                        (WARNING: for ttbar MC only)
  --hltMenu=HLTNAME     Use this option to change the HLT menu name e.g.
                        HLT8E29

    • An example for running this script:
python AutoMaticTopTreeProducer.py -c CMSSW_3_5_6_patch1 -d /TTbar/Summer09-MC_31X_V9_Feb15-v1/GEN-SIM-RECO -g MC_3XY_V26::All --dry-run
    • NOTE: always do --dry-run before actually submitting jobs, to check the configuration files.
    • NOTE: You can change the total number of events in CrabHandler.py to submit only a small batch of testjobs.
    • NOTE: The script will not be able to insert your dataset in TopDB if you run this in your personal account. If your dataset is interesting for other users, add it manually.

