GetLostFiles
Retrieve lost files from datasets
Introduction
Some files in datasets are not copied correctly to the T2s; usually only a few files per dataset are affected. A few scripts are in place to easily put them back where they belong.
Identify
First, go to the corrupted files page and check whether any files belong to our site. If so, click the "Get source JSON" tab.
Create the necessary files
Put the downloaded file in /user/odevroed/Get_Lost_Files.
Edit the file and remove all the HTML tags, leaving only the JSON part. For further reference in this document it is called cms-popularity.cern.ch.json.
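The tag removal can also be done programmatically. This is a sketch, not part of the original scripts: it assumes the JSON payload is the outermost {...} or [...] block in the saved page, and the function name extract_json is made up for illustration.

```python
import json

def extract_json(text):
    """Strip surrounding HTML and return the parsed JSON payload.

    Assumes the JSON body is the outermost {...} or [...] block
    in the page; equivalent to the manual tag-removal step.
    """
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON payload found in page")
    start = min(starts)
    end = max(text.rfind("}"), text.rfind("]")) + 1
    return json.loads(text[start:end])

# Example: a saved page with stray tags around the JSON body
page = '<html><body><pre>{"files": ["/store/a.root"]}</pre></body></html>'
data = extract_json(page)
print(data["files"][0])
```

Saving the return value with json.dump gives a clean cms-popularity.cern.ch.json.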
Run the script a first time, with option 1:
python /user/odevroed/bin/download_missing_dataset_files.py 1 ./cms-popularity.cern.ch.json
This generates the file "files_to_retrieve", which contains only the files from our site that are actually missing (some entries on the corrupted files page may be false alarms).
Open this file and, via DAS, find out at which site each file resides. Usually many files from the list can be aggregated at one single site.
Put this list together in a new file, say "files_per_site", and pass it as an argument to the second pass of the script (replace the site name as appropriate):
python /user/odevroed/bin/download_missing_dataset_files.py 2 T2_CH_CSCS ./files_per_site
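The aggregation step above can be sketched as follows. The file-to-site mapping is shown as a plain dict standing in for the manual DAS queries; the file names and the files_per_site.SITE naming scheme are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical file-to-site mapping, as found by hand via DAS
file_sites = {
    "/store/data/a.root": "T2_CH_CSCS",
    "/store/data/b.root": "T2_CH_CSCS",
    "/store/data/c.root": "T2_DE_DESY",
}

# Group the missing files by the site that hosts them
by_site = defaultdict(list)
for lfn, site in file_sites.items():
    by_site[site].append(lfn)

# Write one list per site, e.g. files_per_site.T2_CH_CSCS,
# each suitable as input for the second pass of the script
for site, lfns in by_site.items():
    with open(f"files_per_site.{site}", "w") as fh:
        fh.write("\n".join(lfns) + "\n")
```

Each resulting file can then be fed to the second pass together with its site name.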
The script's output lists the remaining commands:

fetching from site: srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat
The files are ready to be transferred
First, copy the files to us:
srmcp -debug -streams_num=1 -2 -copyjobfile=transfer_to_us
Then, put them on storage:
srmcp -debug -streams_num=1 -2 -copyjobfile=put_on_storage
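For reference, a copyjob file as consumed by srmcp -copyjobfile holds one "source destination" URL pair per line. The sketch below shows what the script plausibly writes into transfer_to_us; the local staging directory and the file name are illustrative assumptions, only the source SRM endpoint is taken from the output above.

```python
# Source endpoint as printed by the script
src_prefix = ("srm://storage01.lcg.cscs.ch:8443/srm/managerv2"
              "?SFN=/pnfs/lcg.cscs.ch/cms/trivcat")
local_dir = "file:////tmp/lost_files"  # hypothetical local staging area

lfns = ["/store/data/a.root"]  # illustrative missing file

# One "source destination" pair per line, the format srmcp expects
with open("transfer_to_us", "w") as fh:
    for lfn in lfns:
        fh.write(f"{src_prefix}{lfn} {local_dir}/{lfn.split('/')[-1]}\n")
```

put_on_storage is the mirror image: local file:// URLs as sources and our own SRM endpoint as destination.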
Then do what the script tells you to do :)