GridStorageAccess






This page describes how to handle data stored on our mass storage system.

Introduction

T2B has an ever-increasing amount of mass storage available. This system hosts both the centrally produced datasets and user-produced data.
The mass storage is managed by software called dCache. As this software is under active development, new features are added continuously. This page gives an overview of its most commonly used features.

General info

As dCache was designed for precious data, the files are immutable. This means that once they are written, they cannot be changed any more. So, if you want to make changes to a file on dCache, you need to first erase it and then write it anew. Our dCache instance is mounted on all the M machines and can be browsed via the /pnfs directory. If you want to find your personal directory, the structure is the following:

/pnfs/iihe/<Experiment>/store/user/<Username>     <-- Replace <Username> and <Experiment> accordingly.

On the M machines as well as the whole cluster, /pnfs is mounted read-write via a protocol called 'nfs'. Please be aware that you can now inadvertently remove a large portion of your files. As it is a mass storage system, you can easily delete several TBs of data.

Writing and deleting files can also be done via grid-enabled commands (see the next section). These still provide the best read/write speeds compared to nfs access, and they are mostly run from scripts, so the probability of an error is reduced.
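Since /pnfs is mounted read-write, a quick way to replace an existing file is to remove the old copy over nfs and write the new one. A minimal sketch (the file names are placeholders for your own data):

# files on /pnfs are immutable: delete the old copy first, then write the new version
rm /pnfs/iihe/cms/store/user/$USER/test/histos.root
cp ~/histos_v2.root /pnfs/iihe/cms/store/user/$USER/test/histos.root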

Before starting

In what follows, most of the commands require some form of authentication to access /pnfs. This is because these commands can be executed over the WAN, so your location does not matter.
Authentication on our mass storage instance is done via an x509 proxy, which is created from your grid certificate. If you do not have a grid certificate, see this page on how to get one.
The command to make a grid proxy is:

voms-proxy-init --voms <MYEXPERIMENT>

Where <MYEXPERIMENT> is one of 'cms', 'icecube' or 'beapps'.
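For example, for a CMS user (the --valid duration below is just an illustration; voms-proxy-info lets you verify the result):

voms-proxy-init --voms cms --valid 192:00   # request a proxy valid for 8 days
voms-proxy-info --all                       # check the remaining lifetime and VOMS attributes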

Browser access

dCache now exposes files using the WebDav protocol. This means that the files are accessible to browse over https.
For this, you need to have your certificate in your browser (to import your .p12 certificate, google is your friend).
Then just point your browser to:

https://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/

There you will be able to see and download your files.


dCache has an even more powerful web interface. It is called dCache View and can be accessed via:

https://maite.iihe.ac.be:3880/

Work is still in progress to make all actions work (12/2021).


Access via GFAL

GFAL is a wrapper around the latest grid commands. Learning to use it means that whatever middleware is required in the future, you will not need to learn new commands (srm, lcg, etc.).

gfal-commands

If you want more information on the available options, please consult the gfal man pages!

Here are all the commands that can be used:

  • gfal-ls: list a directory or get information on a file
  • gfal-mkdir: create a directory
  • gfal-rm: remove a file. To remove an entire directory, use -r
  • gfal-copy: copy files.
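For example, to see the detailed options of a given command:

gfal-copy --help
man gfal-copy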



Usage

There are 2 types of file url:

  • Distant files: their url is of the type <protocol>://<name_of_server>:<port>/some/path, eg for IIHE:
davs://maite.iihe.ac.be:2880/pnfs/iihe/
  • Local files: their url is of the type file:///absolute/path/to/the/file, eg for IIHE:
file:///user/$USER/MyFile.root

Be careful: the number of '/' characters in these urls is very, very important.

  • To get a list of all distant urls for all the Storage Elements, one can do:
lcg-infosites --is grid-bdii.desy.de --vo cms se 
or read more in the CERN EOS urls page.

Protocols

  • https/WebDavs [preferred] [try this one first]
gfal-ls davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/
  • xrootd [preferred]
gfal-ls root://maite.iihe.ac.be:1094/pnfs/iihe/cms/store/user/
  • srm [deprecated]
gfal-ls srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/
  • nfs {local-only | no-cert}
gfal-ls /pnfs/iihe/cms/store/user/
  • dcap {local-only | no-cert}
gfal-ls dcap://maite.iihe.ac.be/pnfs/iihe/cms/store/user
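As a quick connectivity check, the sketch below (it assumes you already have a valid proxy for the grid protocols) lists the same CMS user area over several of these protocols in turn:

for url in \
    davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/ \
    root://maite.iihe.ac.be:1094/pnfs/iihe/cms/store/user/ \
    /pnfs/iihe/cms/store/user/ ; do
    echo "== $url =="
    gfal-ls "$url" | head -n 3    # show only the first few entries
done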

Examples

  • To list the contents of a directory /pnfs/iihe/cms :
 gfal-ls davs://maite.iihe.ac.be:2880/pnfs/iihe/cms 
  • To create a directory:
 gfal-mkdir davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/$USER/NewDir 
  • To copy a file from local disk to the remote server:
 gfal-copy file:///user/$USER/MyFile.root davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/$USER/ 
  • To copy a file from remote server to our Storage Element:
 gfal-copy gsiftp://eosuserftp.cern.ch/eos/user/r/rougny/Documents/desktop.ini davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/rougny/ 
  • To delete a file on the remote server:
 gfal-rm srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/$USER/MyFile.root 
  • To remove a directory and its entire content on the remote server (note: this does not seem to work at the moment):
 gfal-rm -r srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/$USER/NewDir 


Copy more than 1 file

Copy Directories

You can easily copy whole directories to/from our site using gfal commands.
It is usually much faster than using scp or rsync commands.

gfal-copy -r [--dry-run] gsiftp://eosuserftp.cern.ch/eos/user/r/rougny/Documents/ davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/rougny/

The magic option is -r for recursive copying.
When you are sure you get what you want, remove the --dry-run option.

Note that by default, gfal-copy will not overwrite files already present at the destination. This means it is usually safe to run the command several times.
If you want to force the copy over files already there, add the -f option to your gfal-copy command.
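The same works in the other direction, e.g. pulling a whole directory from /pnfs to local disk. A sketch with placeholder directory names; keep --dry-run for the first pass:

gfal-copy -r --dry-run davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/$USER/MyDir file://$PWD/MyDir
gfal-copy -r davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/$USER/MyDir file://$PWD/MyDir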


Bulk copy from a list of files

There is an elegant way to run gfal-copy through several files. This is done using the --from-file option.

 gfal-copy -f [--dry-run] --from-file files.txt file://location/to/store/ 

where files.txt is a file where every line is a source like:

davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed/eosTransfer-1.root
davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed/eosTransfer-2.root
... 

Make some tests with one line in files.txt and make sure the url is OK for both source and destination before running over several files.
When you are sure you get what you want, remove the --dry-run option.
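One way to build files.txt is to list the remote directory and prepend the base URL. A sketch, where the directory and the grep pattern are placeholders:

BASE=davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed
gfal-ls "$BASE" | grep '^eosTransfer-' | sed "s|^|$BASE/|" > files.txt
mkdir -p downloads
gfal-copy -f --dry-run --from-file files.txt file://$PWD/downloads/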


WebFTS web interface

If you prefer using a web interface to copy files to/from our /pnfs, you can use CERN's WebFTS feature.
Not all experiments are allowed to use it, but you can always make a request to have it included.

Other ways to access the mass storage system

Read and copy access

As stated in the introduction, dCache is an immutable file system, therefore files cannot be changed once they are written. Files can be accessed from pnfs in several ways without the requirement of a grid certificate and grid tools.

  • Via the regular 'cp' command (prefer the rsync command below):
cp /pnfs/iihe/cms/store/user/odevroed/DQMfile_83_1_hF2.root /user/odevroed 
  • Via the regular 'rsync' command:
rsync -aP /pnfs/iihe/cms/store/user/odevroed/*.root /user/odevroed/
  • Via the dcache copy command (dccp):
dccp dcap://maite.iihe.ac.be/pnfs/iihe/cms/store/user/odevroed/DQMfile_83_1_hF2.root ./ 
  • To open files directly using ROOT, use e.g.
root dcap://maite.iihe.ac.be/pnfs/iihe/some/file.root 
If reading the ROOT files is rather slow or doesn't work at all, and nothing is wrong with the ROOT file itself (e.g. in an interactive analysis on the mX machines), you can increase your dCache readahead buffer. Don't make the buffer larger than 50MB!
To enlarge the buffer, set this in your environment:
For csh
setenv DCACHE_RAHEAD 1
setenv DCACHE_RA_BUFFER 50000000
For bash
export DCACHE_RAHEAD=true
export DCACHE_RA_BUFFER=50000000
  • Via the 'curl' command over https
Copy from /pnfs:
curl -L --cert $X509_USER_PROXY --key $X509_USER_PROXY --cacert $X509_USER_PROXY --capath $X509_CERT_DIR  -O https://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed/testing_transfer
Copy to /pnfs:
curl -L --cert $X509_USER_PROXY --key $X509_USER_PROXY --cacert $X509_USER_PROXY --capath $X509_CERT_DIR  -T testing_transfer  https://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed/testing_transfer_2
This is equivalent to issuing the gfal-copy command via the https protocol:
gfal-copy testing_transfer https://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed/testing_transfer2