GridStorageAccess
Revision as of 12:17, 8 December 2021
This page describes how to handle data stored on our mass storage system.
Introduction
T2B has an ever-increasing amount of mass storage available. This system hosts both centrally produced datasets and user-produced data.
The mass storage is managed by a software called dCache. As this software is under active development, new features are added continuously. This page gives an overview of the most-used features of the software.
General info
As dCache was designed for precious data, the files are immutable. This means that once they are written, they cannot be changed any more. So, if you want to make changes to a file on dCache, you need to first erase it and then write it anew. Our dCache instance is mounted on all the M machines and can be browsed via the /pnfs directory. If you want to find your personal directory, the structure is the following:
/pnfs/iihe/<Experiment>/store/user/<Username> <-- Replace <Username> and <Experiment> accordingly.
On the M machines as well as the whole cluster, /pnfs is now mounted read-write (this is new!). Please be aware that you can now inadvertently remove a large portion of your files. As it is a mass storage system, you can easily delete several TBs of data.
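As a sketch, the personal directory path above can be assembled from two variables; the experiment and username values below are illustrative placeholders, not real defaults.

```shell
# Build your personal /pnfs path; EXPERIMENT and USERNAME are illustrative
# placeholders, replace them with your own values.
EXPERIMENT="cms"        # one of: cms, icecube, beapps
USERNAME="odevroed"     # your cluster username
PNFS_HOME="/pnfs/iihe/${EXPERIMENT}/store/user/${USERNAME}"
echo "${PNFS_HOME}"
```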
Writing and deleting files can still be done via grid-enabled commands (see next section). These should still provide better read/write speeds than nfs access. As these commands are mostly run from scripts, the probability of an accidental error is lower.
There is a dedicated machine for file copying to /pnfs:
rw.iihe.ac.be
You can log into this machine like any other M-machine and use the regular 'cp' and 'rm' commands. However, this machine is not like the other M-machines: its sole purpose is to access /pnfs via the regular posix commands, so code development and job submission are prohibited on it.
Before starting
In what follows, all the commands will require some type of authentication to access /pnfs. This is because these commands can be executed over WAN and your location is irrelevant.
The way authentication is done on our mass storage instance is via an x509 proxy. This proxy is made through your grid certificate. If you do not have a grid certificate, see this page on how to get one.
The command to make a grid proxy is:
voms-proxy-init --voms <MYEXPERIMENT>
Where <MYEXPERIMENT> is one of 'cms', 'icecube' or 'beapps'
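A proxy has a limited lifetime, so it is worth renewing it only when needed. The sketch below stubs in the remaining lifetime; in real use you would obtain it with voms-proxy-info --timeleft (which prints the remaining seconds), and 'cms' is just an example VO.

```shell
# Renew the proxy only when less than one hour remains.
# In real use: timeleft=$(voms-proxy-info --timeleft)
timeleft=120   # stubbed value (seconds) for illustration

if [ "$timeleft" -lt 3600 ]; then
  echo "proxy about to expire: run voms-proxy-init --voms cms"
else
  echo "proxy still valid for ${timeleft}s"
fi
```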
Access via GFAL
GFAL is a wrapper around the latest grid commands. Learning to use it means you do not need to learn new commands whenever the underlying middleware changes (srm, lcg, etc.)
gfal-commands
If you want more information on the options that can be used, please use 'man gfal-command'!
Here are all the commands that can be used:
- gfal-ls: list a directory or get information on a file
- gfal-mkdir: create a directory
- gfal-rm: remove a file. To remove an entire directory, use -r
- gfal-copy: copy files.
Usage
There are 2 types of file url:
- Distant files: their url is of the type <protocol>://<name_of_server>:<port>/some/path, eg for IIHE:
davs://maite.iihe.ac.be:2880/pnfs/iihe/
- Local files: their url is of the type file://path_of_the_file, eg for IIHE:
file:///user/$USER/MyFile.root
Be careful: the number of slashes (/) is very important.
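To illustrate the slash count: a local URL is the scheme file:// (two slashes) followed by an absolute path, whose leading slash makes three in total. The path below is hypothetical.

```shell
# "file://" (two slashes) + absolute path (leading slash) = three slashes.
path="/user/odevroed/MyFile.root"   # hypothetical path
url="file://${path}"
echo "$url"
```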
- To get a list of all distant urls for all the Storage Elements, one can do:
lcg-infosites --is grid-bdii.desy.de --vo cms se
Protocols
- https/WebDavs [preferred]
gfal-ls davs://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/
- xrootd [preferred]
gfal-ls root://maite.iihe.ac.be:1094/pnfs/iihe/cms/store/user/
- srm [deprecated]
gfal-ls srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/
- nfs {local-only | no-cert}
gfal-ls /pnfs/iihe/cms/store/user/
- dcap {local-only | no-cert}
gfal-ls dcap://maite.iihe.ac.be/pnfs/iihe/cms/store/user
Examples
- To list the contents of the directory /pnfs/iihe/cms:
gfal-ls srm://maite.iihe.ac.be:8443/pnfs/iihe/cms
- To create a directory:
gfal-mkdir srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/$USER/NewDir
- copy file from local disk to remote server
gfal-copy file:///user/$USER/MyFile.root srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/$USER/
- To copy a file from remote server to our Storage Element:
gfal-copy srm://srm-eoscms.cern.ch:8443/srm/v2/server?SFN=/eos/cms/store/group/comm_trigger/L1TrackTrigger/BE5D_620_SLHC6/singleMu/NoPU/reDIGI_SLHC6-TrackTrigger_muon_pgun-0499.root srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/odevroed/eosTransfer.root
- To delete a file on remote server
gfal-rm srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/$USER/MyFile.root
- To remove a directory and its entire content on the remote server (not working for now):
gfal-rm -r srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/$USER/NewDir
Bulk file transfers
There is an elegant way to run gfal-copy through several files. This is done using the --from-file option.
Syntax:
gfal-copy -f --from-file files.txt file://$PWD
where files.txt is a file with one source per line, in srm url syntax.
Test first with a single line in files.txt, and make sure the srm url is correct for both source and destination, before running over many files.
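For example, files.txt can be generated with a small loop. The server prefix and file names below are illustrative, not a real listing.

```shell
# Write one srm source URL per line into files.txt for gfal-copy --from-file.
# The prefix and file names are illustrative examples.
SRM_PREFIX="srm://maite.iihe.ac.be:8443/pnfs/iihe/cms/store/user/odevroed"
for f in file_1.root file_2.root file_3.root; do
  echo "${SRM_PREFIX}/${f}"
done > files.txt

cat files.txt
```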
Copy directories from and to pnfs within the IIHE
A script to copy full directories to and from pnfs exists on the SLC6 UIs:
copyDirectoryPnfs.py: move all files in a directory to or from pnfs.
This script assumes that you copy within the IIHE. The script does not do recursive copying.
Make sure you have a valid proxy, made with: voms-proxy-init --voms cms:/cms/becms
Mandatory options:
  --in=  : directory to copy from
  --out= : directory to copy to
Both directories need to be complete (i.e. including the /pnfs or /user part).
Example:
  copyDirectoryPnfs.py --out=/user/odevroed/newfile --in=/pnfs/iihe/cms/store/user/odevroed/newdir
Optional:
  -h, --help : print this help message
Other ways to access the mass storage system
Read and copy access
As stated in the introduction, dCache is an immutable file system, so files cannot be changed once they are written.
Files can be accessed from pnfs in several ways.
- Via the regular 'cp' command
cp /pnfs/iihe/cms/store/user/odevroed/DQMfile_83_1_hF2.root /user/odevroed
- Via the dcache copy command (dccp):
dccp dcap://maite.iihe.ac.be/pnfs/iihe/cms/store/user/odevroed/DQMfile_83_1_hF2.root ./
- To open files using root, use eg
root dcap://maite.iihe.ac.be/pnfs/iihe/some/file.root
If reading root files is slow or fails even though nothing is wrong with the root file (e.g. in an interactive analysis on beo or msa), you can increase your dCache readahead buffer. Don't make the buffer larger than 50 MB! To enlarge the buffer, set this in your environment:
For csh:
setenv DCACHE_RAHEAD 1
setenv DCACHE_RA_BUFFER 50000000
For bash:
export DCACHE_RAHEAD=true
export DCACHE_RA_BUFFER=50000000
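Since the advice above caps the buffer at 50 MB, a small guard (a sketch, for bash) can clamp an over-large request before exporting; the requested value below is an example.

```shell
# Cap DCACHE_RA_BUFFER at the advised 50 MB maximum before exporting it.
requested=80000000          # example request, deliberately above the cap
max=50000000

if [ "$requested" -gt "$max" ]; then
  requested=$max            # clamp to the advised maximum
fi
export DCACHE_RAHEAD=true
export DCACHE_RA_BUFFER=$requested
echo "$DCACHE_RA_BUFFER"
```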
- Via the 'curl' command over https
Copy from /pnfs:
curl -L --cert $X509_USER_PROXY --key $X509_USER_PROXY --cacert $X509_USER_PROXY --capath $X509_CERT_DIR -O https://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed/testing_transfer
Copy to /pnfs:
curl -L --cert $X509_USER_PROXY --key $X509_USER_PROXY --cacert $X509_USER_PROXY --capath $X509_CERT_DIR -T testing_transfer https://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed/testing_transfer_2
This is equivalent to issuing the gfal-copy command via the https protocol:
gfal-copy https://maite.iihe.ac.be:2880/pnfs/iihe/cms/store/user/odevroed/testing_transfer ./
Browser access
As seen in the last example of the previous section, dCache files are now served over https.
This means that the files can also be browsed over WebDAV. Indeed, you can point any browser to
https://maite.iihe.ac.be:2880/cms/store/user/
to see and download your files.
dCache also offers a more powerful https interface. It is called dCache View and can be accessed via
https://maite.iihe.ac.be:3880/