add docs and script to fetch rucio dataset files
garciagenrique committed Jun 20, 2024
1 parent eef00cd commit ef20ca2
Showing 2 changed files with 50 additions and 1 deletion.
17 changes: 16 additions & 1 deletion tutorials/data-lake/pull-dataset/README.md
@@ -1 +1,16 @@
# Pull dataset from Rucio data lake
# Interact with Rucio dataset files

The following script assumes that all the files within a DID (data identifier) are present in an RSE (Rucio Storage Element) and that this RSE is accessible locally.
- A DID is composed of a scope plus a dataset name, in the `SCOPE:DataSet` format.
- If the files are not present in the RSE, replicate the dataset to the desired RSE before running the script.
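
As a minimal sketch of the `SCOPE:DataSet` format, the snippet below splits a DID into its two parts with plain bash parameter expansion. The variable names (`did`, `scope`, `dataset`, `prefix`) are illustrative only. The last expansion is the same `:` to `.` substitution that the script applies to the DID when it names the symlinks.

```bash
#!/bin/bash
# Example DID, taken from the usage example in this README.
did="calorimeter:training_data_hdf5"

# Everything before the first ':' is the scope, everything after it is the dataset name.
scope="${did%%:*}"
dataset="${did#*:}"

# Replace the first ':' with '.', giving a prefix that is convenient in file names.
prefix="${did/:/.}"

echo "scope:   ${scope}"
echo "dataset: ${dataset}"
echo "prefix:  ${prefix}"
```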

Run the following bash script:

```bash
> ./rucio_dataset_files.sh <SCOPE:DataSet> <output_file> <output_symlink_dir>

# Example
> ./rucio_dataset_files.sh calorimeter:training_data_hdf5 calorimeter_files.txt calorimeter_symlink_dir
> cat calorimeter_files.txt
> ls -l calorimeter_symlink_dir
```
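
For reference, the script builds each local path by taking the replica URL from the table printed by `rucio list-file-replicas` and swapping the dCache HTTPS endpoint for the local mount point. The sketch below reproduces that step on a single hard-coded row so it can be tried without Rucio. The row itself (file name, size, checksum, remote path) is made up for illustration, and it assumes the file size is printed as a number followed by a unit, which is what places the URL in the twelfth whitespace-separated field.

```bash
#!/bin/bash
# Hypothetical table row shaped like the output of `rucio list-file-replicas`.
# File name, size, checksum and remote path are invented for this example.
row='| calorimeter | events_001.h5 | 1.2 GB | 0a1b2c3d | VEGA-DCACHE: https://dcache.sling.si:2880/data/calorimeter/events_001.h5 |'

# For a row of this shape, whitespace-separated field 12 is the replica URL.
url=$(printf '%s\n' "$row" | awk '{ print $12 }')

# Swap the HTTPS endpoint for the local dCache mount point.
local_path=$(printf '%s\n' "$url" | sed 's|https://dcache.sling.si:2880|/dcache/sling.si|g')

echo "$local_path"
```

If the table layout differs (for example, extra columns or a size printed without a space), the field index passed to `awk` would need adjusting.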
34 changes: 34 additions & 0 deletions tutorials/data-lake/pull-dataset/rucio_dataset_files.sh
@@ -0,0 +1,34 @@
#!/bin/bash
#
# G. Guerrieri & E. Garcia (CERN) - Jun 2024
#
# This script runs only on VEGA
#
# Usage - on a terminal run
# > ./rucio_dataset_files.sh <SCOPE:DataSet> <output_file> <output_symlink_dir>

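set -e

# Require exactly three arguments.
if [[ $# -ne 3 ]]; then
    echo "Usage: $0 <SCOPE:DataSet> <output_file> <output_symlink_dir>" >&2
    exit 1
fi

ds=$1        # DID in SCOPE:DataSet format
name=$2      # output file that will list the created symlinks
location=$3  # directory in which the symlinks are created

pw=$(pwd -P)

# Start from an empty output file.
# The file is written exactly as given in <output_file>; no extension is appended.
rm -f "${name}"
touch "${name}"

# Refuse to reuse an existing symlink directory.
if [[ -d "${location}" ]]; then echo -e "Directory exists. Exiting\n${pw}/${location}" ; exit 1 ; fi
mkdir "${location}"

# Field 12 of each replica table row is the replica URL; map it to the local dCache mount.
for file in $(rucio list-file-replicas --rse VEGA-DCACHE "$ds" | awk '{ print $12 }' | sed 's|https://dcache.sling.si:2880|/dcache/sling.si|g')
do
    # Skip the "|" table separator that awk can return for non-data rows.
    if [[ $file == "|" ]]; then continue; fi
    fileReduced=$(basename "$file")
    echo "linking ${fileReduced} ..."
    link="${location}/${ds/:/.}.${fileReduced}"
    ln -s "$file" "$link"
    echo "${pw}/${link}" >> "${name}"
done

chmod -R 777 "${location}"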
