- "Deploy a Pelican Origin" (README.md)
Accompanying slides: here
Access a notebook here: https://notebook.ospool.osg-htc.org/
Authenticate with one of the following:
- GitHub
- ORCID
- ACCESS ID
- Your local university credentials
Select the “Basic” server and click “Start”.
Clone the repository, then open the README.ipynb file.
Alternatively, the commands are listed below.
WORKDIR=$HOME/training-origin/pelican-plugin
echo $WORKDIR
The following commands download a test data file in FASTQ format (a sequence data format) from the Open Science Data Federation (OSDF).
cd $WORKDIR/data
OSDF=pelican://osg-htc.org
OBJ_PATH=ospool/uc-shared/public/osg-training/tutorial-fastqc/test.fastq
pelican object get $OSDF/$OBJ_PATH test.fastq
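The URL passed to pelican object get is the federation prefix joined to the object path with a slash. A minimal sketch of that composition, using only the two values above (OBJ_URL is a hypothetical helper variable introduced here for illustration, not part of the tutorial):

```shell
OSDF=pelican://osg-htc.org
OBJ_PATH=ospool/uc-shared/public/osg-training/tutorial-fastqc/test.fastq

# OBJ_URL is a hypothetical name; the tutorial passes $OSDF/$OBJ_PATH directly
OBJ_URL="$OSDF/$OBJ_PATH"
echo "$OBJ_URL"

# The last path component is the object's file name
basename "$OBJ_PATH"
```

Keeping the prefix and the path in separate variables makes it easy to fetch a different object from the same federation by changing only OBJ_PATH.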
The following command should display the beginning of a genomic sequence file:
head test.fastq
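A FASTQ file stores each read as four lines: an identifier starting with @, the sequence, a separator line starting with +, and the quality string. As a rough sanity check, the number of reads can be estimated by dividing the line count by four. The sketch below uses a small made-up file (the reads and the DEMO_FASTQ name are illustrative assumptions, not tutorial data):

```shell
# Write a tiny two-record FASTQ file with made-up reads
DEMO_FASTQ=$(mktemp)
cat > "$DEMO_FASTQ" <<'EOF'
@read1
GATTACA
+
IIIIIII
@read2
ACGTACG
+
IIIIIII
EOF

# Each record spans four lines, so reads = lines / 4
NUM_LINES=$(wc -l < "$DEMO_FASTQ")
NUM_READS=$((NUM_LINES / 4))
echo "$NUM_READS reads"
```

The same two commands could be pointed at test.fastq once it has been downloaded.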
cd $WORKDIR/sample
Look at the contents of the HTCondor job submit file below. There should be some familiar elements (resource requests, where to save stdout/stderr/log files, what commands to run) and some potentially new elements (transferring files).
cat sample.submit
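For readers who have not seen a submit file before, here is a minimal sketch of the kinds of elements described above. It is a hypothetical example, not the contents of sample.submit: the script name, file names, and resource values are assumptions chosen for illustration.

```shell
# Write a hypothetical submit file to a temporary location (NOT the tutorial's sample.submit)
SUBMIT_SKETCH=$(mktemp)
cat > "$SUBMIT_SKETCH" <<'EOF'
# What to run (hypothetical script name)
executable = run.sh

# Input object fetched through Pelican before the job starts
transfer_input_files = pelican://osg-htc.org/ospool/uc-shared/public/osg-training/tutorial-fastqc/test.fastq

# Where to save stdout, stderr, and the HTCondor job log
output = job.$(Cluster).output
error  = job.$(Cluster).error
log    = job.$(Cluster).log

# Resource requests (illustrative values)
request_cpus   = 1
request_memory = 1GB
request_disk   = 1GB

queue
EOF

# Show only the line that involves a Pelican transfer
grep "pelican" "$SUBMIT_SKETCH"
```

The transfer_input_files line is the potentially new element: HTCondor hands the pelican:// URL to its file transfer plugin rather than copying a local file.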
condor_submit sample.submit
condor_q
cat job*.output
cat output*.txt
cd $WORKDIR/fastqc
ls -lh
We are now going to submit a slightly more complex job. This job will fetch both the test.fastq file that we downloaded from the OSDF a minute ago and a container with the FastQC bioinformatics program.
grep "pelican" single-fastqc.submit
The job itself will run the FastQC program on the fetched data file and produce a visualization, which will get written back to the results folder.
cat single-fastqc.submit
condor_submit single-fastqc.submit
condor_q
ls results/
One of the script's commands was ls, so we can confirm that test.fastq was downloaded by looking at the standard output file.
cat logs/*.out
cd $WORKDIR/fastqc
Because the Pelican object links can be quite long, it's helpful to use intermediate variables in the submit file.
grep "OBJ_LOC" many-fastqc.submit
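A minimal sketch of the intermediate-variable pattern follows. Submit files allow user-defined macros that are referenced as $(NAME). The fragment below is hypothetical: it reuses the object prefix from earlier in this tutorial, and it may not match the actual lines in many-fastqc.submit.

```shell
# Write a hypothetical two-line submit fragment (not the real many-fastqc.submit)
MACRO_SKETCH=$(mktemp)
cat > "$MACRO_SKETCH" <<'EOF'
OBJ_LOC = pelican://osg-htc.org/ospool/uc-shared/public/osg-training/tutorial-fastqc
transfer_input_files = $(OBJ_LOC)/test.fastq
EOF

# Both the definition and the use of the macro match
grep "OBJ_LOC" "$MACRO_SKETCH"
```

Defining the long prefix once keeps the transfer line short and means only one line changes if the objects move.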
Finally, we'll run the same FastQC analysis, but with multiple data files (again, being fetched from the OSDF).
cat many-fastqc.submit
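One common way to run the same analysis over many files is HTCondor's queue ... from statement, which creates one job per line of a list file and sets a variable to that line. The sketch below prints a hypothetical fragment using that pattern and then emulates the expansion in plain shell. The file names, the FASTQ variable, and fastq-list.txt are assumptions for illustration; many-fastqc.submit may be written differently.

```shell
# Hypothetical list of object names, one per line (not the tutorial's real file list)
LIST=$(mktemp)
printf '%s\n' sample-a.fastq sample-b.fastq sample-c.fastq > "$LIST"

# Hypothetical submit fragment: $(FASTQ) takes one value from the list per job
cat <<'EOF'
OBJ_LOC = pelican://osg-htc.org/ospool/uc-shared/public/osg-training/tutorial-fastqc
transfer_input_files = $(OBJ_LOC)/$(FASTQ)
queue FASTQ from fastq-list.txt
EOF

# Emulate the expansion in plain shell: one URL per queued job
OBJ_LOC=pelican://osg-htc.org/ospool/uc-shared/public/osg-training/tutorial-fastqc
NUM_JOBS=0
while read -r FASTQ; do
    echo "job $NUM_JOBS fetches $OBJ_LOC/$FASTQ"
    NUM_JOBS=$((NUM_JOBS + 1))
done < "$LIST"
```

Each job then receives a different input object while sharing the same executable and container.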
condor_submit many-fastqc.submit
condor_q
ls results/
cat logs/*.out
Let's go back to our sample directory and try to download a file from YOUR origin in a job!
cd $WORKDIR/sample
cat sample-origin.submit
condor_submit sample-origin.submit
condor_q