Update DEV_Methyl-Seq with latest from main branch #135

Merged
merged 82 commits into from
Jan 15, 2025
Changes from 8 commits
Commits
3f731d5
Create GL-DPPD-7110-A.md
asaravia-butler Apr 25, 2024
98ff3e3
Add files via upload
asaravia-butler Apr 25, 2024
b6d408a
Updating to point to pipeline version A
asaravia-butler Apr 25, 2024
9123f89
[GL_RefAnnotTable] Added rat links and annotation table
torres-alexis May 24, 2024
8d6f239
[GL_RefAnnotTable] Updated Reference annotations CSV
torres-alexis Jun 2, 2024
e27b4c7
[GL_RefAnnotTable] GL_RefAnnotTable-A 1.1.0
torres-alexis Jul 12, 2024
e22158d
[GL_RefAnnotTable] Initial microbes updates
torres-alexis Aug 4, 2024
f6154f7
[GL_RefAnnotTable] GL_RefAnnotTable-A 1.1.0
torres-alexis Aug 12, 2024
9b2b943
[GL_RefAnnotTable] Added makeOrgPackageFromNCBI to DPPD doc
torres-alexis Sep 3, 2024
0020c19
[GL_RefAnnotTable] Adjust DPPD doc
torres-alexis Sep 3, 2024
3dc8f2c
[GL_RefAnnotTable] Change input arg to full name
torres-alexis Sep 4, 2024
97d5fef
Merge pull request #110 from torres-alexis/DEV_GeneLab_Reference_Anno…
asaravia-butler Sep 5, 2024
eb93b05
Adding missing updates
asaravia-butler Sep 5, 2024
bbf7a78
Updating install and run instructions.
asaravia-butler Sep 5, 2024
7c011a2
[GL_RefAnnotTable] Misc fixes
torres-alexis Sep 6, 2024
9fd9fb7
[GL_RefAnnotTable] Typo fixes
torres-alexis Sep 6, 2024
8050e32
[GL_RefAnnotTable] Add database versions, fix go.db version
torres-alexis Sep 6, 2024
d910e55
[GL_RefAnnotTable] Add go.db info
torres-alexis Sep 6, 2024
81d06dd
[GL_RefAnnotTable] Move panther note line
torres-alexis Sep 6, 2024
c72d4bb
Merge pull request #118 from torres-alexis/DEV_GeneLab_Reference_Anno…
asaravia-butler Sep 11, 2024
cc11ff9
Specify DB versions used
asaravia-butler Sep 11, 2024
6299719
Input output updates, remove unnecessary variables
asaravia-butler Sep 11, 2024
8bae3a5
Removed target_species_designation variable
asaravia-butler Sep 11, 2024
c3f621b
[GL_RefAnnotTable] Typo fixes
torres-alexis Sep 11, 2024
65f04bd
Merge pull request #122 from torres-alexis/DEV_GeneLab_Reference_Anno…
asaravia-butler Sep 11, 2024
7228880
Typo fix
asaravia-butler Sep 16, 2024
2749fe5
[GL_RefAnnotTable] Fix R packages, add docker instructions
torres-alexis Sep 16, 2024
c72d704
[GL_RefAnnotTable] Typo fixes
torres-alexis Sep 16, 2024
fab25b4
[GL_RefAnnotTable] Typo fixes
torres-alexis Sep 16, 2024
3e3dec6
[GL_RefAnnotTable] Add Docker/Singularity, fix R lib
torres-alexis Sep 16, 2024
f9c4f03
[GL_RefAnnotTable] Update docker image
torres-alexis Sep 17, 2024
ffec6ae
[GL_RefAnnotTable] Typo fixes
torres-alexis Sep 17, 2024
c4acfad
[GL_RefAnnotTable] Readd comment
torres-alexis Sep 17, 2024
51570c4
[GL_RefAnnotTable] Typo fix
torres-alexis Sep 17, 2024
2bbe8e9
NF_MAAgilent1ch: address #96
cyouh95 Oct 1, 2024
9df6fe9
NF_MAAgilent1ch: address #97
cyouh95 Oct 1, 2024
22d59a6
NF_MAAgilent1ch: address #100
cyouh95 Oct 1, 2024
9039193
NF_MAAgilent1ch: address #99
cyouh95 Oct 1, 2024
40e3652
Merge pull request #123 from torres-alexis/DEV_GeneLab_Reference_Anno…
asaravia-butler Oct 1, 2024
4f181bf
Refactor instructions for singularity use
torres-alexis Oct 1, 2024
232e421
[GL_RefAnnotTable] add container + local instructions
torres-alexis Oct 2, 2024
99daa55
[GL_RefAnnotTable] Fix typos
torres-alexis Oct 2, 2024
12b0587
[GL_RefAnnotTable] Fix interactive install-org-db
torres-alexis Oct 2, 2024
ee00750
NF_MAAgilent1ch: #85 add processed data protocol
cyouh95 Oct 2, 2024
e1d711c
NF_MAAgilent1ch: update accepted ISA field name for label
cyouh95 Oct 2, 2024
b1104b9
NF_MAAgilent1ch: update workflow version from 1.0.3 to 1.0.4
cyouh95 Oct 2, 2024
20f9e54
Merge branch 'DEV_NF_MAAgilent_1ch' of github.com:cyouh95/GeneLab_Dat…
cyouh95 Oct 2, 2024
86814bd
[GL_RefAnnotTable] switch from apptainer to singularity
torres-alexis Oct 10, 2024
d4b1c09
fix typo
torres-alexis Oct 10, 2024
f839222
[GL_RefAnnotTable] fix .img image name
torres-alexis Oct 10, 2024
8383bb5
Merge pull request #124 from cyouh95/DEV_NF_MAAgilent_1ch
asaravia-butler Oct 22, 2024
4b3c7b3
Merge pull request #126 from nasa/DEV_NF_MAAgilent_1ch
asaravia-butler Oct 22, 2024
d8d8167
Typo fixes
asaravia-butler Oct 22, 2024
93694b4
Adding maintainer
asaravia-butler Oct 22, 2024
f1567b7
Typo fix
asaravia-butler Oct 22, 2024
1bec5f8
Adding maintainer
asaravia-butler Oct 22, 2024
0338c93
Merge pull request #125 from torres-alexis/DEV_GeneLab_Reference_Anno…
asaravia-butler Oct 22, 2024
bd917b4
Formatting updates
asaravia-butler Oct 22, 2024
499538d
Formatting updates
asaravia-butler Oct 22, 2024
75dd660
Typo and link fixes
asaravia-butler Oct 22, 2024
f54a529
Formatting updates
asaravia-butler Oct 23, 2024
f015225
remove lib path
torres-alexis Oct 23, 2024
fce6a73
Update GL-DPPD-7110-A_build-genome-annots-tab.R
torres-alexis Oct 23, 2024
3cc61cf
Add possible paths to install-org-db execution function
torres-alexis Oct 23, 2024
8619121
Update GL-DPPD-7110-A_build-genome-annots-tab.R
torres-alexis Oct 23, 2024
8bbf66d
Update GL-DPPD-7110-A.md
torres-alexis Oct 23, 2024
4bec193
Update GL-DPPD-7110-A_build-genome-annots-tab.R
torres-alexis Oct 24, 2024
6da57fa
Updating signature matrix
asaravia-butler Oct 24, 2024
e3dfb4b
Formatting updates
asaravia-butler Oct 24, 2024
d1ea649
remove custom org dbs from annotation table
torres-alexis Oct 29, 2024
f06cf14
move timeout to top of scripts, add to readme
torres-alexis Oct 30, 2024
5088539
add no-home + bind local path to same container path
torres-alexis Oct 30, 2024
6368381
add cols bioconductor_annotations, custom_annotations, change dppd va…
torres-alexis Oct 31, 2024
b39c63c
remove --no-home from readme
torres-alexis Oct 31, 2024
dcdf589
Add r_libs to scrips, readme, standardize notes
torres-alexis Oct 31, 2024
ef2958f
Merge pull request #128 from torres-alexis/DEV_GeneLab_Reference_Anno…
bnovak32 Nov 5, 2024
283c0ab
Update README.md
bnovak32 Nov 8, 2024
c4b100e
Update README.md
bnovak32 Nov 8, 2024
d4e2d54
Merge pull request #130 from nasa/DEV_Methyl-Seq
asaravia-butler Nov 8, 2024
df3f216
Merge pull request #131 from nasa/DEV_GeneLab_Reference_Annotations_v…
asaravia-butler Nov 8, 2024
bdf8cfe
Fix broken reference database links
asaravia-butler Dec 5, 2024
810fc7a
Add updates for v1.2.3
asaravia-butler Dec 5, 2024
Original file line number Diff line number Diff line change
@@ -1,155 +1,195 @@
# GL_RefAnnotTable Workflow Information and Usage Instructions
# GL_RefAnnotTable-A Workflow Information and Usage Instructions <!-- omit in toc -->

## General workflow info
The current GeneLab Reference Annotation Table (GL_RefAnnotTable-A) pipeline is implemented as an R workflow that can be run from a command line interface (CLI) using bash. The workflow can be used even if you are unfamiliar with R, but if you want to learn more about R, visit the [R-project about page here](https://www.r-project.org/about.html). Additionally, an introduction to R along with installation help and information about using R for bioinformatics can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/R/basics).
## Table of Contents <!-- omit in toc -->

## Utilizing the workflow
- [General Workflow Information](#general-workflow-information)
- [Utilizing the Workflow](#utilizing-the-workflow)
- [1. Download the Workflow Files](#1-download-the-workflow-files)
- [2. Run the Workflow](#2-run-the-workflow)
- [Approach 1: Using Singularity](#approach-1-using-singularity)
- [Step 1: Install Singularity](#step-1-install-singularity)
- [Step 2: Fetch the Singularity Image](#step-2-fetch-the-singularity-image)
- [Step 3: Run the Workflow](#step-3-run-the-workflow)
- [Step 4: Run the Annotations Database Creation Function as a Stand-Alone Script](#step-4-run-the-annotations-database-creation-function-as-a-stand-alone-script)
- [Approach 2: Using a Local R Environment](#approach-2-using-a-local-r-environment)
- [Step 1: Install R and Required R Packages](#step-1-install-r-and-required-r-packages)
- [Step 2: Run the Workflow](#step-2-run-the-workflow)
- [Step 3: Run the Annotations Database Creation Function as a Stand-Alone Script](#step-3-run-the-annotations-database-creation-function-as-a-stand-alone-script)

1. [Install R and R packages](#1-install-r-and-r-packages)
2. [Download the workflow files](#2-download-the-workflow-files)
3. [Setup Execution Permission for Workflow Scripts](#3-setup-execution-permission-for-workflow-scripts)
4. [Run the workflow](#4-run-the-workflow)
5. [Run the annotations database creation function as a stand-alone script](#5-run-the-annotations-database-creation-function-as-a-stand-alone-script)
6. [Run the Workflow Using Docker or Singularity](#6-run-the-workflow-using-docker-or-singularity)
<br>
---

### 1. Install R and R packages
## General Workflow Information

We recommend installing R via the [Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/) as follows:
The current GeneLab Reference Annotation Table (GL_RefAnnotTable-A) pipeline is implemented as an R workflow that can be run from a command line interface (CLI) using bash. The workflow can be executed using either a Singularity container or a local R environment. The workflow can be used even if you are unfamiliar with R, but if you want to learn more about R, visit the [R-project about page here](https://www.r-project.org/about.html). Additionally, an introduction to R along with installation help and information about using R for bioinformatics can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/R/basics).

1. Select the [CRAN Mirror](https://cran.r-project.org/mirrors.html) closest to your location.
2. Click the link under the "Download and Install R" section that's consistent with your machine.
3. Click on the R-4.4.0 package consistent with your machine to download.
4. Double click on the R-4.4.0.pkg downloaded in step 3 and follow the installation instructions.
---

## Utilizing the Workflow

To utilize the GL_RefAnnotTable-A workflow, follow the instructions below to download the necessary workflow files. Once downloaded, the workflow can be executed using two approaches:

1. **[Using Singularity](#approach-1-using-singularity)**
2. **[Using a Local R Environment](#approach-2-using-a-local-r-environment)**

Please follow the instructions for the approach that best matches your setup and preferences. Each method is explained in detail below.

Once R is installed, open a CLI terminal and run the following command to activate R:
---

### 1. Download the Workflow Files

Download the latest version of the GL_RefAnnotTable-A workflow:

```bash
R
curl -LO https://github.com/nasa/GeneLab_Data_Processing/releases/download/GL_RefAnnotTable-A_1.1.0/GL_RefAnnotTable-A_1.1.0.zip
unzip GL_RefAnnotTable-A_1.1.0.zip
```
`
Within an active R environment, run the following commands to install the required R packages:

```R
install.packages("tidyverse")
---

install.packages("BiocManager")
### 2. Run the Workflow

BiocManager::install("STRINGdb")
BiocManager::install("PANTHER.db")
BiocManager::install("rtracklayer")
BiocManager::install("AnnotationForge")
BiocManager::install("biomaRt")
BiocManager::install("GO.db")
```
The GL_RefAnnotTable-A workflow can be run using two approaches:

<br>
- **[Approach 1: Using Singularity](#approach-1-using-singularity)**
- **[Approach 2: Using a Local R Environment](#approach-2-using-a-local-r-environment)**

### 2. Download the Workflow Files
---

All files required for utilizing the GL_RefAnnotTable-A workflow for generating reference annotation tables are in the [workflow_code](workflow_code) directory. To get a copy of latest GL_RefAnnotTable version on to your system, run the following command:
#### Approach 1: Using Singularity

```bash
curl -LO https://github.com/nasa/GeneLab_Data_Processing/releases/download/GL_RefAnnotTable-A_1.1.0/GL_RefAnnotTable-A_1.1.0.zip
```
This approach allows you to run the workflow within a containerized environment, ensuring consistency and reproducibility.

##### Step 1: Install Singularity

Singularity is a containerization platform for running applications portably and reproducibly. We use container images hosted on Quay.io to encapsulate all the necessary software and dependencies required by the GL_RefAnnotTable-A workflow. This setup allows you to run the workflow without installing any software directly on your system. Other containerization tools like Docker or Apptainer can also be used to pull and run these images.

We recommend installing Singularity system-wide as per the official [Singularity installation documentation](https://docs.sylabs.io/guides/3.10/admin-guide/admin_quickstart.html).

> **Note**: While Singularity is also available through [Anaconda](https://anaconda.org/conda-forge/singularity), we recommend installing Singularity system-wide following the official installation documentation.

<br>
##### Step 2: Fetch the Singularity Image

### 3. Setup Execution Permission for Workflow Scripts
To pull the Singularity image needed for the workflow, you can use the provided script as directed below or pull the image directly.

Once you've downloaded the GL_RefAnnotTable-A workflow directory as a zip file, unzip the workflow then `cd` into the GL_RefAnnotTable-A_1.1.0 directory on the CLI. Next, run the following command to set the execution permissions for the R script:
> **Note**: This command should be run in the location containing the `GL_RefAnnotTable-A_1.1.0` directory that was downloaded in [step 1](#1-download-the-workflow-files). Depending on your network speed, fetching the images will take approximately 20 minutes.

```bash
unzip GL_RefAnnotTable-A_1.1.0.zip
cd GL_RefAnnotTable-A_1.1.0
chmod -R u+x *R
bash GL_RefAnnotTable-A_1.1.0/bin/prepull_singularity.sh GL_RefAnnotTable-A_1.1.0/config/software/by_docker_image.config
```

<br>
Once complete, a `singularity` folder containing the Singularity images will be created. Run the following command to export this folder as an environment variable:

```bash
export SINGULARITY_CACHEDIR=$(pwd)/singularity
```
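
The image file name used when running the workflow follows from how `prepull_singularity.sh` rewrites each container URI: the first `:` and every `/` become `-`, and `.img` is appended. A minimal sketch of that mapping, where `image_file_for` is a hypothetical helper that is not part of the workflow:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow): map a container URI to the
# file name that prepull_singularity.sh gives the pulled image.
image_file_for() {
    local name="$1"
    name=${name/:/-}      # replace the first ':' (the tag separator) with '-'
    name=${name//\//-}    # replace every '/' with '-'
    printf '%s.img\n' "$name"
}

image_file_for "quay.io/nasa_genelab/gl-refannottable-a:1.1.0"
```

For the container URI in the workflow config, this prints `quay.io-nasa_genelab-gl-refannottable-a-1.1.0.img`, which is the file expected inside `$SINGULARITY_CACHEDIR`.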

### 4. Run the Workflow
##### Step 3: Run the Workflow

While in the GL_RefAnnotTable workflow directory, you are now able to run the workflow. Below is an example of how to run the workflow to build an annotation table for Mus musculus (mouse):
While in the directory containing the `GL_RefAnnotTable-A_1.1.0` folder, you can now run the workflow. Below is an example for generating the annotation table for *Mus musculus* (mouse):

```bash
Rscript GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'
singularity exec -B $(pwd)/GL_RefAnnotTable-A_1.1.0:/work \
$SINGULARITY_CACHEDIR/quay.io-nasa_genelab-gl-refannottable-a-1.1.0.img \
Rscript /work/GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'
```
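
Before launching the container, it can help to confirm that the image and the workflow script are where the command expects them. The sketch below is a hypothetical preflight check (the `preflight` function is not part of the workflow). It assumes the directory layout and image name from the example above and falls back to `./singularity` if `SINGULARITY_CACHEDIR` is unset.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check; paths follow the example above.
preflight() {
    local workflow_dir="$1" image="$2" status=0
    if [ ! -f "$image" ]; then
        echo "missing image: $image" >&2
        status=1
    fi
    if [ ! -f "$workflow_dir/GL-DPPD-7110-A_build-genome-annots-tab.R" ]; then
        echo "missing workflow script in: $workflow_dir" >&2
        status=1
    fi
    return "$status"
}

workflow_dir="$(pwd)/GL_RefAnnotTable-A_1.1.0"
image="${SINGULARITY_CACHEDIR:-$(pwd)/singularity}/quay.io-nasa_genelab-gl-refannottable-a-1.1.0.img"

# Only start the container when both files were found.
if preflight "$workflow_dir" "$image"; then
    singularity exec -B "$workflow_dir":/work "$image" \
        Rscript /work/GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'
else
    echo "preflight failed; workflow not started" >&2
fi
```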

**Input data:**

- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run `Rscript GL-DPPD-7110-A_build-genome-annots-tab.R` without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)

- Optional: a reference table CSV can be supplied as a second positional argument instead of using the default [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)

**Output data:**

- *-GL-annotations.tsv (Tab-delimited table of gene annotations)
- *-GL-build-info.txt (Text file containing information used to create the annotation table, including tools, tool versions, and date of creation)
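
A quick sanity check on the resulting table is to count its columns and data rows. The sketch below assumes the `*-GL-annotations.tsv` file is in the current directory and has a single header line. `summarize_annotations` is a hypothetical helper, and no particular column names are assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the column count (from the header line) and the
# number of data rows in a tab-delimited file.
summarize_annotations() {
    awk -F'\t' '
        NR == 1 { cols = NF }
        END     { printf "columns=%d rows=%d\n", cols, (NR > 0 ? NR - 1 : 0) }
    ' "$1"
}

# Summarize every annotation table found in the current directory, if any.
for f in ./*-GL-annotations.tsv; do
    [ -e "$f" ] || continue
    echo "$f: $(summarize_annotations "$f")"
done
```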

### 5. Run the annotations database creation function as a stand-alone script
##### Step 4: Run the Annotations Database Creation Function as a Stand-Alone Script

When the workflow is run, if the reference table does not specify an annotations database for the target_organism in the `annotations` column, the `install_annotations` function, defined in the `install-org-db.R` script, will be executed. This script will locally create and install an annotations database R package using AnnotationForge. This function can also be run as a stand-alone script from the command line:
If the reference table does not specify an annotations database for the target organism in the 'annotations' column, the `install_annotations` function (defined in `install-org-db.R`) will be executed. This function can also be run as a stand-alone script:

```bash
Rscript install-org-db.R 'Bacillus subtilis' /path/to/GL-DPPD-7110-A_annotations.csv
singularity exec -B $(pwd)/GL_RefAnnotTable-A_1.1.0:/work \
$SINGULARITY_CACHEDIR/quay.io-nasa_genelab-gl-refannottable-a-1.1.0.img \
Rscript /work/install-org-db.R 'Bacillus subtilis'
```

**Input data:**

- The target organism must be specified as the first positional command line argument, `Bacillus subtilis` is used in the example above. The correct argument for each organism can be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)

- The path to a local reference table must also be supplied as the second positional argument
- The target organism must be specified as the first positional command line argument. `Bacillus subtilis` is used in the example above. The correct argument for each organism can be found in the 'species' column of [GL-DPPD-7110-A_annotations.csv](https://raw.githubusercontent.com/nasa/GeneLab_Data_Processing/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
- Optional: A local reference table can be supplied as a second positional argument. If not provided, the script will download the current version of GL-DPPD-7110-A_annotations.csv from GitHub by default.
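
When a local reference table is passed to the containerized script, the path given must be valid inside the container. One location the command above explicitly exposes is the bound workflow directory, which appears as `/work`. The sketch below uses a hypothetical helper, `to_container_path`, to translate a host path under that directory into its `/work` equivalent. `my_annotations.csv` is a placeholder file name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a host path under the bound directory
# into the matching path under /work inside the container.
to_container_path() {
    local host_dir="${1%/}" host_path="$2"
    case "$host_path" in
        "$host_dir"/*) printf '/work/%s\n' "${host_path#"$host_dir"/}" ;;
        *) echo "not under $host_dir: $host_path" >&2; return 1 ;;
    esac
}

workflow_dir="$(pwd)/GL_RefAnnotTable-A_1.1.0"
# my_annotations.csv is a placeholder for a user-supplied reference table.
ref_in_container=$(to_container_path "$workflow_dir" "$workflow_dir/my_annotations.csv")
echo "$ref_in_container"   # prints /work/my_annotations.csv
```

The resulting path would then be given as the second positional argument, for example `Rscript /work/install-org-db.R 'Bacillus subtilis' "$ref_in_container"` at the end of the `singularity exec` command.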

**Output data:**

- org.*.eg.db/ (species-specific annotation database, as a local R package)
- org.*.eg.db/ (Species-specific annotation database, as a local R package)

### 6. Run the Workflow Using Docker or Singularity
---

Rather than running the workflow in your local environment, you can use a Docker or Singularity container. This method ensures that all dependencies are correctly installed.
#### Approach 2: Using a Local R Environment

1. **Pull the container image:**
This approach allows you to run the workflow directly in your local R environment without using containers.

Docker:
```bash
docker pull quay.io/nasa_genelab/gl-refannottable:v1.0.0
```
##### Step 1: Install R and Required R Packages

Singularity:
```bash
singularity pull docker://quay.io/nasa_genelab/gl-refannottable:v1.0.0
```
We recommend installing R via the [Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/):

2. **Download the workflow files:**
1. Select the [CRAN Mirror](https://cran.r-project.org/mirrors.html) closest to your location.
2. Navigate to the download page for your operating system.
3. Download and install R (e.g., R-4.4.0).

```bash
curl -LO https://github.com/nasa/GeneLab_Data_Processing/releases/download/GL_RefAnnotTable-A_1.1.0/GL_RefAnnotTable-A_1.1.0.zip
unzip GL_RefAnnotTable-A_1.1.0.zip
```
Once R is installed, you need to install the required R packages.

3. **Run the workflow:**
Open a terminal and start R:

Docker:
```bash
docker run -it -v $(pwd)/GL_RefAnnotTable-A_1.1.0:/work \
quay.io/nasa_genelab/gl-refannottable:v1.0.0 \
bash -c "cd /work && Rscript GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'"
```
```bash
R
```

Singularity:
```bash
singularity exec -B $(pwd)/GL_RefAnnotTable-A_1.1.0:/work \
gl-refannottable_v1.0.0.sif \
bash -c "cd /work && Rscript GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'"
```
Within the R environment, run the following commands to install the required packages:

```R
install.packages("tidyverse")
install.packages("BiocManager")
BiocManager::install("STRINGdb")
BiocManager::install("PANTHER.db")
BiocManager::install("rtracklayer")
BiocManager::install("AnnotationForge")
BiocManager::install("biomaRt")
BiocManager::install("GO.db")
```
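
To confirm from the shell that the packages above were installed, each one can be probed with `requireNamespace()`. This is a minimal sketch, assuming `Rscript` is on the `PATH`. `missing_r_packages` is a hypothetical helper, not part of the workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of every package that R cannot load.
missing_r_packages() {
    local pkg
    for pkg in "$@"; do
        # Rscript exits with status 0 when the package namespace is available.
        Rscript -e "quit(status = as.integer(!requireNamespace('$pkg', quietly = TRUE)))" \
            >/dev/null 2>&1 || echo "$pkg"
    done
}

if command -v Rscript >/dev/null 2>&1; then
    missing_r_packages tidyverse BiocManager STRINGdb PANTHER.db \
        rtracklayer AnnotationForge biomaRt GO.db
else
    echo "Rscript not found on PATH" >&2
fi
```

An empty result means every listed package could be loaded. Any names printed are candidates for re-running the corresponding install command.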

##### Step 2: Run the Workflow

While in the directory containing the `GL_RefAnnotTable-A_1.1.0` folder, you can now run the workflow. Below is an example of how to run the workflow to build an annotation table for *Mus musculus* (mouse):

```bash
Rscript GL_RefAnnotTable-A_1.1.0/GL-DPPD-7110-A_build-genome-annots-tab.R 'Mus musculus'
```

**Input data:**

- No input files are required. Specify the target organism using a positional command line argument. `Mus musculus` is used in the example above. To see a list of all available organisms, run `Rscript GL-DPPD-7110-A_build-genome-annots-tab.R` without positional arguments. The correct argument for each organism can also be found in the 'species' column of the [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)

- Optional: a reference table CSV can be supplied as a second positional argument instead of using the default [GL-DPPD-7110-A_annotations.csv](../../Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
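
The valid organism names can also be read straight from the 'species' column of the reference table. The sketch below is a naive reader: it assumes a local copy of the CSV, a header row containing `species`, and no commas inside quoted fields in or before that column. `list_species` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the 'species' column of a simple CSV file.
list_species() {
    awk -F',' '
        { sub(/\r$/, "") }                  # tolerate Windows line endings
        NR == 1 {
            # Locate the column whose header is "species" (quotes ignored).
            for (i = 1; i <= NF; i++) {
                header = $i
                gsub(/"/, "", header)
                if (header == "species") col = i
            }
            next
        }
        col { value = $col; gsub(/"/, "", value); print value }
    ' "$1"
}

# Assumes the reference table was saved next to where this is run.
csv="GL-DPPD-7110-A_annotations.csv"
if [ -f "$csv" ]; then
    list_species "$csv"
fi
```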

**Output data:**

- *-GL-annotations.tsv (Tab-delimited table of gene annotations)
- *-GL-build-info.txt (Text file containing information used to create the annotation table, including tools, tool versions, and date of creation)

##### Step 3: Run the Annotations Database Creation Function as a Stand-Alone Script

If the reference table does not specify an annotations database for the target organism in the 'annotations' column, the `install_annotations` function (defined in `install-org-db.R`) will be executed. This function can also be run as a stand-alone script:

```bash
Rscript GL_RefAnnotTable-A_1.1.0/install-org-db.R 'Bacillus subtilis'
```

**Input data:**

- The target organism must be specified as the first positional command line argument. `Bacillus subtilis` is used in the example above. The correct argument for each organism can be found in the 'species' column of [GL-DPPD-7110-A_annotations.csv](https://raw.githubusercontent.com/nasa/GeneLab_Data_Processing/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv)
- Optional: A local reference table can be supplied as a second positional argument. If not provided, the script will download the current version of GL-DPPD-7110-A_annotations.csv from GitHub by default.

**Output data:**

- org.*.eg.db/ (species-specific annotation database, as a local R package)

---
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@

#!/usr/bin/env bash

# Addresses issue: https://github.com/nextflow-io/nextflow/issues/1210

CONFILE=${1:-nextflow.config}
OUTDIR=${2:-./singularity}

# Abort with a non-zero status if the config file is missing
if [ ! -e "$CONFILE" ]; then
echo "$CONFILE does not exist" >&2
exit 1
fi

TMPFILE=`mktemp`

CURDIR=$(pwd)

mkdir -p $OUTDIR

cat ${CONFILE}|grep 'container'|perl -lane 'if ( $_=~/container\s*\=\s*\"(\S+)\"/ ) { $_=~/container\s*\=\s*\"(\S+)\"/; print $1 unless ( $1=~/^\s*$/ or $1=~/\.sif/ or $1=~/\.img/ ) ; }' > $TMPFILE

cd ${OUTDIR}

while IFS= read -r line; do
name=$line
name=${name/:/-}
name=${name//\//-}
echo $name
singularity pull ${name}.img docker://$line
done < $TMPFILE

cd $CURDIR
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
// Config that specifies containers for nextflow processes
process {
withName: 'GL_REFANNOTTABLE_A' {
container = "quay.io/nasa_genelab/gl-refannottable-a:1.1.0"
}
}
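
The container URIs that `prepull_singularity.sh` pulls are read from `container = "..."` lines like the one above. As an illustration, the same extraction can be sketched with `sed` and `grep` in place of the script's Perl one-liner. This is a swapped-in technique, not the script's own code, and `list_containers` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every quoted container URI in a config file,
# skipping entries that already point at local .sif or .img files.
list_containers() {
    sed -n 's/.*container[[:space:]]*=[[:space:]]*"\([^"[:space:]]\{1,\}\)".*/\1/p' "$1" |
        grep -Ev '\.(sif|img)' || true
}

config="GL_RefAnnotTable-A_1.1.0/config/software/by_docker_image.config"
if [ -f "$config" ]; then
    list_containers "$config"
fi
```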
Original file line number Diff line number Diff line change
@@ -1,14 +1,31 @@
# install-org-db.R

# Set R library path to current working directory
lib_path <- file.path(getwd())
.libPaths(lib_path)

# Load required libraries
library(tidyverse)
library(AnnotationForge)
library(BiocManager)

# Function: Get the annotations db from the reference table. If no annotations db is defined, create the package name from genus, species (and strain for microbes).
# Try to install the annotations db from Bioconductor. If that fails, build the package using AnnotationForge and install it into the current directory.
# Requires ~80 GB for NCBIFilesDir file caching
install_annotations <- function(target_organism, refTablePath) {
if (!file.exists(refTablePath)) {
stop("Reference table file does not exist at the specified path: ", refTablePath)
}
install_annotations <- function(target_organism, refTablePath = NULL) {
# Default URL for the specific version of the reference CSV
default_url <- "https://raw.githubusercontent.com/nasa/GeneLab_Data_Processing/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv"

# Use the provided path if available, otherwise use the default URL
csv_source <- if (is.null(refTablePath)) default_url else refTablePath

# Attempt to read the CSV file
ref_table <- tryCatch({
read.csv(csv_source)
}, error = function(e) {
stop("Failed to read the reference table: ", e$message)
})

ref_table <- read.csv(refTablePath)
target_taxid <- ref_table %>%
filter(species == target_organism) %>%
pull(taxon)
@@ -52,6 +69,7 @@ install_annotations <- function(target_organism, refTablePath) {
} else {
cat(paste0("\nAttempting to install '", target_org_db, "' from Bioconductor...\n"))
BiocManager::install(target_org_db, ask = FALSE)

if (requireNamespace(target_org_db, quietly = TRUE)) {
cat(paste0("'", target_org_db, "' has been successfully installed from Bioconductor.\n"))
} else {
@@ -85,3 +103,17 @@ install_annotations <- function(target_organism, refTablePath) {
cat(paste0("Using Annotation Database '", target_org_db, "'.\n"))
return(target_org_db)
}

if (!interactive()) {
# Parse command line arguments
args <- commandArgs(trailingOnly = TRUE)

if (length(args) < 1) {
stop("Usage: Rscript install-org-db.R <target_organism> [refTablePath]")
}

target_organism <- args[1]
refTablePath <- if (length(args) > 1) args[2] else NULL

install_annotations(target_organism, refTablePath)
}