Updates of experimental scRNA-seq data source and analysis #123

Empty file added Algorithms/SCORPION/.Rhistory
Empty file.
25 changes: 25 additions & 0 deletions Algorithms/SCORPION/Dockerfile
@@ -0,0 +1,25 @@
FROM r-base:4.2.0

LABEL maintainer="Daniel Osorio <[email protected]>"

USER root

WORKDIR /

RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')"

RUN R -e "install.packages('reshape2')"

# RUN R -e "remotes::install_github('kuijjerlab/SCORPION')"

RUN R -e "install.packages('SCORPION')"

RUN R -e "library(reshape2)"

RUN R -e "library(SCORPION)"

COPY runSCORPION.R /

RUN mkdir data/

RUN apt-get update && apt-get install -y time
74 changes: 74 additions & 0 deletions Algorithms/SCORPION/README.md
@@ -0,0 +1,74 @@
*This README.md file was generated on 2/4/2023 by Yiqi Su ([email protected])*

**We would like to acknowledge professors Daniel Osorio, S. Stephen Yi and Marieke L. Kuijjer for sharing the code for SCORPION.**

<!-- remove all comments (like this) before final save -->

# SCORPION: Single-Cell Oriented Reconstruction of [PANDA](https://sites.google.com/a/channing.harvard.edu/kimberlyglass/tools/panda) Individually Optimized Gene Regulatory Network

These instructions describe how to integrate the new GRN method SCORPION ([[Paper](https://doi.org/10.1101/2023.01.20.524974)] [[GitHub](https://github.com/kuijjerlab/SCORPION)]) into BEELINE.
Follow these steps:

1. **Create SCORPION folder:** Create a folder called SCORPION under Beeline/Algorithms for the new method. Keeping each method in its own folder eases set-up and portability and avoids conflicts between the library and software versions required by different GRN algorithm implementations.

2. **Create runSCORPION.R script:** In the SCORPION folder, create an R script runSCORPION.R to learn graphs from target datasets.

3. **Create a Dockerfile:** Create a "Dockerfile" that contains the necessary software specifications and commands, listed in the order in which they should run (top to bottom).


    FROM r-base:4.2.0

    LABEL maintainer="Daniel Osorio <[email protected]>"

    USER root

    WORKDIR /

    RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')"

    RUN R -e "install.packages('reshape2')"

    RUN R -e "remotes::install_github('kuijjerlab/SCORPION')"

    RUN R -e "library(SCORPION)"

    RUN R -e "library(reshape2)"

    COPY runSCORPION.R /

    RUN mkdir data/

    RUN apt-get update && apt-get install -y time

The Dockerfile copies the script runSCORPION.R into the image so that it can be run inside the Docker container.

4. **Add the Dockerfile to initialize.sh script:** Once the Dockerfile is ready, add the following lines to the 'initialize.sh' script to build the Docker image for SCORPION.


    cd $BASEDIR/Algorithms/SCORPION/
    docker build -q -t scorpion:base .
    if ([[ "$(docker images -q scorpion:base 2> /dev/null)" != "" ]]); then
        echo "Docker container for SCORPION is built and tagged as scorpion:base"
    else
        echo "Oops! Unable to build Docker container for SCORPION"
    fi

5. **Create scorpionRunner.py script:** After building the Docker image, create a Python script called scorpionRunner.py in the Beeline/BLRun folder. It sets up a BLRun object that can read the inputs, run SCORPION inside the Docker image, and parse the output for evaluation. Specifically, the scorpionRunner.py script contains three functions:

- ``generateInputs()`` : This function reads the input data file (i.e., expression data), and processes it into the format required by SCORPION.
- ``run()`` : This function constructs a "docker run" system command with parameters including the path of the input data file (i.e., expression data). It also specifies where the outputs are written. The docker container runs SCORPION when the parameters are passed.
- ``parseOutput()`` : This function reads the SCORPION-specific output (i.e., outFile.txt) and formats it into a ranked edgelist comma-separated file (i.e., rankedEdges.csv) with columns Gene1, Gene2, and EdgeWeight. The Gene1 column should contain the regulators, the Gene2 column the targets, and the EdgeWeight column the absolute value of the weight predicted for the edge (regulator, target). The ranked edgelist file is subsequently used by the BLEval object.
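
The conversion that ``parseOutput()`` performs can be sketched roughly as follows. This is a minimal illustration, not the actual scorpionRunner.py code: it assumes the tab-separated layout written by runSCORPION.R (columns Gene1, Gene2, corVal), works on in-memory text rather than files, and the helper names `rank_edges` and `to_csv` are hypothetical.

```python
import csv
import io


def rank_edges(out_file_text):
    """Parse tab-separated (Gene1, Gene2, corVal) rows into ranked edges."""
    reader = csv.DictReader(io.StringIO(out_file_text), delimiter="\t")
    # EdgeWeight is the absolute value of the predicted weight.
    edges = [
        (row["Gene1"], row["Gene2"], abs(float(row["corVal"])))
        for row in reader
    ]
    # Strongest edges first.
    edges.sort(key=lambda edge: edge[2], reverse=True)
    return edges


def to_csv(edges):
    """Serialize ranked edges with the Gene1, Gene2, EdgeWeight header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Gene1", "Gene2", "EdgeWeight"])
    writer.writerows(edges)
    return buffer.getvalue()


# Made-up sample standing in for outFile.txt.
sample = "Gene1\tGene2\tcorVal\ng1\tg2\t0.2\ng2\tg3\t-0.9\ng1\tg3\t0.5\n"
ranked = rank_edges(sample)
print(to_csv(ranked))
```

Taking the absolute value before sorting means a strong negative weight ranks above a weak positive one, which matches the EdgeWeight description above.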

6. **Add SCORPION to runner.py:** Next, update the runner.py script in the Beeline/BLRun folder by adding the information related to SCORPION.

- add "import BLRun.scorpionRunner as SCORPION"
- add "'SCORPION':SCORPION.generateInputs" to InputMapper
- add "'SCORPION':SCORPION.run" to AlgorithmMapper
- add "'SCORPION':SCORPION.parseOutput" to OutputParser
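
As a rough illustration of how the three mappers fit together, the sketch below uses a stand-in object in place of BLRun.scorpionRunner, since the real module is not importable here. Only the mapper names and the three function names come from the steps above; the lambda bodies and the `run_pipeline` helper are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical stand-in for "import BLRun.scorpionRunner as SCORPION".
SCORPION = SimpleNamespace(
    generateInputs=lambda runner: f"inputs for {runner}",
    run=lambda runner: f"run {runner}",
    parseOutput=lambda runner: f"parsed {runner}",
)

# Each mapper associates an algorithm name with one stage of the pipeline.
InputMapper = {"SCORPION": SCORPION.generateInputs}
AlgorithmMapper = {"SCORPION": SCORPION.run}
OutputParser = {"SCORPION": SCORPION.parseOutput}


def run_pipeline(name, runner):
    """Look up and call the three stages for one algorithm, in order."""
    return [
        InputMapper[name](runner),
        AlgorithmMapper[name](runner),
        OutputParser[name](runner),
    ]


steps = run_pipeline("SCORPION", "demo")
print(steps)
```

Because the lookup is by name, an algorithm that is missing from any of the three dictionaries fails with a `KeyError`, which is why all three entries need to be added together.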

7. **Add SCORPION to config.yaml:** The final step is to add the new algorithm SCORPION and any necessary parameters to the config.yaml located in the Beeline/config-files folder. Note that BEELINE can currently handle only one parameter set at a time, even though multiple parameters can be passed to the single parameter object.


    - name: "SCORPION"
      params:
          should_run: [True]
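
To show what this entry amounts to once the YAML is loaded, here is a small sketch. It assumes the entry becomes a dictionary whose `should_run` value is a one-element list, and that an algorithm is selected when that first element is true; the exact rule BEELINE applies is not shown in this README, and the `OTHER` entry and `enabled_algorithms` helper are made up for the example.

```python
# Roughly what a YAML loader would return for a list of algorithm entries.
# "OTHER" is a made-up entry used only to show the filter.
algorithms = [
    {"name": "SCORPION", "params": {"should_run": [True]}},
    {"name": "OTHER", "params": {"should_run": [False]}},
]


def enabled_algorithms(entries):
    """Return names whose first should_run value is truthy (assumed rule)."""
    names = []
    for entry in entries:
        flags = entry.get("params", {}).get("should_run", [False])
        if flags and flags[0]:
            names.append(entry["name"])
    return names


selected = enabled_algorithms(algorithms)
print(selected)
```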
52 changes: 52 additions & 0 deletions Algorithms/SCORPION/runSCORPION.R
@@ -0,0 +1,52 @@
library(SCORPION)
library(reshape2)
args <- commandArgs(trailingOnly = TRUE)
inFile <- args[1]
outFile <- args[2]

# input expression data
inputExpr <- read.table(inFile, sep = ",", header = TRUE, row.names = 1)
geneNames <- rownames(inputExpr)
rownames(inputExpr) <- c(geneNames)
inputExpr <- as.matrix(inputExpr)
#nGenes <- nrow(inputExpr)

# Run SCORPION

nGenes <- nrow(inputExpr)
n.pc <- min(5, nGenes - 1) # Ensure n.pc is less than number of genes

# X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc)
# Example adjustment for choosing between irlba and svd
# if (n.pc / nGenes > 0.1) {
# # Use SVD
# svd_results <- svd(inputExpr)
# X <- svd_results$u[, 1:n.pc] %*% diag(svd_results$d[1:n.pc])
# } else {
# # Use SCORPION makeSuperCells
# X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc)
# }
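if (n.pc / nGenes > 0.1) {
  # Use a full SVD when n.pc is a large fraction of the number of genes
  svd_results <- svd(inputExpr)
  # drop = FALSE and nrow = n.pc keep both factors as matrices when n.pc == 1
  X <- svd_results$u[, 1:n.pc, drop = FALSE] %*% diag(svd_results$d[1:n.pc], nrow = n.pc)
  # Retain the original row names
  rownames(X) <- rownames(inputExpr)
} else {
  # Use SCORPION makeSuperCells
  X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc)
  # Set the row names explicitly in case makeSuperCells does not retain them
  rownames(X) <- rownames(inputExpr)
}

# Spearman correlation between genes (rows of X)
X <- cor(t(as.matrix(X)), method = 'spearman')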
# Write output to a file
# https://stackoverflow.com/questions/38664241/ranking-and-counting-matrix-elements-in-r
DF = melt(X)

#DF = data.frame(Gene1 = geneNames[c(row(pcorResults$estimate))], Gene2 = geneNames[c(col(pcorResults$estimate))]
# , corVal = c(pcorResults$estimate), pValue = c(pcorResults$p.value))
colnames(DF) = c('Gene1', 'Gene2', 'corVal')
outDF <- DF[order(DF$corVal, decreasing=TRUE), ]
write.table(outDF, outFile, sep = "\t", quote = FALSE, row.names = FALSE)
Binary file added Algorithms/SCORPION/scorpionTest.RData
Binary file not shown.
32 changes: 32 additions & 0 deletions Algorithms/SCTENIFOLDNET/Dockerfile
@@ -0,0 +1,32 @@
FROM r-base:4.2.0

LABEL maintainer="Daniel Osorio <[email protected]>"

USER root

WORKDIR /

# RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')"

RUN R -e "install.packages('remotes')"

RUN R -e "library(remotes)"

RUN R -e "remotes::install_cran(pkgs = 'scTenifoldNet', quiet = TRUE)"

RUN R -e "remotes::install_cran(pkgs = 'reshape2', quiet = TRUE)"

# RUN R -e "install.packages('scTenifoldNet')"

# RUN R -e "install.packages('reshape2')"

RUN R -e "library(scTenifoldNet)"

RUN R -e "library(reshape2)"

COPY runSCTENIFOLDNET.R /

RUN mkdir data/

RUN apt-get update && apt-get install -y time

72 changes: 72 additions & 0 deletions Algorithms/SCTENIFOLDNET/README.md
@@ -0,0 +1,72 @@
*This README.md file was generated on 2/20/2023 by Yiqi Su ([email protected])*

**We would like to acknowledge professor Daniel Osorio for sharing the code for SCTENIFOLDNET.**

<!-- remove all comments (like this) before final save -->

# scTenifoldNet: A Machine Learning Workflow for Constructing and Comparing Transcriptome-wide Gene Regulatory Networks from Single-Cell Data

These instructions describe how to integrate the new GRN method SCTENIFOLDNET ([[Paper](https://doi.org/10.1016/j.patter.2020.100139)] [[GitHub](https://github.com/jamesjcai/ScTenifoldNet.jl)]) into BEELINE.
Follow these steps:

1. **Create SCTENIFOLDNET folder:** Create a folder called SCTENIFOLDNET under Beeline/Algorithms for the new method. Keeping each method in its own folder eases set-up and portability and avoids conflicts between the library and software versions required by different GRN algorithm implementations.

2. **Create runSCTENIFOLDNET.R script:** In the SCTENIFOLDNET folder, create an R script runSCTENIFOLDNET.R to learn graphs from target datasets.

3. **Create a Dockerfile:** Create a "Dockerfile" that contains the necessary software specifications and commands, listed in the order in which they should run (top to bottom).

    FROM r-base:4.2.0

    LABEL maintainer="Daniel Osorio <[email protected]>"

    USER root

    WORKDIR /

    RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')"

    RUN R -e "remotes::install_cran(pkgs = 'scTenifoldNet', quiet = TRUE)"

    RUN R -e "remotes::install_cran(pkgs = 'reshape2', quiet = TRUE)"

    RUN R -e "library(scTenifoldNet)"

    RUN R -e "library(reshape2)"

    COPY runSCTENIFOLDNET.R /

    RUN mkdir data/

    RUN apt-get update && apt-get install -y time


The Dockerfile copies the script runSCTENIFOLDNET.R into the image so that it can be run inside the Docker container.

4. **Add the Dockerfile to initialize.sh script:** Once the Dockerfile is ready, add the following lines to the 'initialize.sh' script to build the Docker image for SCTENIFOLDNET.

    cd $BASEDIR/Algorithms/SCTENIFOLDNET/
    docker build -q -t sctenifoldnet:base .
    if ([[ "$(docker images -q sctenifoldnet:base 2> /dev/null)" != "" ]]); then
        echo "Docker container for SCTENIFOLDNET is built and tagged as sctenifoldnet:base"
    else
        echo "Oops! Unable to build Docker container for SCTENIFOLDNET"
    fi

5. **Create sctenifoldnetRunner.py script:** After building the Docker image, create a Python script called sctenifoldnetRunner.py in the Beeline/BLRun folder. It sets up a BLRun object that can read the inputs, run SCTENIFOLDNET inside the Docker image, and parse the output for evaluation. Specifically, the sctenifoldnetRunner.py script contains three functions:

- ``generateInputs()`` : This function reads the input data file (i.e., expression data), and processes it into the format required by SCTENIFOLDNET.
- ``run()`` : This function constructs a "docker run" system command with parameters including the path of the input data file (i.e., expression data). It also specifies where the outputs are written. The docker container runs SCTENIFOLDNET when the parameters are passed.
- ``parseOutput()`` : This function reads the SCTENIFOLDNET-specific output (i.e., outFile.txt) and formats it into a ranked edgelist comma-separated file (i.e., rankedEdges.csv) with columns Gene1, Gene2, and EdgeWeight. The Gene1 column should contain the regulators, the Gene2 column the targets, and the EdgeWeight column the absolute value of the weight predicted for the edge (regulator, target). The ranked edgelist file is subsequently used by the BLEval object.
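
For the ``run()`` step, a "docker run" command along these lines is one plausible shape. The sketch only builds the argument list and does not invoke Docker. The image tag `sctenifoldnet:base`, the script name, and the `/data` directory come from the Dockerfile and initialize.sh snippets above; the host directory, the file names, the `--rm` and `-v` choices, and the helper name `build_docker_command` are assumptions made for illustration.

```python
import shlex
from pathlib import PurePosixPath


def build_docker_command(host_dir, in_file, out_file, image="sctenifoldnet:base"):
    """Assemble a docker run command that passes inFile and outFile to the R script."""
    # The Dockerfile creates /data, so the host directory is mounted there (assumed).
    container_dir = PurePosixPath("/data")
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{container_dir}",
        image,
        "Rscript", "runSCTENIFOLDNET.R",
        str(container_dir / in_file),   # read by the script as args[1]
        str(container_dir / out_file),  # read by the script as args[2]
    ]


# Hypothetical host path and file names.
cmd = build_docker_command("/tmp/beeline-demo", "ExpressionData.csv", "outFile.txt")
print(shlex.join(cmd))
```

Keeping the command as a list, rather than one formatted string, lets it be passed directly to `subprocess.run` without shell quoting issues.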

6. **Add SCTENIFOLDNET to runner.py:** Next, update the runner.py script in the Beeline/BLRun folder by adding the information related to SCTENIFOLDNET.

- add "import BLRun.sctenifoldnetRunner as SCTENIFOLDNET"
- add "'SCTENIFOLDNET':SCTENIFOLDNET.generateInputs" to InputMapper
- add "'SCTENIFOLDNET':SCTENIFOLDNET.run" to AlgorithmMapper
- add "'SCTENIFOLDNET':SCTENIFOLDNET.parseOutput" to OutputParser

7. **Add SCTENIFOLDNET to config.yaml:** The final step is to add the new algorithm SCTENIFOLDNET and any necessary parameters to the config.yaml located in the Beeline/config-files folder. Note that BEELINE can currently handle only one parameter set at a time, even though multiple parameters can be passed to the single parameter object.

    - name: "SCTENIFOLDNET"
      params:
          should_run: [True]
40 changes: 40 additions & 0 deletions Algorithms/SCTENIFOLDNET/runSCTENIFOLDNET.R
@@ -0,0 +1,40 @@
library(scTenifoldNet)
library(reshape2)
args <- commandArgs(trailingOnly = TRUE)
inFile <- args[1]
outFile <- args[2]

# input expression data
inputExpr <- read.table(inFile, sep = ",", header = TRUE, row.names = 1)
geneNames <- rownames(inputExpr)
rownames(inputExpr) <- c(geneNames)
inputExpr <- as.matrix(inputExpr)
#nGenes <- nrow(inputExpr)

# Run pcNet
# Link to paper: https://doi.org/10.1101/2020.02.12.931469
set.seed(1)
num_genes <- nrow(inputExpr)
if (num_genes > 2) {
  nComp <- num_genes - 1 # Setting nComp to one less than the total number of genes
} else {
  stop("Not enough genes in the dataset")
}

pcNetResults <- as.matrix(pcNet(X = inputExpr, nComp = nComp))
#set.seed(1)
#pcNetResults = makeNetworks(inputExpr, nComp = round(nGenes/2), q = 0, nNet = 10)
#set.seed(1)
#pcNetResults = tensorDecomposition(pcNetResults)
#pcNetResults = as.matrix(pcNetResults$X)
diag(pcNetResults) <- 1

# Write output to a file
# https://stackoverflow.com/questions/38664241/ranking-and-counting-matrix-elements-in-r
DF = melt(pcNetResults)

#DF = data.frame(Gene1 = geneNames[c(row(pcorResults$estimate))], Gene2 = geneNames[c(col(pcorResults$estimate))]
# , corVal = c(pcorResults$estimate), pValue = c(pcorResults$p.value))
colnames(DF) = c('Gene1', 'Gene2', 'corVal')
outDF <- DF[order(DF$corVal, decreasing=TRUE), ]
write.table(outDF, outFile, sep = "\t", quote = FALSE, row.names = FALSE)