Murali-group · yiqisu · Aug 9, 2024
diff --git a/Algorithms/SCORPION/.Rhistory b/Algorithms/SCORPION/.Rhistory
diff --git a/Algorithms/SCORPION/Dockerfile b/Algorithms/SCORPION/Dockerfile
@@ -0,0 +1,25 @@
+FROM r-base:4.2.0
+
+LABEL maintainer="Daniel Osorio <[email protected]>"
+
+USER root
+
+WORKDIR /
+
+RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')"
+
+RUN R -e "install.packages('reshape2')"
+
+# RUN R -e "remotes::install_github('kuijjerlab/SCORPION')"
+
+RUN R -e "install.packages('SCORPION')"
+
+RUN R -e "library(reshape2)"
+
+RUN R -e "library(SCORPION)"
+
+COPY runSCORPION.R /
+
+RUN mkdir data/
+
+RUN apt-get update && apt-get install -y time
diff --git a/Algorithms/SCORPION/README.md b/Algorithms/SCORPION/README.md
@@ -0,0 +1,74 @@
+*This README.md file was generated on 2/4/2023 by Yiqi Su ([email protected])*
+
+**We would like to acknowledge professors Daniel Osorio, S. Stephen Yi and Marieke L. Kuijjer for sharing the code for SCORPION.**
+
+<!-- remove all comments (like this) before final save  -->
+
+# SCORPION: Single-Cell Oriented Reconstruction of [PANDA](https://sites.google.com/a/channing.harvard.edu/kimberlyglass/tools/panda) Individually Optimized Gene Regulatory Network
+
+These instructions describe how to integrate the GRN method SCORPION ([[Paper](https://doi.org/10.1101/2023.01.20.524974)] [[GitHub](https://github.com/kuijjerlab/SCORPION)]) into BEELINE.
+Follow these steps:
+
+1. **Create SCORPION folder:** Create a folder called SCORPION under Beeline/Algorithms for the new method. Keeping each method in its own folder ensures easy set-up and portability and avoids conflicts between the library and software versions required by different GRN algorithm implementations.
+
+2. **Create runSCORPION.R script:** In the SCORPION folder, create an R script runSCORPION.R to learn graphs from target datasets.
+
+3. **Create a Dockerfile:** Create a "Dockerfile" that contains necessary software specifications and commands listed in a specific order from top to bottom. 
+
+
+        FROM r-base:4.2.0
+
+        LABEL maintainer="Daniel Osorio <[email protected]>"
+
+        USER root
+
+        WORKDIR /
+
+        RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')"
+
+        RUN R -e "install.packages('reshape2')"
+
+        RUN R -e "remotes::install_github('kuijjerlab/SCORPION')"
+
+        RUN R -e "library(SCORPION)"
+
+        RUN R -e "library(reshape2)"
+
+        COPY runSCORPION.R /
+
+        RUN mkdir data/
+
+        RUN apt-get update && apt-get install -y time
+
+The Dockerfile copies the script runSCORPION.R into the image so that it can be run inside the Docker container.
+
+4. **Add the Dockerfile to initialize.sh script:** Once the Dockerfile is ready, add the following lines to the 'initialize.sh' script to build the Docker image for SCORPION.
+
+
+        cd $BASEDIR/Algorithms/SCORPION/
+        docker build -q -t scorpion:base .
+        if ([[ "$(docker images -q scorpion:base 2> /dev/null)" != "" ]]); then
+            echo "Docker container for SCORPION is built and tagged as scorpion:base"
+        else
+            echo "Oops! Unable to build Docker container for SCORPION"
+        fi
+
+5. **Create scorpionRunner.py script:** After building the Docker image, create a Python script called scorpionRunner.py in the Beeline/BLRun folder to set up a BLRun object that can read inputs, run SCORPION inside the Docker container, and parse the output for evaluation. Specifically, the scorpionRunner.py script contains three functions:
+
+   - ``generateInputs()`` : This function reads the input data file (i.e., expression data), and processes it into the format required by SCORPION. 
+   - ``run()`` : This function constructs a "docker run" system command with parameters including the path of the input data file (i.e., expression data). It also specifies where the outputs are written. The docker container runs SCORPION when the parameters are passed. 
+   - ``parseOutput()`` : This function reads the SCORPION-specific output (i.e., outFile.txt) and formats it into a ranked edgelist comma-separated file (i.e., rankedEdges.csv) with columns Gene1, Gene2, and EdgeWeight. The Gene1 column should contain regulators, the Gene2 column the targets, and EdgeWeight column the absolute value of the weight predicted for edge (regulator,target). The ranked edgelist file will be subsequently used by BLEval object. 
+
+6. **Add SCORPION to runner.py:** Next, update runner.py script in Beeline/BLRun folder by adding information related to SCORPION. 
+
+    - add "import BLRun.scorpionRunner as SCORPION"
+    - add "'SCORPION':SCORPION.generateInputs" to InputMapper
+    - add "'SCORPION':SCORPION.run" to AlgorithmMapper
+    - add "'SCORPION':SCORPION.parseOutput" to OutputParser
+
+7. **Add SCORPION to config.yaml:** The final step is to add the new algorithm SCORPION and any necessary parameters to the config.yaml located in the Beeline/config-files folder. Note that BEELINE can currently handle only one parameter set at a time, even though multiple parameters can be passed to the single parameter object.
+
+
+        - name: "SCORPION"
+          params:
+              should_run: [True]
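Step 5 of the README above describes `run()` only in prose. The sketch below shows one way such a function might assemble the `docker run` command. It is a minimal illustration, not BEELINE's actual scorpionRunner.py: the function name `build_docker_command`, its parameters, the use of `Rscript`, and the choice to mount the working directory at `/data` are all assumptions. Only the image tag `scorpion:base`, the script name `runSCORPION.R`, and its two positional arguments (input file, output file) come from the files in this diff.

```python
from pathlib import Path


def build_docker_command(work_dir, input_csv, out_file, image="scorpion:base"):
    """Return the argument list for one containerized SCORPION run.

    Hypothetical helper: the parameter names and the /data mount point
    are illustrative assumptions, not part of BEELINE.
    """
    work_dir = Path(work_dir).resolve()
    return [
        "docker", "run", "--rm",
        # Mount the host working directory so the container can read
        # the expression data and write its output back to the host.
        "-v", f"{work_dir}:/data",
        image,
        # runSCORPION.R was copied to / and WORKDIR is /, so a bare
        # script name resolves inside the container.
        "Rscript", "runSCORPION.R",
        f"/data/{input_csv}",   # args[1] in runSCORPION.R (inFile)
        f"/data/{out_file}",    # args[2] in runSCORPION.R (outFile)
    ]


if __name__ == "__main__":
    cmd = build_docker_command(".", "inputs/ExpressionData.csv", "outputs/outFile.txt")
    print(" ".join(cmd))
```

Returning an argument list rather than a single shell string lets the caller hand it to `subprocess.run(cmd, check=True)` without any shell quoting.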
diff --git a/Algorithms/SCORPION/runSCORPION.R b/Algorithms/SCORPION/runSCORPION.R
@@ -0,0 +1,52 @@
+library(SCORPION)
+library(reshape2)
+args <- commandArgs(trailingOnly = TRUE)
+inFile <- args[1]
+outFile <- args[2]
+
+# input expression data
+inputExpr <- read.table(inFile, sep = ",", header = TRUE, row.names = 1)
+geneNames <- rownames(inputExpr)
+rownames(inputExpr) <- c(geneNames)
+inputExpr <- as.matrix(inputExpr)
+#nGenes <- nrow(inputExpr)
+
+# Run SCORPION
+
+nGenes <- nrow(inputExpr)
+n.pc <- min(5, nGenes - 1)  # Ensure n.pc is less than number of genes
+
+# X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc)
+# Example adjustment for choosing between irlba and svd
+# if (n.pc / nGenes > 0.1) {
+#     # Use SVD
+#     svd_results <- svd(inputExpr)
+#     X <- svd_results$u[, 1:n.pc] %*% diag(svd_results$d[1:n.pc])
+# } else {
+#     # Use SCORPION makeSuperCells
+#     X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc)
+# }
+if (n.pc / nGenes > 0.1) {
+    # Use SVD
+    svd_results <- svd(inputExpr)
+    X <- svd_results$u[, 1:n.pc, drop = FALSE] %*% diag(svd_results$d[1:n.pc], nrow = n.pc)
+    # Retain the original row names
+    rownames(X) <- rownames(inputExpr)
+} else {
+    # Use SCORPION makeSuperCells
+    X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc)
+    # Assuming makeSuperCells retains row names, if not, add them similarly
+    rownames(X) <- rownames(inputExpr)
+}
+
+
+X <- cor(t(as.matrix(X)), method = 'spearman')
+# Write output to a file
+# https://stackoverflow.com/questions/38664241/ranking-and-counting-matrix-elements-in-r
+DF = melt(X)
+
+#DF = data.frame(Gene1 = geneNames[c(row(pcorResults$estimate))], Gene2 = geneNames[c(col(pcorResults$estimate))]
+#                , corVal = c(pcorResults$estimate), pValue =  c(pcorResults$p.value))
+colnames(DF) = c('Gene1', 'Gene2', 'corVal')
+outDF <- DF[order(DF$corVal, decreasing=TRUE), ]
+write.table(outDF, outFile, sep = "\t", quote = FALSE, row.names = FALSE)
diff --git a/Algorithms/SCORPION/scorpionTest.RData b/Algorithms/SCORPION/scorpionTest.RData
diff --git a/Algorithms/SCTENIFOLDNET/Dockerfile b/Algorithms/SCTENIFOLDNET/Dockerfile
@@ -0,0 +1,32 @@
+FROM r-base:4.2.0
+
+LABEL maintainer="Daniel Osorio <[email protected]>"
+
+USER root
+
+WORKDIR /
+
+# RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')"
+
+RUN R -e "install.packages('remotes')" 
+
+RUN R -e "library(remotes)"
+
+RUN R -e "remotes::install_cran(pkgs = 'scTenifoldNet', quiet = TRUE)"
+
+RUN R -e "remotes::install_cran(pkgs = 'reshape2', quiet = TRUE)"
+
+# RUN R -e "install.packages('scTenifoldNet')" 
+
+# RUN R -e "install.packages('reshape2')"
+
+RUN R -e "library(scTenifoldNet)"
+
+RUN R -e "library(reshape2)"
+
+COPY runSCTENIFOLDNET.R /
+
+RUN mkdir data/
+
+RUN apt-get update && apt-get install -y time
+
diff --git a/Algorithms/SCTENIFOLDNET/README.md b/Algorithms/SCTENIFOLDNET/README.md
@@ -0,0 +1,72 @@
+*This README.md file was generated on 2/20/2023 by Yiqi Su ([email protected])*
+
+**We would like to acknowledge professor Daniel Osorio for sharing the code for SCTENIFOLDNET.**
+
+<!-- remove all comments (like this) before final save  -->
+
+# scTenifoldNet: A Machine Learning Workflow for Constructing and Comparing Transcriptome-wide Gene Regulatory Networks from Single-Cell Data
+
+These instructions describe how to integrate the GRN method SCTENIFOLDNET ([[Paper](https://doi.org/10.1016/j.patter.2020.100139)] [[GitHub](https://github.com/jamesjcai/ScTenifoldNet.jl)]) into BEELINE.
+Follow these steps:
+
+1. **Create SCTENIFOLDNET folder:** Create a folder called SCTENIFOLDNET under Beeline/Algorithms for the new method. Keeping each method in its own folder ensures easy set-up and portability and avoids conflicts between the library and software versions required by different GRN algorithm implementations.
+
+2. **Create runSCTENIFOLDNET.R script:** In the SCTENIFOLDNET folder, create an R script runSCTENIFOLDNET.R to learn graphs from target datasets.
+
+3. **Create a Dockerfile:** Create a "Dockerfile" that contains necessary software specifications and commands listed in a specific order from top to bottom. 
+
+        FROM r-base:4.2.0
+
+        LABEL maintainer="Daniel Osorio <[email protected]>"
+
+        USER root
+
+        WORKDIR /
+
+        RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')"
+
+        RUN R -e "remotes::install_cran(pkgs = 'scTenifoldNet', quiet = TRUE)"
+
+        RUN R -e "remotes::install_cran(pkgs = 'reshape2', quiet = TRUE)"
+
+        RUN R -e "library(scTenifoldNet)"
+
+        RUN R -e "library(reshape2)"
+
+        COPY runSCTENIFOLDNET.R /
+
+        RUN mkdir data/
+
+        RUN apt-get update && apt-get install -y time
+
+
+The Dockerfile copies the script runSCTENIFOLDNET.R into the image so that it can be run inside the Docker container.
+
+4. **Add the Dockerfile to initialize.sh script:** Once the Dockerfile is ready, add the following lines to the 'initialize.sh' script to build the Docker image for SCTENIFOLDNET.
+
+        cd $BASEDIR/Algorithms/SCTENIFOLDNET/
+        docker build -q -t sctenifoldnet:base .
+        if ([[ "$(docker images -q sctenifoldnet:base 2> /dev/null)" != "" ]]); then
+            echo "Docker container for SCTENIFOLDNET is built and tagged as sctenifoldnet:base"
+        else
+            echo "Oops! Unable to build Docker container for SCTENIFOLDNET"
+        fi
+
+5. **Create sctenifoldnetRunner.py script:** After building the Docker image, create a Python script called sctenifoldnetRunner.py in the Beeline/BLRun folder to set up a BLRun object that can read inputs, run SCTENIFOLDNET inside the Docker container, and parse the output for evaluation. Specifically, the sctenifoldnetRunner.py script contains three functions:
+
+   - ``generateInputs()`` : This function reads the input data file (i.e., expression data), and processes it into the format required by SCTENIFOLDNET. 
+   - ``run()`` : This function constructs a "docker run" system command with parameters including the path of the input data file (i.e., expression data). It also specifies where the outputs are written. The docker container runs SCTENIFOLDNET when the parameters are passed. 
+   - ``parseOutput()`` : This function reads the SCTENIFOLDNET-specific output (i.e., outFile.txt) and formats it into a ranked edgelist comma-separated file (i.e., rankedEdges.csv) with columns Gene1, Gene2, and EdgeWeight. The Gene1 column should contain regulators, the Gene2 column the targets, and EdgeWeight column the absolute value of the weight predicted for edge (regulator,target). The ranked edgelist file will be subsequently used by BLEval object. 
+
+6. **Add SCTENIFOLDNET to runner.py:** Next, update runner.py script in Beeline/BLRun folder by adding information related to SCTENIFOLDNET. 
+
+    - add "import BLRun.sctenifoldnetRunner as SCTENIFOLDNET"
+    - add "'SCTENIFOLDNET':SCTENIFOLDNET.generateInputs" to InputMapper
+    - add "'SCTENIFOLDNET':SCTENIFOLDNET.run" to AlgorithmMapper
+    - add "'SCTENIFOLDNET':SCTENIFOLDNET.parseOutput" to OutputParser
+
+7. **Add SCTENIFOLDNET to config.yaml:** The final step is to add the new algorithm SCTENIFOLDNET and any necessary parameters to the config.yaml located in the Beeline/config-files folder. Note that BEELINE can currently handle only one parameter set at a time, even though multiple parameters can be passed to the single parameter object.
+
+        - name: "SCTENIFOLDNET"
+          params:
+              should_run: [True]
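Step 6 of the README above registers the three runner functions in the dictionaries `InputMapper`, `AlgorithmMapper`, and `OutputParser`. The sketch below illustrates how name-keyed dictionaries of this kind let a driver dispatch by algorithm name. The stub functions, the `run_all` helper, and the `SimpleNamespace` stand-in for the imported module are hypothetical; the real runner.py may be organized differently.

```python
from types import SimpleNamespace


def generateInputs(runner):
    return f"inputs prepared for {runner}"


def run(runner):
    return f"algorithm executed for {runner}"


def parseOutput(runner):
    return f"ranked edges written for {runner}"


# Stand-in for "import BLRun.sctenifoldnetRunner as SCTENIFOLDNET".
SCTENIFOLDNET = SimpleNamespace(
    generateInputs=generateInputs, run=run, parseOutput=parseOutput
)

# The three registrations listed in step 6.
InputMapper = {"SCTENIFOLDNET": SCTENIFOLDNET.generateInputs}
AlgorithmMapper = {"SCTENIFOLDNET": SCTENIFOLDNET.run}
OutputParser = {"SCTENIFOLDNET": SCTENIFOLDNET.parseOutput}


def run_all(name, runner):
    """Call the three registered stages for one algorithm, in order."""
    stages = (InputMapper, AlgorithmMapper, OutputParser)
    return [stage[name](runner) for stage in stages]


if __name__ == "__main__":
    for message in run_all("SCTENIFOLDNET", "example-dataset"):
        print(message)
```

An algorithm name that is missing from any of the three dictionaries raises `KeyError` in this sketch, which is why step 6 lists all three registrations together.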
diff --git a/Algorithms/SCTENIFOLDNET/runSCTENIFOLDNET.R b/Algorithms/SCTENIFOLDNET/runSCTENIFOLDNET.R
@@ -0,0 +1,40 @@
+library(scTenifoldNet)
+library(reshape2)
+args <- commandArgs(trailingOnly = TRUE)
+inFile <- args[1]
+outFile <- args[2]
+
+# input expression data
+inputExpr <- read.table(inFile, sep = ",", header = TRUE, row.names = 1)
+geneNames <- rownames(inputExpr)
+rownames(inputExpr) <- c(geneNames)
+inputExpr <- as.matrix(inputExpr)
+#nGenes <- nrow(inputExpr)
+
+# Run pcNet 
+# Link to paper: https://doi.org/10.1101/2020.02.12.931469
+set.seed(1)
+num_genes <- nrow(inputExpr)
+if (num_genes > 2) {
+    nComp <- num_genes - 1  # Setting nComp to one less than the total number of genes
+} else {
+    stop("Not enough genes in the dataset")
+}
+
+pcNetResults <- as.matrix(pcNet(X = inputExpr, nComp = nComp))
+#set.seed(1)
+#pcNetResults = makeNetworks(inputExpr, nComp = round(nGenes/2), q = 0, nNet = 10)
+#set.seed(1)
+#pcNetResults = tensorDecomposition(pcNetResults)
+#pcNetResults = as.matrix(pcNetResults$X)
+diag(pcNetResults) <- 1
+
+# Write output to a file
+# https://stackoverflow.com/questions/38664241/ranking-and-counting-matrix-elements-in-r
+DF = melt(pcNetResults)
+
+#DF = data.frame(Gene1 = geneNames[c(row(pcorResults$estimate))], Gene2 = geneNames[c(col(pcorResults$estimate))]
+#                , corVal = c(pcorResults$estimate), pValue =  c(pcorResults$p.value))
+colnames(DF) = c('Gene1', 'Gene2', 'corVal')
+outDF <- DF[order(DF$corVal, decreasing=TRUE), ]
+write.table(outDF, outFile, sep = "\t", quote = FALSE, row.names = FALSE)
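The script above writes a tab-separated file with columns Gene1, Gene2, and corVal, while the README asks `parseOutput()` to produce a comma-separated ranked edge list with columns Gene1, Gene2, and EdgeWeight, where EdgeWeight is the absolute value of the predicted weight. A minimal sketch of that conversion follows, using only the Python standard library. The function name `to_ranked_edges` and its parameters are hypothetical, and the sketch assumes every corVal entry is numeric (no `NA`). Self-edges from the matrix diagonal are kept as-is.

```python
import csv


def to_ranked_edges(out_file, ranked_csv):
    """Convert a tab-separated Gene1/Gene2/corVal file into a ranked edge list.

    Hypothetical helper illustrating the parseOutput() step; it is not
    BEELINE's implementation.
    """
    with open(out_file, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))

    # EdgeWeight is the absolute value of the predicted weight.
    edges = [(r["Gene1"], r["Gene2"], abs(float(r["corVal"]))) for r in rows]
    # Rank edges from strongest to weakest.
    edges.sort(key=lambda edge: edge[2], reverse=True)

    with open(ranked_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Gene1", "Gene2", "EdgeWeight"])
        writer.writerows(edges)
    return edges
```

Because the ranking uses absolute values, a strongly negative weight ranks above a weakly positive one, even though the R script sorted by signed value.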