diff --git a/Algorithms/SCORPION/.Rhistory b/Algorithms/SCORPION/.Rhistory new file mode 100644 index 00000000..e69de29b diff --git a/Algorithms/SCORPION/Dockerfile b/Algorithms/SCORPION/Dockerfile new file mode 100644 index 00000000..dcd41156 --- /dev/null +++ b/Algorithms/SCORPION/Dockerfile @@ -0,0 +1,25 @@ +FROM r-base:4.2.0 + +LABEL maintainer = "Daniel Osorio " + +USER root + +WORKDIR / + +RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')" + +RUN R -e "install.packages('reshape2')" + +# RUN R -e "remotes::install_github('kuijjerlab/SCORPION')" + +RUN R -e "install.packages('SCORPION')" + +RUN R -e "library(reshape2)" + +RUN R -e "library(SCORPION)" + +COPY runSCORPION.R / + +RUN mkdir data/ + +RUN apt-get update && apt-get install -y time diff --git a/Algorithms/SCORPION/README.md b/Algorithms/SCORPION/README.md new file mode 100644 index 00000000..ed7c7f69 --- /dev/null +++ b/Algorithms/SCORPION/README.md @@ -0,0 +1,74 @@ +*This README.md file was generated on 2/4/2023 by Yiqi Su (yiqisu@vt.edu)* + +**We would like to acknowledge professors Daniel Osorio, S. Stephen Yi and Marieke L. Kuijjer for sharing the code for SCORPION.** + + + +# SCORPION: Single-Cell Oriented Reconstruction of PANDA (https://sites.google.com/a/channing.harvard.edu/kimberlyglass/tools/panda) Individually Optimized Gene Regulatory Network + +This is the instruction on how to integrate the new GRN method SCORPION ([[Paper](https://doi.org/10.1101/2023.01.20.524974)] [[GitHub](https://github.com/kuijjerlab/SCORPION)]) to BEELINE. +Please follow the following steps: + +1. **Create SCORPION folder:** Create a folder called SCORPION under Beeline/Algorithms for the new method to ensure easy set-up and portability and avoid conflicting libraries/software versions that may arise from the GRN algorithm implementations. + +2. 
**Create runSCORPION.R script:** In the SCORPION folder, create an R script runSCORPION.R to learn graphs from target datasets. + +3. **Create a Dockerfile:** Create a "Dockerfile" that contains necessary software specifications and commands listed in a specific order from top to bottom. + + + FROM r-base:4.2.0 + + LABEL maintainer = "Daniel Osorio " + + USER root + + WORKDIR / + + RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')" + + RUN R -e "install.packages('reshape2')" + + RUN R -e "remotes::install_github('kuijjerlab/SCORPION')" + + RUN R -e "library(SCORPION)" + + RUN R -e "library(reshape2)" + + COPY runSCORPION.R / + + RUN mkdir data/ + + RUN apt-get update && apt-get install -y time + +The Dockerfile copies the script runSCORPION.R into the Docker image. + +4. **Add the Dockerfile to initialize.sh script:** Once the Dockerfile is ready, add the following lines to 'initialize.sh' script to create the Docker image for SCORPION. + + + cd $BASEDIR/Algorithms/SCORPION/ + docker build -q -t scorpion:base . + if ([[ "$(docker images -q scorpion:base 2> /dev/null)" != "" ]]); then + echo "Docker container for SCORPION is built and tagged as scorpion:base" + else + echo "Oops! Unable to build Docker container for SCORPION" + fi + +5. **Create scorpionRunner.py script:** After building the Docker image, create a Python script called scorpionRunner.py in Beeline/BLRun folder to set up a BLRun object so that it is able to read inputs and run SCORPION inside the Docker image, and also parse the output for evaluation. Specifically, the scorpionRunner.py script contains three functions: + + - ``generateInputs()`` : This function reads the input data file (i.e., expression data), and processes it into the format required by SCORPION. + - ``run()`` : This function constructs a "docker run" system command with parameters including the path of the input data file (i.e., expression data). 
It also specifies where the outputs are written. The docker container runs SCORPION when the parameters are passed. + - ``parseOutput()`` : This function reads the SCORPION-specific output (i.e., outFile.txt) and formats it into a ranked edgelist comma-separated file (i.e., rankedEdges.csv) with columns Gene1, Gene2, and EdgeWeight. The Gene1 column should contain regulators, the Gene2 column the targets, and the EdgeWeight column the absolute value of the weight predicted for edge (regulator,target). The ranked edgelist file will be subsequently used by the BLEval object. + +6. **Add SCORPION to runner.py:** Next, update runner.py script in Beeline/BLRun folder by adding information related to SCORPION. + + - add "import BLRun.scorpionRunner as SCORPION" + - add "'SCORPION':SCORPION.generateInputs" to InputMapper + - add "'SCORPION':SCORPION.run" to AlgorithmMapper + - add "'SCORPION':SCORPION.parseOutput" to OutputParser + +7. **Add SCORPION to config.yaml:** The final step is to add the new algorithm SCORPION and any necessary parameters to the config.yaml located in Beeline/config-files folder. Note that currently BEELINE can only handle one parameter set at a time even though multiple parameters can be passed to the single parameter object. 
+ + + - name: "SCORPION" + params: + should_run: [True] diff --git a/Algorithms/SCORPION/runSCORPION.R b/Algorithms/SCORPION/runSCORPION.R new file mode 100644 index 00000000..b3295d51 --- /dev/null +++ b/Algorithms/SCORPION/runSCORPION.R @@ -0,0 +1,52 @@ +library(SCORPION) +library(reshape2) +args <- commandArgs(trailingOnly = T) +inFile <- args[1] +outFile <- args[2] + +# input expression data +inputExpr <- read.table(inFile, sep=",", header = 1, row.names = 1) +geneNames <- rownames(inputExpr) +rownames(inputExpr) <- c(geneNames) +inputExpr <- as.matrix(inputExpr) +#nGenes <- nrow(inputExpr) + +# Run SCORPION + +nGenes <- nrow(inputExpr) +n.pc <- min(5, nGenes - 1) # Ensure n.pc is less than number of genes + +# X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc) +# Example adjustment for choosing between irlba and svd +# if (n.pc / nGenes > 0.1) { +# # Use SVD +# svd_results <- svd(inputExpr) +# X <- svd_results$u[, 1:n.pc] %*% diag(svd_results$d[1:n.pc]) +# } else { +# # Use SCORPION makeSuperCells +# X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc) +# } +if (n.pc / nGenes > 0.1) { + # Use SVD + svd_results <- svd(inputExpr) + X <- svd_results$u[, 1:n.pc] %*% diag(svd_results$d[1:n.pc]) + # Retain the original row names + rownames(X) <- rownames(inputExpr) +} else { + # Use SCORPION makeSuperCells + X <- SCORPION:::makeSuperCells(inputExpr, n.pc = n.pc) + # Assuming makeSuperCells retains row names, if not, add them similarly + rownames(X) <- rownames(inputExpr) +} + + +X <- cor(t(as.matrix(X)), method = 'sp') +# Write output to a file +# https://stackoverflow.com/questions/38664241/ranking-and-counting-matrix-elements-in-r +DF = melt(X) + +#DF = data.frame(Gene1 = geneNames[c(row(pcorResults$estimate))], Gene2 = geneNames[c(col(pcorResults$estimate))] +# , corVal = c(pcorResults$estimate), pValue = c(pcorResults$p.value)) +colnames(DF) = c('Gene1', 'Gene2', 'corVal') +outDF <- DF[order(DF$corVal, decreasing=TRUE), ] +write.table(outDF, outFile, sep = 
"\t", quote = FALSE, row.names = FALSE) diff --git a/Algorithms/SCORPION/scorpionTest.RData b/Algorithms/SCORPION/scorpionTest.RData new file mode 100644 index 00000000..a790de6c Binary files /dev/null and b/Algorithms/SCORPION/scorpionTest.RData differ diff --git a/Algorithms/SCTENIFOLDNET/Dockerfile b/Algorithms/SCTENIFOLDNET/Dockerfile new file mode 100644 index 00000000..b2a786cc --- /dev/null +++ b/Algorithms/SCTENIFOLDNET/Dockerfile @@ -0,0 +1,32 @@ +FROM r-base:4.2.0 + +LABEL maintainer = "Daniel Osorio " + +USER root + +WORKDIR / + +# RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')" + +RUN R -e "install.packages('remotes')" + +RUN R -e "library(remotes)" + +RUN R -e "remotes::install_cran(pkgs = 'scTenifoldNet', quiet = TRUE)" + +RUN R -e "remotes::install_cran(pkgs = 'reshape2', quiet = TRUE)" + +# RUN R -e "install.packages('scTenifoldNet')" + +# RUN R -e "install.packages('reshape2')" + +RUN R -e "library(scTenifoldNet)" + +RUN R -e "library(reshape2)" + +COPY runSCTENIFOLDNET.R / + +RUN mkdir data/ + +RUN apt-get update && apt-get install -y time + diff --git a/Algorithms/SCTENIFOLDNET/README.md b/Algorithms/SCTENIFOLDNET/README.md new file mode 100644 index 00000000..5760ba22 --- /dev/null +++ b/Algorithms/SCTENIFOLDNET/README.md @@ -0,0 +1,72 @@ +*This README.md file was generated on 2/20/2023 by Yiqi Su (yiqisu@vt.edu)* + +**We would like to acknowledge professor Daniel Osorio for sharing the code for SCTENIFOLDNET.** + + + +# scTenifoldNet: A Machine Learning Workflow for Constructing and Comparing Transcriptome-wide Gene Regulatory Networks from Single-Cell Data + +This is the instruction on how to integrate the new GRN method SCTENIFOLDNET ([[Paper](https://doi.org/10.1016/j.patter.2020.100139)] [[GitHub](https://github.com/jamesjcai/ScTenifoldNet.jl)]) to BEELINE. +Please follow the following steps: + +1. 
**Create SCTENIFOLDNET folder:** Create a folder called SCTENIFOLDNET under Beeline/Algorithms for the new method to ensure easy set-up and portability and avoid conflicting libraries/software versions that may arise from the GRN algorithm implementations. + +2. **Create runSCTENIFOLDNET.R script:** In the SCTENIFOLDNET folder, create an R script runSCTENIFOLDNET.R to learn graphs from target datasets. + +3. **Create a Dockerfile:** Create a "Dockerfile" that contains necessary software specifications and commands listed in a specific order from top to bottom. + + FROM r-base:4.0.2 + + LABEL maintainer = "Daniel Osorio " + + USER root + + WORKDIR / + + RUN R -e "install.packages('https://cran.r-project.org/src/contrib/remotes_2.4.2.tar.gz', type = 'source')" + + RUN R -e "remotes::install_cran(pkgs = 'scTenifoldNet', quiet = TRUE)" + + RUN R -e "remotes::install_cran(pkgs = 'reshape2', quiet = TRUE)" + + RUN R -e "library(scTenifoldNet)" + + RUN R -e "library(reshape2)" + + COPY runSCTENIFOLDNET.R / + + RUN mkdir data/ + + RUN apt-get update && apt-get install -y time + + +The Dockerfile copies the script runSCTENIFOLDNET.R into the Docker image. + +4. **Add the Dockerfile to initialize.sh script:** Once the Dockerfile is ready, add the following lines to 'initialize.sh' script to create the Docker image for SCTENIFOLDNET. + + cd $BASEDIR/Algorithms/SCTENIFOLDNET/ + docker build -q -t sctenifoldnet:base . + if ([[ "$(docker images -q sctenifoldnet:base 2> /dev/null)" != "" ]]); then + echo "Docker container for SCTENIFOLDNET is built and tagged as sctenifoldnet:base" + else + echo "Oops! Unable to build Docker container for SCTENIFOLDNET" + fi + +5. **Create sctenifoldnetRunner.py script:** After building the Docker image, create a Python script called sctenifoldnetRunner.py in Beeline/BLRun folder to set up a BLRun object so that it is able to read inputs and run SCTENIFOLDNET inside the Docker image, and also parse the output for evaluation. 
Specifically, the sctenifoldnetRunner.py script contains three functions: + + - ``generateInputs()`` : This function reads the input data file (i.e., expression data), and processes it into the format required by SCTENIFOLDNET. + - ``run()`` : This function constructs a "docker run" system command with parameters including the path of the input data file (i.e., expression data). It also specifies where the outputs are written. The docker container runs SCTENIFOLDNET when the parameters are passed. + - ``parseOutput()`` : This function reads the SCTENIFOLDNET-specific output (i.e., outFile.txt) and formats it into a ranked edgelist comma-separated file (i.e., rankedEdges.csv) with columns Gene1, Gene2, and EdgeWeight. The Gene1 column should contain regulators, the Gene2 column the targets, and the EdgeWeight column the absolute value of the weight predicted for edge (regulator,target). The ranked edgelist file will be subsequently used by the BLEval object. + +6. **Add SCTENIFOLDNET to runner.py:** Next, update runner.py script in Beeline/BLRun folder by adding information related to SCTENIFOLDNET. + + - add "import BLRun.sctenifoldnetRunner as SCTENIFOLDNET" + - add "'SCTENIFOLDNET':SCTENIFOLDNET.generateInputs" to InputMapper + - add "'SCTENIFOLDNET':SCTENIFOLDNET.run" to AlgorithmMapper + - add "'SCTENIFOLDNET':SCTENIFOLDNET.parseOutput" to OutputParser + +7. **Add SCTENIFOLDNET to config.yaml:** The final step is to add the new algorithm SCTENIFOLDNET and any necessary parameters to the config.yaml located in Beeline/config-files folder. Note that currently BEELINE can only handle one parameter set at a time even though multiple parameters can be passed to the single parameter object. 
+ + - name: "SCTENIFOLDNET" + params: + should_run: [True] diff --git a/Algorithms/SCTENIFOLDNET/runSCTENIFOLDNET.R b/Algorithms/SCTENIFOLDNET/runSCTENIFOLDNET.R new file mode 100644 index 00000000..f3d52a84 --- /dev/null +++ b/Algorithms/SCTENIFOLDNET/runSCTENIFOLDNET.R @@ -0,0 +1,40 @@ +library(scTenifoldNet) +library(reshape2) +args <- commandArgs(trailingOnly = T) +inFile <- args[1] +outFile <- args[2] + +# input expression data +inputExpr <- read.table(inFile, sep=",", header = 1, row.names = 1) +geneNames <- rownames(inputExpr) +rownames(inputExpr) <- c(geneNames) +inputExpr <- as.matrix(inputExpr) +#nGenes <- nrow(inputExpr) + +# Run pcNet +# Link to paper: https://doi.org/10.1101/2020.02.12.931469 +set.seed(1) +num_genes <- nrow(inputExpr) +if (num_genes > 2) { + nComp <- num_genes - 1 # Setting nComp to one less than the total number of genes +} else { + stop("Not enough genes in the dataset") +} + +pcNetResults= as.matrix(pcNet(X = inputExpr, nComp = nComp)) #nComp = 9)) +#set.seed(1) +#pcNetResults = makeNetworks(inputExpr, nComp = round(nGenes/2), q = 0, nNet = 10) +#set.seed(1) +#pcNetResults = tensorDecomposition(pcNetResults) +#pcNetResults = as.matrix(pcNetResults$X) +diag(pcNetResults) <- 1 + +# Write output to a file +# https://stackoverflow.com/questions/38664241/ranking-and-counting-matrix-elements-in-r +DF = melt(pcNetResults) + +#DF = data.frame(Gene1 = geneNames[c(row(pcorResults$estimate))], Gene2 = geneNames[c(col(pcorResults$estimate))] +# , corVal = c(pcorResults$estimate), pValue = c(pcorResults$p.value)) +colnames(DF) = c('Gene1', 'Gene2', 'corVal') +outDF <- DF[order(DF$corVal, decreasing=TRUE), ] +write.table(outDF, outFile, sep = "\t", quote = FALSE, row.names = FALSE) diff --git a/BLData/__init__.py b/BLData/__init__.py new file mode 100644 index 00000000..67a94a8e --- /dev/null +++ b/BLData/__init__.py @@ -0,0 +1,254 @@ +""" +Sfaira Data Loader (:mod:`SfairaData`) module contains the following main class: + + +- 
:class:`SfairaData.SfairaData` and two additional classes used in the definition of SfairaData class +- :class:`SfairaData.SfairaSettings` +- :class:`SfairaData.ConfigParser` + + +""" + + +import yaml +import argparse +import itertools +from collections import defaultdict +from glob import glob +import pathlib +from pathlib import Path +import concurrent.futures +from typing import Dict, List +import multiprocessing +from multiprocessing import Pool, cpu_count +import shutil +import os +import pandas as pd +import sfaira +# import importlib_metadata +# import h5py +import scanpy as sc +import anndata +from anndata import read_h5ad +import zipfile +import gzip +import tarfile +import urllib.request +import GEOparse + + + + +class SfairaSettings(object): + ''' + The class for storing the names of directories that datasets should + be downloaded to and the features to filter subsets. + This initializes an SfairaSettings object based on the following parameters. + + :param base_dir: sfaira root directory, typically 'inputs/sfaira' + :type base_dir: str + :param subsets: List of key-value pairs to filter subsets + :type subsets: list + ''' + def __init__(self, base_dir, subsets) -> None: + self.base_dir = base_dir + self.subsets = subsets + + + + +class SfairaData(object): + ''' + The SfairaData object is created by parsing a user-provided configuration + file. Its methods provide for further processing its datasets into + a series of jobs to be run, as well as running these jobs. + ''' + def __init__(self, + sfaira_settings: SfairaSettings) -> None: + self.sfaira_settings = sfaira_settings + + + + + def sfairaLoader(self): + ''' + Download specified datasets from sfaira data repository. + + :returns: + Folders containing scRNA-seq datasets. 
+ ''' + # Filter subset by looping over all the key-value pairs + for i in range(len(self.sfaira_settings)): + basedir = self.sfaira_settings[i].base_dir + datadir = os.path.join(basedir, 'raw') + metadir = os.path.join(basedir, 'meta') + cachedir = os.path.join(basedir, 'cache') + ds = sfaira.data.Universe(data_path=datadir, meta_path=metadir, cache_path=cachedir) + filters = self.sfaira_settings[i].subsets[i] + + + ############# subset ############ + for key_val in filters: + ds.subset(key=key_val, values=filters[key_val][:]) + + + ############# filtering ############ + # Download subsets to the specified folder and save in h5ad + # Locate the cache folder + cache_dir = os.path.join(dataset.path, "cache") + + + # Move or copy the cache folder to the desired location + new_cache_dir = "/path/to/your/desired/location/cache" + shutil.copytree(cache_dir, new_cache_dir) + + ############# download ############ + ds.download() + ds.load() + # Load the h5ad file from the cache folder + # h5ad_path = os.path.join(new_cache_dir, f"{dataset.id}.h5ad") + # adata = read_h5ad(h5ad_path) + + + + def csvConverter(self): + ''' + Loop over all files in multi-level subfolders + ''' + for i in range(len(self.sfaira_settings)): + subset_dir = self.sfaira_settings[i].base_dir + SfairaData.__folderProcess(subset_dir) + + + def __folderProcess(path): + ''' + Convert files to csv files. 
+ ''' + if os.path.isdir(path): + # The file is a folder, so loop through its contents + for file in os.listdir(path): + # Construct the full path to the item + file_path = os.path.join(path, file) + + if os.path.isdir(file_path): + # The file is a folder, so recurse into it + SfairaData.__folderProcess(file_path) + else: + # The file is not a folder, convert the file to csv file + # compressed files + if file.endswith('.zip'): + zipfile.ZipFile(file_path, "r").extractall(os.path.dirname(os.path.abspath(file_path))) + elif file.endswith('.gz'): + with gzip.open(file_path, 'rb') as f_in: + with open(file_path.replace("gz", "csv"), 'wb') as f_out: + f_out.write(f_in.read()) + elif file.endswith('.tar'): + tarfile.open(file_path, "r").extractall(os.path.dirname(os.path.abspath(file_path))) + + + # h5ad files + if file.endswith('.h5ad'): + # Load h5ad file + adata = anndata.read_h5ad(file_path) + # Convert AnnData object to pandas DataFrame as gene x cell + df = pd.DataFrame(adata.X.todense(), index=adata.obs_names, columns=adata.var_names).T + # Write DataFrame to CSV file + file = file_path.replace("h5ad", "csv") + df.to_csv(file) + + # # acc.gci files + # if file.startswith('acc.cgi'): + # # Get the GEO accession number + # accession_number = file.split('=')[1] + # + # # Get data from python package GEOparse + # # Download the metadata and expression data for the GEO accession number + # gse = GEOparse.get_GEO(accession_number) + # # Access the expression data for the dataset + # expression_data = gse.table + # # Save the expression data to a CSV file + # file = accession_number + "csv" + # expression_data.to_csv(file) + # + # # Get data from URL + # # Construct the URL for the GEO accession number + # url = f'https://www.ncbi.nlm.nih.gov/geo/download/?acc={accession_number}&format=file' + # # Set the file for the downloaded file + # file = f'{accession_number}.RAW.tar' + # # Download the file using the urllib module + # urllib.request.urlretrieve(url, file) + 
+class ConfigParser(object): + ''' + This class defines static methods for parsing and storing the contents + of the config file. + ''' + @staticmethod + def parse(config_file_handle) -> SfairaData: + ''' + A method for parsing the config .yaml file. + + :param config_file_handle: Name of the .yaml file to be parsed + :type config_file_handle: str + + :returns: + An object of class :class:`SfairaData.SfairaData`. + + + ''' + config_map = yaml.load(config_file_handle, Loader=yaml.Loader) + return SfairaData( + ConfigParser.__parse_sfaira_settings( + config_map['sfaira_settings'])) + + + + @staticmethod + def __parse_sfaira_settings(sfaira_settings_map) -> SfairaSettings: + ''' + A method for parsing and initializing sfaira data object. + ''' + # Obtain the data directory + sfaira_dir = sfaira_settings_map['data_dir'] + + + # Obtain the subdata directory + SfairaSettingsDir = {} + key_order = ["year", "organism", "organ", "assay_sc"] + order = 0 + + + for x in ConfigParser.__parse_sfaira_features(sfaira_settings_map['subsets']): + subset_dir = "" + for key in key_order: + if key in x: + # Replace spaces in the string with hyphens + val = str(x[key]).replace(", ", "+").replace(",", "+").replace(" ", "-").replace("[", "").replace("]", "").replace("'", "") + if len(subset_dir) == 0: + subset_dir = subset_dir + val + else: + subset_dir = subset_dir + "_" + val + # Set SfairaSettings for each specification of subset in config yaml + # print(subset_dir) + SfairaSettingsDir[order] = SfairaSettings(Path(sfaira_dir, subset_dir), + ConfigParser.__parse_sfaira_features(sfaira_settings_map['subsets'])) + order = order + 1 + return SfairaSettingsDir + + + + @staticmethod + def __parse_sfaira_features(sfaira_list): + ''' + A method for parsing parameters that determine the subsets to be downloaded. 
+ ''' + # Initialize the list of subset values + subsets = [] + # print(sfaira_list) + + + # Parse contents of sfaira_list + for x in sfaira_list: + key_values = x['filters'] + subsets.append(key_values) + return subsets diff --git a/BLDataloader.py b/BLDataloader.py new file mode 100644 index 00000000..c8a3c592 --- /dev/null +++ b/BLDataloader.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python +# coding: utf-8 + +# Please refer to https://github.com/theislab/sfaira_tutorials/blob/master/tutorials/data_loaders.ipynb + +import argparse +import tensorflow as tf + +# local imports +import BLData as dt + +def get_parser() -> argparse.ArgumentParser: + ''' + :return: an argparse ArgumentParser object for parsing command + line parameters + ''' + parser = argparse.ArgumentParser(description='Download scRNA-seq datasets from Sfaira.') + + # Specify configuration file + parser.add_argument('--config', default='config.yaml', + help="Configuration file containing list of input setting specifications.\n") + + return parser + + +def parse_arguments(): + ''' + Initialize a parser and use it to parse the command line arguments + :return: parsed dictionary of command line arguments + ''' + parser = get_parser() + opts = parser.parse_args() + + return opts + + +def main(): + opts = parse_arguments() + config_file = opts.config + + with open(config_file, 'r') as conf: + sfairaLoader = dt.ConfigParser.parse(conf) + print(sfairaLoader) + + dataSummarizer = dt.SfairaData(sfairaLoader.sfaira_settings) + + print("## Dataset downloads started") + dataSummarizer.sfairaLoader() + print('## Dataset downloads complete') + + +if __name__ == '__main__': + main() diff --git a/BLEval/computeEarlyPrec.py b/BLEval/computeEarlyPrec.py index fc82d1d6..d2a4958d 100644 --- a/BLEval/computeEarlyPrec.py +++ b/BLEval/computeEarlyPrec.py @@ -25,6 +25,11 @@ def EarlyPrec(evalObject, algorithmName, TFEdges = False): :param algorithmName: Name of the algorithm for which the early precision is computed. 
:type algorithmName: str + + :param TFEdges: Whether to include self-edges (TFEdges = False) or + include only TF-gene edges (TFEdges = True) for evaluation. + :type TFEdges: bool + # Set TFEdges parameter to True for experimental scRNA-seq data evaluation and False for simulated datasets :returns: @@ -82,7 +87,7 @@ def EarlyPrec(evalObject, algorithmName, TFEdges = False): trueEdges = trueEdgesDF['Gene1'] + "|" + trueEdgesDF['Gene2'] trueEdges = trueEdges[trueEdges.isin(TrueEdgeDict)] - print("\nEdges considered ", len(trueEdges)) + # print("\nEdges considered ", len(trueEdges)) numEdges = len(trueEdges) predDF['Edges'] = predDF['Gene1'] + "|" + predDF['Gene2'] @@ -90,9 +95,12 @@ def EarlyPrec(evalObject, algorithmName, TFEdges = False): predDF = predDF[predDF['Edges'].isin(TrueEdgeDict)] else: + uniqueNodes = np.unique(trueEdgesDF.loc[:,['Gene1','Gene2']]) + possibleEdges = set(permutations(uniqueNodes, r = 2)) trueEdges = trueEdgesDF['Gene1'] + "|" + trueEdgesDF['Gene2'] trueEdges = set(trueEdges.values) numEdges = len(trueEdges) + # check if ranked edges list is empty # if so, it is just set to an empty set @@ -120,15 +128,20 @@ def EarlyPrec(evalObject, algorithmName, TFEdges = False): else: print("\nSkipping early precision computation for on path ", rank_path,"due to lack of predictions.") rankDict[dataset["name"]] = set([]) + Eprec = {} Erec = {} + EPR = {} for dataset in tqdm(evalObject.input_settings.datasets): if len(rankDict[dataset["name"]]) != 0: intersectionSet = rankDict[dataset["name"]].intersection(trueEdges) Eprec[dataset["name"]] = len(intersectionSet)/len(rankDict[dataset["name"]]) Erec[dataset["name"]] = len(intersectionSet)/len(trueEdges) + randomEprc = len(trueEdges) / len(possibleEdges) + EPR[dataset["name"]] = Eprec[dataset["name"]]/randomEprc else: Eprec[dataset["name"]] = 0 Erec[dataset["name"]] = 0 + EPR[dataset["name"]] = 0 - return(Eprec) + return Eprec, Erec, EPR \ No newline at end of file diff --git 
a/BLEval/computeSignedDGAUC.py b/BLEval/computeSignedDGAUC.py new file mode 100644 index 00000000..80f44ef0 --- /dev/null +++ b/BLEval/computeSignedDGAUC.py @@ -0,0 +1,279 @@ +import pandas as pd +import numpy as np +import seaborn as sns +from pathlib import Path +import matplotlib.pyplot as plt +import seaborn as sns +sns.set(rc={"lines.linewidth": 2}, palette = "deep", style = "ticks") +from sklearn.metrics import precision_recall_curve, roc_curve, average_precision_score, roc_auc_score +from sklearn.metrics import f1_score #, accuracy_score +from itertools import product, permutations, combinations, combinations_with_replacement +from tqdm import tqdm +from rpy2.robjects.packages import importr +from rpy2.robjects import FloatVector + + +def signedPRROC(dataDict, inputSettings, directed = True, selfEdges = False, plotFlag = False): + ''' + Computes areas under the precision-recall and ROC curves of activation edges and inhibitory edges + for a given dataset for each algorithm. + + + :param directed: A flag to indicate whether to treat predictions as directed edges (directed = True) or undirected edges (directed = False). + :type directed: bool + :param selfEdges: A flag to indicate whether to include self-edges (selfEdges = True) or exclude self-edges (selfEdges = False) from evaluation. + :type selfEdges: bool + :param plotFlag: A flag to indicate whether or not to save PR and ROC plots. 
+ :type plotFlag: bool + + :returns: + - AUPRC: A dictionary containing AUPRC values for each algorithm + - AUROC: A dictionary containing AUROC values for each algorithm + - AP: A dictionary containing AP values for each algorithm + - F1: A dictionary containing F1 values for each algorithm + ''' + + # Read file for trueEdges + trueEdgesDF = pd.read_csv(str(inputSettings.datadir)+'/'+ dataDict['name'] + + '/' +dataDict['trueEdges'], + sep = ',', + header = 0, index_col = None) + + # Initialize data dictionaries + precisionDict = {} + recallDict = {} + FPRDict = {} + TPRDict = {} + AUPRC = {} + AUROC = {} + AP = {} + F1 = {} + + # set-up outDir that stores output directory name + outDir = "outputs/"+str(inputSettings.datadir).split("inputs/")[1]+ '/' +dataDict['name'] + + # Obtain the predicted dataframe for each algorithm + for algo in tqdm(inputSettings.algorithms, + total = len(inputSettings.algorithms), unit = " Algorithms"): + # check if the output rankedEdges file exists + if Path(outDir + '/' +algo[0]+'/rankedEdges.csv').exists(): + + # Initialize Precision + predDF = pd.read_csv(outDir + '/' +algo[0]+'/rankedEdges.csv', \ + sep = '\t', header = 0, index_col = None) + + precisionDict[algo[0]], recallDict[algo[0]], FPRDict[algo[0]], TPRDict[algo[0]], AUPRC[algo[0]], AUROC[algo[0]], AP[algo[0]], F1[algo[0]] = signedComputeScores(trueEdgesDF, predDF, directed = directed, selfEdges = selfEdges) + else: + print(outDir + '/' +algo[0]+'/rankedEdges.csv', \ + ' does not exist. 
Skipping...') + + PRName = '/PRplot' + ROCName = '/ROCplot' + APName = '/APplot' + + + if (plotFlag): + ## Make PR curves + legendList = [] + for key in recallDict.keys(): + sns.lineplot(recallDict[key],precisionDict[key], ci=None) + legendList.append(key + ' (AUPRC = ' + str("%.2f" % (AUPRC[key]))+')') + plt.xlim(0,1) + plt.ylim(0,1) + plt.xlabel('Recall') + plt.ylabel('Precision') + plt.legend(legendList) + plt.savefig(outDir+PRName+'.pdf') + plt.savefig(outDir+PRName+'.png') + plt.clf() + + ## Make ROC curves + legendList = [] + for key in recallDict.keys(): + sns.lineplot(FPRDict[key],TPRDict[key], ci=None) + legendList.append(key + ' (AUROC = ' + str("%.2f" % (AUROC[key]))+')') + + plt.plot([0, 1], [0, 1], linewidth = 1.5, color = 'k', linestyle = '--') + + plt.xlim(0,1) + plt.ylim(0,1) + plt.xlabel('FPR') + plt.ylabel('TPR') + plt.legend(legendList) + plt.savefig(outDir+ROCName+'.pdf') + plt.savefig(outDir+ROCName+'.png') + plt.clf() + + ## Make AP curves + legendList = [] + for key in recallDict.keys(): + sns.lineplot(recallDict[key],precisionDict[key], ci=None) + legendList.append(key + ' (AP = ' + str("%.2f" % (AP[key]))+')') + plt.xlim(0,1) + plt.ylim(0,1) + plt.xlabel('Recall') + plt.ylabel('Precision') + plt.legend(legendList) + plt.savefig(outDir+APName+'.pdf') + plt.savefig(outDir+APName+'.png') + plt.clf() + + return AUPRC, AUROC, AP, F1 + + +def signedComputeScores(trueEdgesDF, predEdgeDF, + directed = True, selfEdges = True): + ''' + Computes precision-recall and ROC curves + using scikit-learn for a given set of predictions in the + form of a DataFrame. + + :param trueEdgesDF: A pandas dataframe containing the true classes. The indices of this dataframe are all possible edges in a graph formed using the genes in the given dataset. This dataframe only has one column to indicate the class label of an edge. If an edge is present in the reference network, it gets a class label of 1, else 0. 
+ :type trueEdgesDF: DataFrame + + :param predEdgeDF: A pandas dataframe containing the edge ranks from the predicted network. The indices of this dataframe are all possible edges. This dataframe only has one column to indicate the edge weights in the predicted network. The higher the weight, the higher the edge confidence. + :type predEdgeDF: DataFrame + + :param directed: A flag to indicate whether to treat predictions as directed edges (directed = True) or undirected edges (directed = False). + :type directed: bool + + :param selfEdges: A flag to indicate whether to include self-edges (selfEdges = True) or exclude self-edges (selfEdges = False) from evaluation. + :type selfEdges: bool + + :returns: + - prec: A list of precision values (for PR plot) + - recall: A list of recall values (for PR plot) + - fpr: A list of false positive rates (for ROC plot) + - tpr: A list of true positive rates (for ROC plot) + - AUPRC: Area under the precision-recall curve + - AUROC: Area under the ROC curve + - AP: Average precision + - F1: F1 score + ''' + + # Create lists for possible directed and undirected edges + if directed: + if selfEdges: + possibleEdges = list(product(np.unique(trueEdgesDF.loc[:,['Gene1','Gene2']]), + repeat = 2)) + else: + # permutations of gene pairs + possibleEdges = list(permutations(np.unique(trueEdgesDF.loc[:,['Gene1','Gene2']]), + r = 2)) + else: + if selfEdges: + possibleEdges = list(combinations_with_replacement(np.unique(trueEdgesDF.loc[:,['Gene1','Gene2']]), + r = 2)) + else: + # combination of gene pairs + possibleEdges = list(combinations(np.unique(trueEdgesDF.loc[:,['Gene1','Gene2']]), r = 2)) + + # Determine if the prediction is signed using nonnegative edge weights + is_pred_signed = np.count_nonzero(predEdgeDF.EdgeWeight >= 0) != len(predEdgeDF) + outDFAll = {'+':{},'-':{}} + + # Consider signs + for sgn in ['+', "-"]: + # Initialize dictionaries with all possible edges + # Obtain the dictionary of gene pairs for true edges and predicted edges 
and assign 0 for all pairs + TrueEdgeDict = {'|'.join(p):0 for p in possibleEdges} + PredEdgeDict = {'|'.join(p):0 for p in possibleEdges} + + # Store edges with different sign + ignoredEdges = set() + + # Compute TrueEdgeDict Dictionary + # 1 if edge is present in the ground-truth + # 0 if edge is not present in the ground-truth + for edge in trueEdgesDF.itertuples(): + if edge.Gene1 == edge.Gene2: + continue + + if edge.Type == sgn: + if "|".join((edge.Gene1, edge.Gene2)) in TrueEdgeDict: + TrueEdgeDict["|".join((edge.Gene1, edge.Gene2))] = 1 + + if not directed: + if "|".join((edge.Gene2, edge.Gene1)) in TrueEdgeDict: + TrueEdgeDict["|".join((edge.Gene2, edge.Gene1))] = 1 + + else: + # Ignored edges not in ground-truth or with different sign + ignoredEdges.add("|".join((edge.Gene1, edge.Gene2))) + + # Compute PredEdgeDict Dictionary + for edge in predEdgeDF.itertuples(): + if edge.Gene1 == edge.Gene2: + continue + + if is_pred_signed: + # Determine signs based on predicted edge weights + edge_sign = "+" if edge.EdgeWeight >= 0 else "-" + # Use the absolute value of the predicted edge weight if the edge has the sign of interest + if edge_sign == sgn: + if "|".join((edge.Gene1, edge.Gene2)) in PredEdgeDict: + PredEdgeDict["|".join((edge.Gene1, edge.Gene2))] = np.abs(edge.EdgeWeight) + + else: + if "|".join((edge.Gene1, edge.Gene2)) in ignoredEdges: + continue + # Assign absolute values of edge weights to edges if the predicted edges are not signed + PredEdgeDict["|".join((edge.Gene1, edge.Gene2))] = np.abs(edge.EdgeWeight) + + outDF = pd.DataFrame([TrueEdgeDict,PredEdgeDict]).T + outDF.columns = ['TrueEdges','PredEdges'] + outDFAll[sgn] = outDF + + # Combine into one dataframe and pass to sklearn + prroc = importr('PRROC') + precAll = {'+':{},'-':{}} + recallAll = {'+':{},'-':{}} + fprAll = {'+':{},'-':{}} + tprAll = {'+':{},'-':{}} + auprcAll = {'+':{},'-':{}} + aurocAll = {'+':{},'-':{}} + apAll = {'+':{}, '-':{}} + f1All = {'+':{}, '-':{}} + + # Compute AUC by sign + 
for sgn in ['+','-']: + # precision, recall + prec, recall, thresholds = precision_recall_curve(y_true=outDFAll[sgn]['TrueEdges'], + probas_pred=outDFAll[sgn]['PredEdges'], + pos_label=1) + # FPR, TPR + fpr, tpr, thresholds = roc_curve(y_true=outDFAll[sgn]['TrueEdges'], + y_score=outDFAll[sgn]['PredEdges'], + pos_label=1) + # AUPRC + auprc = prroc.pr_curve(scores_class0 = FloatVector(list(outDFAll[sgn]['PredEdges'].values)), + weights_class0 = FloatVector(list(outDFAll[sgn]['TrueEdges'].values))) + # AUROC + auroc = roc_auc_score(y_true=outDFAll[sgn]['TrueEdges'], + y_score=outDFAll[sgn]['PredEdges']) + + # AP + ap = average_precision_score(y_true=outDFAll[sgn]['TrueEdges'], + y_score=outDFAll[sgn]['PredEdges'], + pos_label=1) + + # We use the weighted average on the data. + # For the choice of averaging options, see: https://towardsdatascience.com/micro-macro-weighted-averages-of-f1-score-clearly-explained-b603420b292f + f1 = f1_score(y_true=outDFAll[sgn]['TrueEdges'], + y_pred=outDFAll[sgn]['PredEdges'].round(), + pos_label=1, average='weighted') + + # # The accuracy is the same as f1 with micro average + # accuracy = accuracy_score(y_true=outDFAll[sgn]['TrueEdges'], + # y_pred=outDFAll[sgn]['PredEdges'].round()) + + precAll[sgn] = prec + recallAll[sgn] = recall + fprAll[sgn] = fpr + tprAll[sgn] = tpr + auprcAll[sgn] = auprc[2][0] + aurocAll[sgn] = auroc + apAll[sgn] = ap + f1All[sgn] = f1 + + return precAll, recallAll, fprAll, tprAll, auprcAll, aurocAll, apAll, f1All diff --git a/BLRun/runner.py b/BLRun/runner.py index e28ec0c5..6e0d673e 100644 --- a/BLRun/runner.py +++ b/BLRun/runner.py @@ -12,10 +12,16 @@ import BLRun.singeRunner as SINGE import BLRun.scribeRunner as SCRIBE import BLRun.scsglRunner as SCSGL +import BLRun.scorpionRunner as SCORPION +import BLRun.sctenifoldnetRunner as SCTENIFOLDNET +import BLRun.ngsemRunner as NGSEM +import BLRun.tenetRunner as TENET +import BLRun.micaRunner as MICA from pathlib import Path -InputMapper = {'SCODE':SCODE.generateInputs, +InputMapper = 
{ + 'SCODE':SCODE.generateInputs, 'SINCERITIES':SINCERITIES.generateInputs, 'SCNS':SCNS.generateInputs, 'PIDC':PIDC.generateInputs, @@ -28,12 +34,19 @@ 'GRISLI':GRISLI.generateInputs, 'SINGE':SINGE.generateInputs, 'SCRIBE':SCRIBE.generateInputs, - 'SCSGL':SCSGL.generateInputs} + 'SCSGL':SCSGL.generateInputs, + 'SCORPION': SCORPION.generateInputs, + 'SCTENIFOLDNET':SCTENIFOLDNET.generateInputs, + 'TENET':TENET.generateInputs, + 'NGSEM':NGSEM.generateInputs, + 'MICA':MICA.generateInputs + } -AlgorithmMapper = {'SCODE':SCODE.run, +AlgorithmMapper = { + 'SCODE':SCODE.run, 'SINCERITIES':SINCERITIES.run, 'SCNS':SCNS.run, 'PIDC':PIDC.run, @@ -46,11 +59,18 @@ 'GRISLI':GRISLI.run, 'SINGE':SINGE.run, 'SCRIBE':SCRIBE.run, - 'SCSGL':SCSGL.run} + 'SCSGL':SCSGL.run, + 'SCORPION': SCORPION.run, + 'SCTENIFOLDNET':SCTENIFOLDNET.run, + 'TENET':TENET.run, + 'NGSEM':NGSEM.run, + 'MICA':MICA.run + } -OutputParser = {'SCODE':SCODE.parseOutput, +OutputParser = { + 'SCODE':SCODE.parseOutput, 'SINCERITIES':SINCERITIES.parseOutput, 'SCNS':SCNS.parseOutput, 'PIDC':PIDC.parseOutput, @@ -63,7 +83,13 @@ 'GRISLI':GRISLI.parseOutput, 'SINGE':SINGE.parseOutput, 'SCRIBE':SCRIBE.parseOutput, - 'SCSGL':SCSGL.parseOutput} + 'SCSGL':SCSGL.parseOutput, + 'SCORPION': SCORPION.parseOutput, + 'SCTENIFOLDNET':SCTENIFOLDNET.parseOutput, + 'TENET':TENET.parseOutput, + 'NGSEM':NGSEM.parseOutput, + 'MICA':MICA.parseOutput + } class Runner(object): diff --git a/BLRun/scorpionRunner.py b/BLRun/scorpionRunner.py new file mode 100644 index 00000000..bbe7f58e --- /dev/null +++ b/BLRun/scorpionRunner.py @@ -0,0 +1,74 @@ +import os +import pandas as pd +from pathlib import Path +import numpy as np + +def generateInputs(RunnerObj): + ''' + Function to generate desired inputs for SCORPION. + If the folder/files under RunnerObj.datadir exist, + this function will not do anything. 
+ ''' + if not RunnerObj.inputDir.joinpath("SCORPION").exists(): + print("Input folder for SCORPION does not exist, creating input folder...") + RunnerObj.inputDir.joinpath("SCORPION").mkdir(exist_ok = False) + + + if not RunnerObj.inputDir.joinpath("SCORPION/ExpressionData.csv").exists(): + ExpressionData = pd.read_csv(RunnerObj.inputDir.joinpath(RunnerObj.exprData), + header = 0, index_col = 0) + + newExpressionData = ExpressionData.copy() + + # Write .csv file + newExpressionData.to_csv(RunnerObj.inputDir.joinpath("SCORPION/ExpressionData.csv"), + sep = ',', header = True, index = True) + +def run(RunnerObj): + ''' + Function to run SCORPION algorithm + ''' + inputPath = "data" + str(RunnerObj.inputDir).split(str(Path.cwd()))[1] + \ + "/SCORPION/ExpressionData.csv" + + # make output dirs if they do not exist: + outDir = "outputs/"+str(RunnerObj.inputDir).split("inputs/")[1]+"/SCORPION/" + os.makedirs(outDir, exist_ok = True) + + outPath = "data/" + str(outDir) + 'outFile.txt' + cmdToRun = ' '.join(['docker run --rm -v', str(Path.cwd())+':/data/ scorpion:base /bin/sh -c \"time -v -o', "data/" + str(outDir) + 'time.txt', 'Rscript runSCORPION.R', + inputPath, outPath, '\"']) + print(cmdToRun) + os.system(cmdToRun) + + + +def parseOutput(RunnerObj): + ''' + Function to parse outputs from SCORPION. 
+ ''' + # Quit if the output file does not exist + outDir = "outputs/"+str(RunnerObj.inputDir).split("inputs/")[1]+"/SCORPION/" + if not Path(outDir+'outFile.txt').exists(): + print(outDir+'outFile.txt'+' does not exist, skipping...') + return + + # Read output + OutDF = pd.read_csv(outDir+'outFile.txt', sep = '\t', header = 0) + # edges with significant p-value + # part1 = OutDF.loc[OutDF['pValue'] <= float(RunnerObj.params['pVal'])] + OutDF = OutDF.assign(absCorVal = OutDF['corVal'].abs()) + # edges without significant p-value + # part2 = OutDF.loc[OutDF['pValue'] > float(RunnerObj.params['pVal'])] + + outFile = open(outDir + 'rankedEdges.csv','w') + outFile.write('Gene1'+'\t'+'Gene2'+'\t'+'EdgeWeight'+'\n') + + for idx, row in OutDF.sort_values('absCorVal', ascending = False).iterrows(): + # outFile.write('\t'.join([row['Gene1'],row['Gene2'],str(row['corVal'])])+'\n') + outFile.write('\t'.join([str(row['Gene1']), str(row['Gene2']), str(row['corVal'])]) + '\n') + + + #for idx, row in part2.iterrows(): + # outFile.write('\t'.join([row['Gene1'],row['Gene2'],str(0)])+'\n') + outFile.close() diff --git a/BLRun/sctenifoldnetRunner.py b/BLRun/sctenifoldnetRunner.py new file mode 100644 index 00000000..b5f1b9e3 --- /dev/null +++ b/BLRun/sctenifoldnetRunner.py @@ -0,0 +1,72 @@ +import os +import pandas as pd +from pathlib import Path +import numpy as np + +def generateInputs(RunnerObj): + ''' + Function to generate desired inputs for SCTENIFOLDNET. + If the folder/files under RunnerObj.datadir exist, + this function will not do anything. 
+ ''' + if not RunnerObj.inputDir.joinpath("SCTENIFOLDNET").exists(): + print("Input folder for SCTENIFOLDNET does not exist, creating input folder...") + RunnerObj.inputDir.joinpath("SCTENIFOLDNET").mkdir(exist_ok = False) + + if not RunnerObj.inputDir.joinpath("SCTENIFOLDNET/ExpressionData.csv").exists(): + ExpressionData = pd.read_csv(RunnerObj.inputDir.joinpath(RunnerObj.exprData), + header = 0, index_col = 0) + + newExpressionData = ExpressionData.copy() + + # Write .csv file + newExpressionData.to_csv(RunnerObj.inputDir.joinpath("SCTENIFOLDNET/ExpressionData.csv"), + sep = ',', header = True, index = True) + +def run(RunnerObj): + ''' + Function to run SCTENIFOLDNET algorithm + ''' + inputPath = "data" + str(RunnerObj.inputDir).split(str(Path.cwd()))[1] + \ + "/SCTENIFOLDNET/ExpressionData.csv" + + # make output dirs if they do not exist: + outDir = "outputs/"+str(RunnerObj.inputDir).split("inputs/")[1]+"/SCTENIFOLDNET/" + os.makedirs(outDir, exist_ok = True) + + outPath = "data/" + str(outDir) + 'outFile.txt' + cmdToRun = ' '.join(['docker run --rm -v', str(Path.cwd())+':/data/ sctenifoldnet:base /bin/sh -c \"time -v -o', "data/" + str(outDir) + 'time.txt', 'Rscript runSCTENIFOLDNET.R', + inputPath, outPath, '\"']) + print(cmdToRun) + os.system(cmdToRun) + + + +def parseOutput(RunnerObj): + ''' + Function to parse outputs from SCTENIFOLDNET. 
+ ''' + # Quit if the output file does not exist + outDir = "outputs/"+str(RunnerObj.inputDir).split("inputs/")[1]+"/SCTENIFOLDNET/" + if not Path(outDir+'outFile.txt').exists(): + print(outDir+'outFile.txt'+' does not exist, skipping...') + return + + # Read output + OutDF = pd.read_csv(outDir+'outFile.txt', sep = '\t', header = 0) + # edges with significant p-value + # part1 = OutDF.loc[OutDF['pValue'] <= float(RunnerObj.params['pVal'])] + OutDF = OutDF.assign(absCorVal = OutDF['corVal'].abs()) + # edges without significant p-value + # part2 = OutDF.loc[OutDF['pValue'] > float(RunnerObj.params['pVal'])] + + outFile = open(outDir + 'rankedEdges.csv','w') + outFile.write('Gene1'+'\t'+'Gene2'+'\t'+'EdgeWeight'+'\n') + + for idx, row in OutDF.sort_values('absCorVal', ascending = False).iterrows(): + outFile.write('\t'.join([row['Gene1'],row['Gene2'],str(row['corVal'])])+'\n') + + #for idx, row in part2.iterrows(): + # outFile.write('\t'.join([row['Gene1'],row['Gene2'],str(0)])+'\n') + outFile.close() + diff --git a/config-files/scRNA-seq-generator/Nonspecific/hESC/hESC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/hESC/hESC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..c7a6745c --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/hESC/hESC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: hESC/hESC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-hESC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/hESC/hESC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/hESC/hESC-TFs-500-Generator.yaml new file mode 100644 index 00000000..40cc8476 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/hESC/hESC-TFs-500-Generator.yaml @@ 
-0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: hESC/hESC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-hESC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/hHep/hHep-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/hHep/hHep-TFs-1000-Generator.yaml new file mode 100644 index 00000000..16fafc36 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/hHep/hHep-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: hHep/hHep-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-hHep-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/hHep/hHep-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/hHep/hHep-TFs-500-Generator.yaml new file mode 100644 index 00000000..abc80641 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/hHep/hHep-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: hHep/hHep-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-hHep-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mDC/mDC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mDC/mDC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..fc75d845 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mDC/mDC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ 
+input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mDC/mDC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mDC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mDC/mDC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mDC/mDC-TFs-500-Generator.yaml new file mode 100644 index 00000000..5d3b3e0a --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mDC/mDC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mDC/mDC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mDC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mESC/mESC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mESC/mESC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..faca75bc --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mESC/mESC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mESC/mESC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mESC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mESC/mESC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mESC/mESC-TFs-500-Generator.yaml new file mode 100644 index 00000000..b308f774 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mESC/mESC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + 
input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mESC/mESC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mESC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mHSC-E/mHSC-E-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mHSC-E/mHSC-E-TFs-1000-Generator.yaml new file mode 100644 index 00000000..69e9d189 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mHSC-E/mHSC-E-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mHSC-E/mHSC-E-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mHSC-E-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mHSC-E/mHSC-E-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mHSC-E/mHSC-E-TFs-500-Generator.yaml new file mode 100644 index 00000000..89324fa1 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mHSC-E/mHSC-E-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mHSC-E/mHSC-E-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mHSC-E-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml new file mode 100644 index 00000000..85734103 --- /dev/null +++ 
b/config-files/scRNA-seq-generator/Nonspecific/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mHSC-GM/mHSC-GM-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mHSC-GM-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml new file mode 100644 index 00000000..c5486692 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mHSC-GM/mHSC-GM-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mHSC-GM-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mHSC-L/mHSC-L-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mHSC-L/mHSC-L-TFs-1000-Generator.yaml new file mode 100644 index 00000000..342d3f78 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mHSC-L/mHSC-L-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mHSC-L/mHSC-L-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mHSC-L-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Nonspecific/mHSC-L/mHSC-L-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Nonspecific/mHSC-L/mHSC-L-TFs-500-Generator.yaml new 
file mode 100644 index 00000000..34d0ec28 --- /dev/null +++ b/config-files/scRNA-seq-generator/Nonspecific/mHSC-L/mHSC-L-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Nonspecific + datasets: + - name: mHSC-L/mHSC-L-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Nonspecific-mHSC-L-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/hESC/hESC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Specific/hESC/hESC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..d6e7f501 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/hESC/hESC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: hESC/hESC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-hESC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/hESC/hESC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Specific/hESC/hESC-TFs-500-Generator.yaml new file mode 100644 index 00000000..2b806200 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/hESC/hESC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: hESC/hESC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-hESC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/hHep/hHep-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Specific/hHep/hHep-TFs-1000-Generator.yaml new file mode 100644 index 
00000000..e4dde297 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/hHep/hHep-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: hHep/hHep-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-hHep-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/hHep/hHep-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Specific/hHep/hHep-TFs-500-Generator.yaml new file mode 100644 index 00000000..39b5c484 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/hHep/hHep-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: hHep/hHep-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-hHep-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mDC/mDC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mDC/mDC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..f43975f0 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/mDC/mDC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mDC/mDC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mDC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mDC/mDC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mDC/mDC-TFs-500-Generator.yaml new file mode 100644 index 00000000..7a1680cc --- /dev/null +++ 
b/config-files/scRNA-seq-generator/Specific/mDC/mDC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mDC/mDC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mDC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mESC/mESC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mESC/mESC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..4c653099 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/mESC/mESC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mESC/mESC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mESC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mESC/mESC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mESC/mESC-TFs-500-Generator.yaml new file mode 100644 index 00000000..88ceaad2 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/mESC/mESC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mESC/mESC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mESC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mHSC-E/mHSC-E-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mHSC-E/mHSC-E-TFs-1000-Generator.yaml new file mode 100644 index 00000000..dded5f9a --- /dev/null +++ 
b/config-files/scRNA-seq-generator/Specific/mHSC-E/mHSC-E-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mHSC-E/mHSC-E-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mHSC-E-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mHSC-E/mHSC-E-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mHSC-E/mHSC-E-TFs-500-Generator.yaml new file mode 100644 index 00000000..10a79f27 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/mHSC-E/mHSC-E-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mHSC-E/mHSC-E-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mHSC-E-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml new file mode 100644 index 00000000..effc10c2 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mHSC-GM/mHSC-GM-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mHSC-GM-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml new file mode 100644 index 00000000..561de731 --- 
/dev/null +++ b/config-files/scRNA-seq-generator/Specific/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mHSC-GM/mHSC-GM-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mHSC-GM-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mHSC-L/mHSC-L-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mHSC-L/mHSC-L-TFs-1000-Generator.yaml new file mode 100644 index 00000000..123ccdba --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/mHSC-L/mHSC-L-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mHSC-L/mHSC-L-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mHSC-L-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/Specific/mHSC-L/mHSC-L-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/Specific/mHSC-L/mHSC-L-TFs-500-Generator.yaml new file mode 100644 index 00000000..149cc331 --- /dev/null +++ b/config-files/scRNA-seq-generator/Specific/mHSC-L/mHSC-L-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/Specific + datasets: + - name: mHSC-L/mHSC-L-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: Specific-mHSC-L-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/hESC/hESC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/String/hESC/hESC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..fdd830b8 --- /dev/null 
+++ b/config-files/scRNA-seq-generator/String/hESC/hESC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: hESC/hESC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-hESC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/hESC/hESC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/String/hESC/hESC-TFs-500-Generator.yaml new file mode 100644 index 00000000..81a62557 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/hESC/hESC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: hESC/hESC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-hESC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/hHep/hHep-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/String/hHep/hHep-TFs-1000-Generator.yaml new file mode 100644 index 00000000..e3591f61 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/hHep/hHep-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: hHep/hHep-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-hHep-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/hHep/hHep-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/String/hHep/hHep-TFs-500-Generator.yaml new file mode 100644 index 00000000..882baaeb --- /dev/null +++ b/config-files/scRNA-seq-generator/String/hHep/hHep-TFs-500-Generator.yaml @@ -0,0 
+1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: hHep/hHep-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-hHep-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mDC/mDC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/String/mDC/mDC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..c5080fd0 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mDC/mDC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mDC/mDC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mDC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mDC/mDC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/String/mDC/mDC-TFs-500-Generator.yaml new file mode 100644 index 00000000..4ad626e1 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mDC/mDC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mDC/mDC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mDC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mESC/mESC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/String/mESC/mESC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..0e1b2930 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mESC/mESC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: 
mESC/mESC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mESC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mESC/mESC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/String/mESC/mESC-TFs-500-Generator.yaml new file mode 100644 index 00000000..05a34660 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mESC/mESC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mESC/mESC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mESC-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mHSC-E/mHSC-E-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/String/mHSC-E/mHSC-E-TFs-1000-Generator.yaml new file mode 100644 index 00000000..51d9da95 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mHSC-E/mHSC-E-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mHSC-E/mHSC-E-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mHSC-E-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mHSC-E/mHSC-E-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/String/mHSC-E/mHSC-E-TFs-500-Generator.yaml new file mode 100644 index 00000000..69c19a53 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mHSC-E/mHSC-E-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mHSC-E/mHSC-E-TFs-500-Generator + cellData: 
PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mHSC-E-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/String/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml new file mode 100644 index 00000000..2ece8130 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mHSC-GM/mHSC-GM-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mHSC-GM/mHSC-GM-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mHSC-GM-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/String/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml new file mode 100644 index 00000000..71726af5 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mHSC-GM/mHSC-GM-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mHSC-GM/mHSC-GM-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mHSC-GM-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mHSC-L/mHSC-L-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/String/mHSC-L/mHSC-L-TFs-1000-Generator.yaml new file mode 100644 index 00000000..e9d0d8e8 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mHSC-L/mHSC-L-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mHSC-L/mHSC-L-TFs-1000-Generator + cellData: 
PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mHSC-L-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/String/mHSC-L/mHSC-L-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/String/mHSC-L/mHSC-L-TFs-500-Generator.yaml new file mode 100644 index 00000000..dd773170 --- /dev/null +++ b/config-files/scRNA-seq-generator/String/mHSC-L/mHSC-L-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/String + datasets: + - name: mHSC-L/mHSC-L-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: String-mHSC-L-TFs-500 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/lofgof/mESC/mESC-TFs-1000-Generator.yaml b/config-files/scRNA-seq-generator/lofgof/mESC/mESC-TFs-1000-Generator.yaml new file mode 100644 index 00000000..bff6c385 --- /dev/null +++ b/config-files/scRNA-seq-generator/lofgof/mESC/mESC-TFs-1000-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/lofgof + datasets: + - name: mESC/mESC-TFs-1000-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: lofgof-mESC-TFs-1000 \ No newline at end of file diff --git a/config-files/scRNA-seq-generator/lofgof/mESC/mESC-TFs-500-Generator.yaml b/config-files/scRNA-seq-generator/lofgof/mESC/mESC-TFs-500-Generator.yaml new file mode 100644 index 00000000..8863dc2f --- /dev/null +++ b/config-files/scRNA-seq-generator/lofgof/mESC/mESC-TFs-500-Generator.yaml @@ -0,0 +1,15 @@ +input_settings: + + input_dir: inputs + dataset_dir: scRNA-seq/lofgof + datasets: + - name: mESC/mESC-TFs-500-Generator + cellData: PseudoTime.csv + exprData: ExpressionData.csv + trueEdges: 
refNetwork.csv + + +output_settings: + + output_dir: outputs + output_prefix: lofgof-mESC-TFs-500 \ No newline at end of file diff --git a/config-files/sfaira/sfaira-config.yaml b/config-files/sfaira/sfaira-config.yaml new file mode 100644 index 00000000..d18b2dee --- /dev/null +++ b/config-files/sfaira/sfaira-config.yaml @@ -0,0 +1,64 @@ +# sfaira Settings: specify the values of organism and organ to filter datasets from the sfaira database. +sfaira_settings: + + # Base directory used to store datasets + data_dir : "inputs/sfaira" + + + # Denotes a list of scRNA-seq datasets to download with the following parameters: + # Each filter is a key-value pair. + # As of April 1st, 2023, sfaira supports the following key-value pairs: + # key: the choices include year, organism, organ, and assay_sc. + # + # value: the specific values corresponding to the key. A key may take more than one value. + # The choices for each key are listed below. + # - year: 2016, 2017, 2018, 2019, 2020, 2021, 2022 + # - organism: Ambystoma mexicanum, Anolis carolinensis, Canis lupus familiaris, Capra hircus, + # Felis catus, Homo sapiens, Macaca fascicularis, Mesocricetus auratus, Mus musculus, + # Mustela putorius furo, Oryctolagus cuniculus, Panthera tigris altaica + # - organ: adipose tissue, aorta, arcuate nucleus of hypothalamus, blood, bone marrow, brain, bronchus, + # caudate lobe of liver, cerebral cortex, cloaca, colon, colonic epithelium, dermis, diaphragm, + # dorsal plus ventral thalamus, esophagus, eye, forebrain, forelimb, gill, gonad, heart, hindbrain, + # hindlimb, ileum, intestine, islet of Langerhans, kidney, lamina propria of mucosa of colon, liver, + # lung, lung parenchyma, lymph node, mammary gland, mesenchyme, midbrain tegmentum, multicellular organism, + # muscle organ, neck, ovary, pallium, pancreas, placenta, prostate gland, rectum, retina, rib, + # skeletal muscle organ, skin epidermis, skin of body, small intestine, spleen, stomach, striatum, tail, + # testis, thymus,
tongue, trachea, trophoblast, urinary bladder, uterus, vault of skull + # - assay_sc: 10x 3' transcription profiling, 10x 3' v1, 10x 3' v2, 10x 3' v3, 10x 5' transcription profiling, + # 10x multiome, 10x scATAC-seq, 10x technology, 10x transcription profiling, + # BD Rhapsody Whole Transcriptome Analysis, CEL-seq2, CITE-seq (cell surface protein profiling), + # CITE-seq (sample multiplexing), DroNc-seq, droplet-based single-cell RNA library preparation, + # Drop-seq, inDrop, MARS-seq, microwell-seq, Quartz-seq, sci-RNA-seq, sci-RNA-seq3, Seq-Well, + # Seq-Well S3, sfaira single cell library construction, single cell library construction, + # single-cell RNA sequencing, sNuc-Seq, Smart-seq2, SPLiT-seq + # + subsets: + + - filters: + organ: ["bone marrow"] + + - filters: + organism: ["Homo sapiens"] + organ: ["eye"] + + - filters: + organ: ["kidney"] + organism: ["Mus musculus"] + year: [2019] + + - filters: + organism: ["Mus musculus"] + year: [2019] + organ: ["heart"] + + - filters: + assay_sc: ["Drop-seq"] + organism: ["Homo sapiens"] + + - filters: + organ: ["lung", "liver"] + organism: ["Mus musculus"] + + - filters: + organ: ["retina"] + year: [2019, 2020, 2021] \ No newline at end of file diff --git a/docs/algorithms.rst b/docs/algorithms.rst index dcba59a9..e013c457 100644 --- a/docs/algorithms.rst +++ b/docs/algorithms.rst @@ -44,3 +44,11 @@ The following table lists the algorithms and the parameters they take as input, +----------------+--------------------------------------------------------------------------------------------+ | GRNBOOST2 | None | +----------------+--------------------------------------------------------------------------------------------+ +| SCSGL | - ``pos_density`` : (Default = 0.45) | +| | - ``neg_density`` : (Default = 0.45) | +| | - ``assoc`` : (Default = 'correlation') | ++----------------+--------------------------------------------------------------------------------------------+ +| SCTENIFOLDNET | None | 
++----------------+--------------------------------------------------------------------------------------------+ +| SCORPION | None | ++----------------+--------------------------------------------------------------------------------------------+ \ No newline at end of file diff --git a/docs/beeline-developer.rst b/docs/beeline-developer.rst index 197ccda9..867b23cd 100644 --- a/docs/beeline-developer.rst +++ b/docs/beeline-developer.rst @@ -81,3 +81,30 @@ BEELINE currently supports several evaluation techniques, namely, area under ROC 3. The final step is to add a command line option to perform the evaluation to `BLEvaluator.py `_. +.. _blevalguide: + +Adding new experimental scRNA-seq data from sfaira +################################################## + +BEELINE provides seven experimental scRNA-seq datasets for evaluation. Sfaira is a standardized framework for sharing and accessing scRNA-seq datasets from various species, tissues, and experimental conditions. To extend BEELINE to additional datasets, we integrated sfaira so that datasets matching prespecified features can be downloaded automatically. Specify the desired features (year, organism, organ, and assay_sc) in config-files/sfaira/sfaira-config.yaml, then run the following command to download the matching experimental scRNA-seq datasets into the existing pipeline. + +.. code:: bash + + python BLDataloader.py --config config-files/sfaira/sfaira-config.yaml + + +.. _blevalguide: + +Generating expression inputs and reference networks for a new experimental scRNA-seq dataset +############################################################################################ + +BEELINE provides the data files ExpressionData.csv, GeneOrdering.csv, and PseudoTime.csv for seven experimental scRNA-seq datasets. To accommodate new datasets, we also offer the option to generate the necessary expression inputs and reference networks from a given ground truth network. + +1.
Generate expression inputs: use generateExpInputs.py to produce expression inputs for the new dataset. Below is an example that generates expression inputs containing all transcription factors and the 500 most varying genes for the mHSC-E dataset, based on the STRING network: + +.. code:: bash + +    python generateExpInputs.py -e=inputs/BEELINE-data/scRNA-Seq/Raw-data/mHSC-E/ExpressionData.csv -g=inputs/BEELINE-data/scRNA-Seq/Raw-data/mHSC-E/GeneOrdering.csv -f=inputs/BEELINE-Networks/Networks/mouse/STRING-network.csv -i=inputs/BEELINE-Networks/mouse-tfs.csv -p=0.01 -c -t -n=500 + +2. Generate reference networks: use generateRefNetworks.py to create the reference networks that serve as ground truth for evaluating the dataset generated above. To do so, follow the configuration files in +config-files/scRNA-seq-generator and adjust the value of each argument in generateRefNetworks.sh. \ No newline at end of file diff --git a/generateRefNetworks.py b/generateRefNetworks.py new file mode 100644 index 00000000..874e1d99 --- /dev/null +++ b/generateRefNetworks.py @@ -0,0 +1,227 @@ +#!/usr/bin/env python + +# Script to limit the ground truth network to the genes which are in the +# expression data file, and evaluate + + +import os +import yaml +import argparse +import pandas as pd +#import run_eval_algs +#import BLEvalAggregator as BLeval +import bench as bench + + +def main(config_map, opts): + config_map = config_map.copy() + input_settings = config_map['input_settings'] + out_settings = config_map['output_settings'] + datasets = input_settings['datasets'] + input_dir = "%s/%s" % (input_settings['input_dir'], input_settings['dataset_dir']) + # algs = input_settings['algorithms'] + # if opts.alg is not None: + # # make the alg names lower so capitalization won't make a difference + # opts.algs = [a.lower() for a in opts.alg] + # new_alg_settings = [] + # #for alg in opts.alg: + # # # set 'should_run' to True for the algs specified + # # algdict = {'name':
alg, 'params': {'should_run': [True]}} + # # new_alg_settings.append(algdict) + # for alg in algs: + # if alg['name'].lower() in opts.algs: + # print('Keeping %s in the new config files' % (alg)) + # else: + # continue + # # set 'should_run' to True for the algs specified + # alg['params']['should_run'] = [True] + # new_alg_settings.append(alg) + # input_settings['algorithms'] = new_alg_settings + + # print(input_settings['algorithms']) + + for dataset in datasets: + # first load ExpressionData.csv + name = dataset['name'] + dataset_dir = "%s/%s" % (input_dir, name) + print("\nWorking on %s" % (dataset_dir)) + expr_file = "%s/%s" % (dataset_dir, dataset['exprData']) + print("\treading %s" % (expr_file)) + expr_df = pd.read_csv(expr_file, header= 0, index_col=0) + + # now load the network file + net_file = "%s/%s" % (dataset_dir, dataset['trueEdges']) + print("\treading %s" % (opts.ref_net_file)) + net_df = pd.read_csv(opts.ref_net_file, header=0) + net_df.columns = ["Gene1","Gene2"] + list(net_df.columns[2:]) + net_tfs = net_df['Gene1'].values + num_tfs, num_targets = net_df[['Gene1','Gene2']].nunique() + print("\t%d TFs, %d targets, %d edges" % (num_tfs, num_targets, len(net_df))) + + expr_genes = set(expr_df.index.values) + net_df = net_df[(net_df['Gene1'].isin(expr_genes) & net_df['Gene2'].isin(expr_genes))] + if len(net_df) == 0: + print("No matching node names found. 
Please make sure the same namespace is used.") + print("\tExample expr node: %s" % (list(expr_genes)[0])) + print("\tExample net node: %s" % (net_tfs[0])) + else: + # print("After limiting to the %d genes with expression values:" % (len(expr_genes))) + num_tfs, num_targets = net_df[['Gene1','Gene2']].nunique() + print("\t# TFs\t# targets\t# edges") + print("\t%s\t%s\t%d" % (num_tfs, num_targets, len(net_df))) + # and write it to a file + print("\nwriting %s" % (net_file)) + net_df.to_csv(net_file, index=False) + if opts.stats_only: + continue + + # don't need to write the yaml file + # add an option to write it? + # can simply pass it to BLEvalAggregator.py + # print("Running BLEvalAggregator.py") + # bench.main(config_map, opts) + + # skip the rest of this for now + continue + # after it's done, need to move the evaluation file + # otherwise it will be overwritten by the next run + # alternatively we could change the output directory in the config map + + net_name = opts.net_name if opts.net_name is not None else opts.ref_net_file.split('/')[-1].replace('.csv','') + out_file = "%s/eval.csv" % (input_dir.replace("inputs/","outputs/")) + all_df = pd.DataFrame() + #for measure in ["AUPRC", "AUROC", "EPr", "Jaccard", "Times"]: + for measure in ["AUPRC", "AUROC", "EPr", "Times"]: + measure_file = "%s/%s-%s.csv" % ( + input_dir.replace("inputs/","outputs/"), out_settings['output_prefix'], measure) + df = pd.read_csv(measure_file, header=0) + print(df) + df.columns = ['algorithm', 'value'] + df['measure'] = measure + df['dataset'] = dataset['name'] + df['ref_net'] = net_name + all_df = pd.concat([all_df, df]) + # delete this file + os.remove(measure_file) + + # now append this to a file + header = True + append = True + if os.path.isfile(out_file): + if opts.force_eval: + append = False + print("writing to %s" % (out_file)) + else: + print("appending to %s" % (out_file)) + #header = False + # make sure we don't duplicate any rows + df = pd.read_csv(out_file, header = 0) + all_df
= pd.concat([df, all_df]) + # if the new values are already in the df, don't repeat them again + all_df.drop_duplicates(inplace=True) + # # if the new values are different, overwrite what was in the file with the new results + # all_df.drop_duplicates(subset=["algorithm", "measure", "dataset", "ref_net"], keep='last', inplace=True) + else: + print("writing to %s" % (out_file)) + + #with open(out_file, 'a' if append else 'w') as out: + with open(out_file, 'w') as out: + # lock it to avoid scripts trying to write at the same time + #fcntl.flock(out, fcntl.LOCK_EX) + all_df.to_csv(out, header=header, index=False) + #fcntl.flock(out, fcntl.LOCK_UN) + + print("Finished") + + +def write_yaml_file(yaml_file, config_map): + print("\twriting to %s" % (yaml_file)) + with open(yaml_file, 'w') as out: + yaml.dump(config_map, out, default_flow_style=False) + + +def setup_parser(): + #parser = argparse.ArgumentParser( + # description='Script for setting up various experiments ') + # also add the BLEval options + parser = argparse.ArgumentParser( + description='Run pathway reconstruction pipeline.') + + parser.add_argument('-c','--config', default='config.yaml', + help="Configuration file containing list of datasets " + "algorithms and output specifications.\n") + + parser.add_argument('-a', '--auc', action="store_true", default=False, + help="Compute median of areas under Precision-Recall and ROC curves.\n") + + parser.add_argument('-j', '--jaccard', action="store_true", default=False, + help="Compute median Jaccard index of predicted top-k networks " + "for each algorithm for a given set of datasets generated " + "from the same ground truth network.\n") + + parser.add_argument('-r', '--spearman', action="store_true", default=False, + help="Compute median Spearman Corr. 
of predicted edges " + "for each algorithm for a given set of datasets generated " + "from the same ground truth network.\n") + + parser.add_argument('-t', '--time', action="store_true", default=False, + help="Analyze time taken by each algorithm.\n") + + parser.add_argument('-e', '--epr', action="store_true", default=False, + help="Compute median early precision.") + + parser.add_argument('-s','--sepr', action="store_true", default=False, + help="Analyze median (signed) early precision for activation and inhibitory edges.") + + parser.add_argument('-m','--motifs', action="store_true", default=False, + help="Compute network motifs in the predicted top-k networks.") + + #parser.add_argument('--config', default='config.yaml', required=True, + # help='Configuration file') + #parser.add_argument('--run-algs', action="store_true", default=False, + # help='Run the methods using the generated config file') + # parser.add_argument('--alg', action="append", + # help="Name of algorithm to run. Must match the output file path. May specify multiple. Default is whatever is set to true in the config file") + parser.add_argument('--ref-net-file', type=str, default="GeneOrdering.csv", + help='Path to the ground truth refNetwork.csv file. A new file will be subset to the genes in the ExpressionData.csv and written.') + parser.add_argument('--tfs', action="store_true", default=False, + help="Only consider edges from TF to gene.") + parser.add_argument('--net-name', + help='The name to give this network for evaluating. Default is the file name.') + parser.add_argument('--stats-only', action="store_true", default=False, + help='Only print out the stats of the # edges and such') + parser.add_argument('--eval-only', action="store_true", default=False, + help='Only evaluate.
Used for bench.py') + parser.add_argument('--postfix', default='', + help='postfix for output evaluation files') + parser.add_argument('--force-eval', action='store_true', default=False, + help='If the eval.csv file exists, overwrite it instead of adding to it') + + ## most variable genes options + #parser.add_argument('--most-variable-genes', '-V', action="store_true", default=False, + # help='Select the most variable genes and subset the ExpressionData.csv and refNetwork.csv to those genes') + #parser.add_argument('--gene-order-file', type=str, default="GeneOrdering.csv", + # help='Name of CSV file with the ascending ordering value in the second column. ' + + # 'Should be the same for each dataset. Suggested: GeneOrdering.csv.') + # TODO specify multiple? + #parser.add_argument('--pval-cutoff', type=float, + # help='Cutoff of the pvalue to select genes') + # TODO specify multiple? + #parser.add_argument('--num-genes', type=int, default=100, + # help='Number of genes to subset. Default: 100') + #parser.add_argument('--forced', action="store_true", default=False, + # help='Overwrite the ExpressionData.csv file if it already exists.') + + return parser + + +if __name__ == "__main__": + parser = setup_parser() + opts = parser.parse_args() + # BLEval takes the opts, so keep it as opts + #kwargs = vars(opts) + config_file = opts.config + with open(config_file, 'r') as conf: + config_map = yaml.safe_load(conf) + + main(config_map, opts) diff --git a/generateRefNetworks.sh b/generateRefNetworks.sh new file mode 100644 index 00000000..4427b006 --- /dev/null +++ b/generateRefNetworks.sh @@ -0,0 +1,214 @@ +# This script generates the expression inputs and corresponding reference networks for real data from ground truth networks.
+ +current_net_path="inputs/BEELINE-Networks/Networks" +python="python" + +declare -a num_genes_list=( +"TFs-500" +"TFs-1000" +) + +# algs="--alg PIDC" #--alg GRNBOOST2 --alg GENIE3 --alg SCSGL --alg SCTENIFOLDNET --alg SCORPION" + +# #-------------------------------- Non-specific ground-truth network --------------------------------# +# # human datasets with gene names +# declare -a datasets=("hESC" "hHep") +# declare -a networks=( +# "${current_net_path}/human/Non-Specific-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Nonspecific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + +# # mouse datasets with gene names +# declare -a datasets=("mDC" "mESC" "mHSC-E" "mHSC-GM" "mHSC-L") +# declare -a networks=( +# "${current_net_path}/mouse/Non-Specific-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Nonspecific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + +# #-------------------------------- String ground-truth network --------------------------------# +# # human datasets with gene names +# declare -a datasets=("hESC" "hHep") +# declare -a networks=( +# "${current_net_path}/human/STRING-network.csv" +# ) + 
+# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/String/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + +# # mouse datasets with gene names +# declare -a datasets=("mDC" "mESC" "mHSC-E" "mHSC-GM" "mHSC-L") +# declare -a networks=( +# "${current_net_path}/mouse/STRING-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/String/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + + +#-------------------------------- Cell type-specific ground-truth network --------------------------------# +# # human datasets with gene names +# declare -a datasets=("hESC") +# declare -a networks=( +# "${current_net_path}/human/hESC-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Specific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + +# 
declare -a datasets=("hHep") +# declare -a networks=( +# "${current_net_path}/human/HepG2-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Specific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + +# # mouse datasets with gene names +# declare -a datasets=("mDC") +# declare -a networks=( +# "${current_net_path}/mouse/mDC-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Specific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + +# declare -a datasets=("mESC") +# declare -a networks=( +# "${current_net_path}/mouse/mESC-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Specific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + + +declare -a datasets=("mESC") +declare -a networks=( + 
"${current_net_path}/mouse/mESC-lofgof-network.csv" + ) + +for dataset in ${datasets[*]}; do + for num_genes in ${num_genes_list[*]}; do + for net in ${networks[*]}; do + config_file="config-files/scRNA-seq/lofgof/$dataset/${dataset}-${num_genes}-Generator.yaml" + echo "----------------------------------------------------------------------------------------------------" + echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" + $python -u generateRefNetworks.py --config $config_file --ref-net-file $net + done + done +done + +# declare -a datasets=("mHSC-E") +# declare -a networks=( +# "${current_net_path}/mouse/mHSC-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Specific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + +# declare -a datasets=("mHSC-GM") +# declare -a networks=( +# "${current_net_path}/mouse/mHSC-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Specific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done + +# declare -a datasets=("mHSC-L") +# declare -a networks=( +# "${current_net_path}/mouse/mHSC-ChIP-seq-network.csv" +# ) + +# for dataset in ${datasets[*]}; do +# for 
num_genes in ${num_genes_list[*]}; do +# for net in ${networks[*]}; do +# config_file="config-files/scRNA-seq/Specific/$dataset/${dataset}-${num_genes}-Generator.yaml" +# echo "----------------------------------------------------------------------------------------------------" +# echo "$python -u generateRefNetworks.py --config $config_file --ref-net-file $net" +# $python -u generateRefNetworks.py --config $config_file --ref-net-file $net +# done +# done +# done \ No newline at end of file diff --git a/initialize.sh b/initialize.sh old mode 100755 new mode 100644 index e3f6ae9e..ebcb0850 --- a/initialize.sh +++ b/initialize.sh @@ -8,10 +8,8 @@ BASEDIR=$(pwd) # You may remove the -q flag if you want to see the docker build status cd $BASEDIR/Algorithms/ARBORETO docker build -q -t arboreto:base . -if ([ $? = 0 ] && [ "$(docker images -q arboreto:base 2> /dev/null)" != "" ]); then +if [[ "$(docker images -q arboreto:base 2> /dev/null)" != "" ]]; then echo "Docker container for ARBORETO is built and tagged as arboreto:base" -elif [ "$(docker images -q arboreto:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at arboreto:base" else echo "Oops! Unable to build Docker container for ARBORETO" fi @@ -19,10 +17,8 @@ fi cd $BASEDIR/Algorithms/GRISLI/ docker build -q -t grisli:base . -if ([ $? = 0 ] && [ "$(docker images -q grisli:base 2> /dev/null)" != "" ]); then +if [[ "$(docker images -q grisli:base 2> /dev/null)" != "" ]]; then echo "Docker container for GRISLI is built and tagged as grisli:base" -elif [ "$(docker images -q grisli:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at grisli:base" else echo "Oops! Unable to build Docker container for GRISLI" fi @@ -30,10 +26,8 @@ fi cd $BASEDIR/Algorithms/GRNVBEM/ docker build -q -t grnvbem:base . -if ([ $? 
= 0 ] && [[ "$(docker images -q grnvbem:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q grnvbem:base 2> /dev/null)" != "" ]]); then echo "Docker container for GRNVBEM is built and tagged as grnvbem:base" -elif [ "$(docker images -q grnvbem:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at grnvbem:base" else echo "Oops! Unable to build Docker container for GRNVBEM" fi @@ -41,10 +35,8 @@ fi cd $BASEDIR/Algorithms/JUMP3/ docker build -q -t jump3:base . -if ([ $? = 0 ] && [[ "$(docker images -q jump3:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q jump3:base 2> /dev/null)" != "" ]]); then echo "Docker container for JUMP3 is built and tagged as jump3:base" -elif [ "$(docker images -q jump3:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at jump3:base" else echo "Oops! Unable to build Docker container for JUMP3" fi @@ -52,10 +44,8 @@ fi cd $BASEDIR/Algorithms/LEAP/ docker build -q -t leap:base . -if ([ $? = 0 ] && [[ "$(docker images -q leap:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q leap:base 2> /dev/null)" != "" ]]); then echo "Docker container for LEAP is built and tagged as leap:base" -elif [ "$(docker images -q leap:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at leap:base" else echo "Oops! Unable to build Docker container for LEAP" fi @@ -63,10 +53,8 @@ fi cd $BASEDIR/Algorithms/PIDC/ docker build -q -t pidc:base . -if ([ $? = 0 ] && [[ "$(docker images -q pidc:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q pidc:base 2> /dev/null)" != "" ]]); then echo "Docker container for PIDC is built and tagged as pidc:base" -elif [ "$(docker images -q pidc:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at pidc:base" else echo "Oops! 
Unable to build Docker container for PIDC" fi @@ -74,10 +62,8 @@ fi cd $BASEDIR/Algorithms/PNI/ docker build -q -t pni:base . -if ([ $? = 0 ] && [[ "$(docker images -q pni:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q pni:base 2> /dev/null)" != "" ]]); then echo "Docker container for PNI is built and tagged as pni:base" -elif [ "$(docker images -q pni:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at pni:base" else echo "Oops! Unable to build Docker container for PNI" fi @@ -85,10 +71,8 @@ fi cd $BASEDIR/Algorithms/PPCOR/ docker build -q -t ppcor:base . -if ([ $? = 0 ] && [[ "$(docker images -q ppcor:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q ppcor:base 2> /dev/null)" != "" ]]); then echo "Docker container for PPCOR is built and tagged as ppcor:base" -elif [ "$(docker images -q ppcor:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at ppcor:base" else echo "Oops! Unable to build Docker container for PPCOR" fi @@ -96,10 +80,8 @@ fi cd $BASEDIR/Algorithms/SINGE/ docker build -q -t singe:base . -if ([ $? = 0 ] && [[ "$(docker images -q singe:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q singe:base 2> /dev/null)" != "" ]]); then echo "Docker container for SINGE is built and tagged as singe:base" -elif [ "$(docker images -q singe:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at singe:base" else echo "Oops! Unable to build Docker container for SINGE" fi @@ -107,10 +89,8 @@ fi cd $BASEDIR/Algorithms/SCNS/ docker build -q -t scns:base . -if ([ $? 
= 0 ] && [[ "$(docker images -q scns:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q scns:base 2> /dev/null)" != "" ]]); then echo "Docker container for SCNS is built and tagged as scns:base" -elif [ "$(docker images -q scns:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at scns:base" else echo "Oops! Unable to build Docker container for SCNS" fi @@ -118,10 +98,8 @@ fi cd $BASEDIR/Algorithms/SCODE/ docker build -q -t scode:base . -if ([ $? = 0 ] && [[ "$(docker images -q scode:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q scode:base 2> /dev/null)" != "" ]]); then echo "Docker container for SCODE is built and tagged as scode:base" -elif [ "$(docker images -q scode:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at scode:base" else echo "Oops! Unable to build Docker container for SCODE" fi @@ -129,10 +107,8 @@ fi cd $BASEDIR/Algorithms/SCRIBE/ docker build -q -t scribe:base . -if ([ $? = 0 ] && [[ "$(docker images -q scribe:base 2> /dev/null)" != "" ]]); then +if [[ "$(docker images -q scribe:base 2> /dev/null)" != "" ]]; then echo "Docker container for SCRIBE is built and tagged as scribe:base" -elif [ "$(docker images -q scribe:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at scribe:base" else echo "Oops! Unable to build Docker container for SCRIBE" fi @@ -140,10 +116,8 @@ fi cd $BASEDIR/Algorithms/SINCERITIES/ docker build -q -t sincerities:base . -if ([ $? 
= 0 ] && [ "$(docker images -q sincerities:base 2> /dev/null)" != "" ]); then +if ([[ "$(docker images -q sincerities:base 2> /dev/null)" != "" ]]); then echo "Docker container for SINCERITIES is built and tagged as sincerities:base" -elif [ "$(docker images -q sincerities:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at sincerities:base" else echo "Oops! Unable to build Docker container for SINCERITIES" fi @@ -151,12 +125,57 @@ fi cd $BASEDIR/Algorithms/SCSGL/ docker build -q -t scsgl:base . -if ([ $? = 0 ] && [[ "$(docker images -q scsgl:base 2> /dev/null)" != "" ]]); then +if ([[ "$(docker images -q scsgl:base 2> /dev/null)" != "" ]]); then echo "Docker container for SCSGL is built and tagged as scsgl:base" -elif [ "$(docker images -q scsgl:base 2> /dev/null)" != "" ]; then - echo "Docker container failed to build, but an existing image exists at scsgl:base" else echo "Oops! Unable to build Docker container for SCSGL" fi + +cd $BASEDIR/Algorithms/SCTENIFOLDNET/ +docker build -q -t sctenifoldnet:base . +if ([[ "$(docker images -q sctenifoldnet:base 2> /dev/null)" != "" ]]); then + echo "Docker container for SCTENIFOLDNET is built and tagged as sctenifoldnet:base" +else + echo "Oops! Unable to build Docker container for SCTENIFOLDNET" +fi + + +cd $BASEDIR/Algorithms/SCORPION/ +docker build -q -t scorpion:base . +if ([[ "$(docker images -q scorpion:base 2> /dev/null)" != "" ]]); then + echo "Docker container for SCORPION is built and tagged as scorpion:base" +else + echo "Oops! Unable to build Docker container for SCORPION" +fi + + +cd $BASEDIR/Algorithms/MICA/ +docker build -q -t mica:base . +if ([[ "$(docker images -q mica:base 2> /dev/null)" != "" ]]); then + echo "Docker container for MICA is built and tagged as mica:base" +else + echo "Oops! Unable to build Docker container for MICA" +fi + + +cd $BASEDIR/Algorithms/TENET/ +docker build -q -t tenet:base . 
+if ([[ "$(docker images -q tenet:base 2> /dev/null)" != "" ]]); then + echo "Docker container for TENET is built and tagged as tenet:base" +else + echo "Oops! Unable to build Docker container for TENET" +fi + + +cd $BASEDIR/Algorithms/NGSEM/ +docker build --no-cache -t ngsem:base . +if ([ $? = 0 ] && [[ "$(docker images -q ngsem:base 2> /dev/null)" != "" ]]); then + echo "Docker container for NGSEM is built and tagged as ngsem:base" +elif [ "$(docker images -q ngsem:base 2> /dev/null)" != "" ]; then + echo "Docker container failed to build, but an existing image exists at ngsem:base" +else + echo "Oops! Unable to build Docker container for NGSEM" +fi + + cd $BASEDIR diff --git a/sfaira b/sfaira new file mode 160000 index 00000000..55bcbbf2 --- /dev/null +++ b/sfaira @@ -0,0 +1 @@ +Subproject commit 55bcbbf2588a08e4a98b1e65324119407ec6e75c