Pull Request for NRNB issue 238: Using Foundation model for GRN Inference #127
Changes made and files added:
BLEval folder:
1. __init__.py file:
InputSettings class
- Added use_embeddings parameter to the __init__ method.
- Added adjust_paths_for_embeddings method to modify file paths when using embeddings.
- Added get_true_edges_path method for retrieving the correct path to the true edges file.
OutputSettings class
- Added datasets and use_embeddings parameters to the __init__ method.
- Added adjust_paths_for_embeddings method to modify output paths when using embeddings.
- Added get_output_path method for constructing output file paths.
BLEval class
- Updated computeAUC method to support embeddings and use the new get_true_edges_path method.
- Updated computeEarlyPrec method to support embeddings generation by also passing self.input_settings into the EarlyPrec() function.
ConfigParser class
- Added use_embeddings parameter to the parse method.
- Updated __parse_input_settings and __parse_output_settings methods to support embeddings.
Other improvements
- Used from pathlib import Path consistently.
- Used Path objects for better cross-platform compatibility.
2. computeDGAUC file:
In PRROC function
3. computeEarlyPrec file:
In EarlyPrec() function
Unchanged elements
BLEvaluator.py file:
In config get_parser() function:
In main() function:
Post-Run Evaluation Initialization: The evaluation summarizer (evalSummarizer) is now initialized from the parsed input and output settings. This replaces hardcoded settings with values read from the configuration.
Output Directory Construction:
The output directory is now dynamically constructed based on the evaluation settings (evalSummarizer.output_settings.base_dir, evalSummarizer.input_settings.datadir, and evalSummarizer.output_settings.output_prefix). This change ensures that the output files are stored in a well-organized directory structure that reflects the evaluation parameters.
Unchanged Logic:
The core functionality for computing and saving the AUPRC and AUROC values remains unchanged. The logic has been preserved but now exists within a more structured and configurable context.
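As a rough illustration of the output directory construction described above, the sketch below joins the three settings with pathlib. The function name build_output_dir, the stand-in evalSummarizer object, and the example values are assumptions made for illustration; the actual code in BLEvaluator.py may combine these fields differently.

```python
from pathlib import Path
from types import SimpleNamespace


def build_output_dir(base_dir, datadir, output_prefix):
    """Join the three settings into a single output directory path.

    Hypothetical helper: assumes datadir is a relative path.
    """
    return Path(base_dir) / Path(datadir) / output_prefix


# Stand-in for the real evalSummarizer; field values are made up.
evalSummarizer = SimpleNamespace(
    output_settings=SimpleNamespace(base_dir="outputs", output_prefix="GSD"),
    input_settings=SimpleNamespace(datadir="example/GSD"),
)

out_dir = build_output_dir(
    evalSummarizer.output_settings.base_dir,
    evalSummarizer.input_settings.datadir,
    evalSummarizer.output_settings.output_prefix,
)
```

Using Path objects here keeps the separators correct on both POSIX and Windows systems.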
BLRun Folder:
1. generate_embeds.py:
2. runner.py:
In generate_embeds method:
Subprocess Execution: The script is executed via a subprocess command, with robust error handling to ensure that any issues are clearly reported.
- The generateInputs() method has been updated to call generate_embeddings() when self.use_embeddings is True. This ensures that the embeddings are generated before any other input processing steps are performed.
- The run() method has been updated to ensure that the output directory is created based on the processed input directory, maintaining a clear separation of inputs and outputs.
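The control flow described for runner.py could look roughly like the sketch below. The Runner class, the run_script helper, and the steps list are hypothetical stand-ins, not the PR's actual code; only the use_embeddings attribute and the generateInputs() and generate_embeddings() names come from the description above.

```python
import subprocess
import sys


def run_script(cmd):
    """Run a command and surface its stderr if it fails (hypothetical helper)."""
    try:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # Re-raise with the exit code and stderr so the failure is easy to diagnose.
        raise RuntimeError(
            f"Command failed with exit code {err.returncode}: {err.stderr.strip()}"
        ) from err
    return result.stdout


class Runner:
    """Minimal stand-in for the runner; it only records the order of steps."""

    def __init__(self, use_embeddings=False):
        self.use_embeddings = use_embeddings
        self.steps = []

    def generate_embeddings(self):
        # The real method would launch the embeddings script via a subprocess.
        self.steps.append("embeddings")

    def generateInputs(self):
        # Embeddings are generated before any other input processing.
        if self.use_embeddings:
            self.generate_embeddings()
        self.steps.append("inputs")
```

Passing check=True makes subprocess.run raise on a non-zero exit code, so a failed embeddings script stops the pipeline instead of continuing with missing inputs.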
BLRunner.py file:
The get_parser() function now includes a new command-line argument --use_embeddings, which allows users to specify whether embeddings should be used in the pipeline. This argument is a boolean flag (action='store_true'), meaning that it will default to False unless explicitly provided by the user.
The parse_arguments() function has been updated to return the parsed arguments, including the newly added use_embeddings flag. This function now serves as the entry point for all command-line configurations, ensuring that the user's preferences are respected throughout the pipeline.
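A minimal sketch of such a flag with argparse is shown below. The description and help strings, and the optional argv parameter, are assumptions added for illustration; the real parser in BLRunner.py defines further arguments that are omitted here.

```python
import argparse


def get_parser():
    """Build a parser with only the --use_embeddings flag (other options omitted)."""
    parser = argparse.ArgumentParser(description="Run the pipeline (illustrative sketch).")
    parser.add_argument(
        "--use_embeddings",
        action="store_true",  # defaults to False unless the flag is given
        help="Use embeddings in the pipeline.",
    )
    return parser


def parse_arguments(argv=None):
    """Parse argv (or sys.argv when argv is None) and return the namespace."""
    return get_parser().parse_args(argv)
```

With this setup, parse_arguments([]).use_embeddings is False and parse_arguments(["--use_embeddings"]).use_embeddings is True.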
Model folder:
generate_embeddings.py:
Dockerfile and req.yaml added to support a containerised environment for the embeddings generation pipeline.
Initialize.sh
ReadMe file
CORE FLOW:
The pipeline, if run with the command
will
inputs/example/GSD/ExpressionData.csv
, Embeddings data will be generated and saved at inputs/example/GSD/processedExpressionData/EmbeddingsData.csv,
and the refNetwork.csv file will automatically be copied and saved in the processedExpressionData directory.
And, if the command used is:
Both the auc and epr pipelines have been modified to work with the use_embeddings flag.
The evaluation pipeline will run on the GRNs inferred using EmbeddingsData.csv, with inputs and outputs under
inputs/example/GSD/processedExpressionData
and outputs/example/GSD/processedExpressionData
respectively.
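The path layout described in this section can be sketched as follows. Both helper names (prepare_processed_dir and output_dir_for) are hypothetical, and the sketch assumes the input tree starts with a top-level inputs directory that is mirrored under outputs; the PR's own path handling may differ.

```python
import shutil
from pathlib import Path


def prepare_processed_dir(expression_file, ref_name="refNetwork.csv"):
    """Create processedExpressionData next to the expression file.

    Copies the reference network into it when present and returns the
    path where the embeddings file is expected to be written.
    """
    expression_file = Path(expression_file)
    processed_dir = expression_file.parent / "processedExpressionData"
    processed_dir.mkdir(parents=True, exist_ok=True)
    ref_src = expression_file.parent / ref_name
    if ref_src.exists():
        shutil.copy(ref_src, processed_dir / ref_name)
    return processed_dir / "EmbeddingsData.csv"


def output_dir_for(processed_input_dir):
    """Mirror an inputs/... directory under outputs/ (assumes 'inputs' is the first part)."""
    parts = Path(processed_input_dir).parts
    return Path("outputs", *parts[1:])
```

For example, output_dir_for("inputs/example/GSD/processedExpressionData") gives the path outputs/example/GSD/processedExpressionData.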