EnhancerDetector

Copyright (C) 2025 Luis M. Solis, Geyenna Sterling-Lentsch, Mark S. Halfon, and Hani Z. Girgis

Academic use: Affero General Public License version 1.

Any restrictions to use for profit or non-academics: Alternative commercial license is required.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please contact Dr. Hani Z. Girgis ([email protected]) if you need more information.

EnhancerDetector is a deep learning-based classification tool for predicting the presence of enhancers in DNA sequences. It uses species-specific convolutional neural networks (CNNs) trained on experimentally validated datasets for human, mouse, and Drosophila melanogaster (fly). Class activation mapping (CAM) can optionally be used to visualize the regions in each sequence that most influenced the model’s decision. Also included is a finetuning pipeline for specializing the human model to another species. Please read the original research paper for more information.

DOI: 10.5281/zenodo.15531293

Files:

Models: Contains trained CNN models for each supported species (human, mouse, and fly) used by EnhancerDetector. The fly model uses an ensemble of three classifiers. The folder also includes indexers for converting DNA to numerical format.

Output: This folder stores the outputs generated by EnhancerDetector, including enhancer predictions and class activation maps.

Test_Input: This folder contains test input files for EnhancerDetector. These files demonstrate the required input format and can be used to check that the tool is working correctly. input_human.fasta contains hg38 human sequences, input_mouse.fasta contains mm10 mouse sequences, and input_fly.fasta contains dm6 Drosophila melanogaster sequences, all in FASTA format. Human and mouse sequences are 400 base pairs long, while fly sequences are 500 base pairs long; the models accept only these sequence lengths. The folder also contains a subfolder with the finetuning test inputs: a control FASTA file and an enhancer FASTA file of 400 bp sequences from the mm10 mouse.

EnhancerDetector.ipynb: This Jupyter notebook contains the code to run EnhancerDetector. If you are comfortable using Jupyter notebooks, you can modify the parameters in the first few cells and execute the notebook to generate an output evaluation from EnhancerDetector.

Finetune_Network.py: This Python script is run from the terminal and finetunes the human network to a specific species. It takes an enhancer FASTA file and a control FASTA file, both containing 400 bp sequences, and outputs a new finetuned model that can be used with EnhancerDetector.

FineTune_Output: This folder contains the test outputs from Finetune_Network.py. When the network is finetuned, it is saved there.

Tool:

EnhancerDetector.py: This Python script runs EnhancerDetector from the terminal. It takes an input FASTA file, a flag for whether class activation maps should be generated, the species model to use, and an output directory. It can also use a custom model generated by Finetune_Network.py.
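Because the models accept only fixed-length input (400 bp for human and mouse, 500 bp for fly), it can help to check a FASTA file before running the tool. The following is a minimal sketch that uses only the standard library; the function names and the `EXPECTED_LENGTH` table are illustrative and are not part of EnhancerDetector. Biopython's `SeqIO.parse` could replace the small parser.

```python
# Sequence lengths stated in this README; the table itself is illustrative.
EXPECTED_LENGTH = {"human": 400, "mouse": 400, "fly": 500}


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_wrong_lengths(path, species):
    """Return (header, length) for every record that does not have the expected length."""
    expected = EXPECTED_LENGTH[species]
    return [(h, len(s)) for h, s in read_fasta(path) if len(s) != expected]
```

An empty list from `find_wrong_lengths(path, "human")` means every record in the file is 400 bp long.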

Requirements:

EnhancerDetector uses several libraries:

Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]

TensorFlow version: 2.13.0

Biopython version: 1.83

NumPy version: 1.24.3

Matplotlib version: 3.8.3

Finetuning uses the libraries above plus:

Scikit-learn version: 1.3.0

NOTE: These are the versions the programs were developed with; future versions of these libraries may or may not work.
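To see how an environment compares with the versions listed above, a short check along these lines may be useful. It is a sketch, not part of EnhancerDetector: the helper names are made up, and the keys are the PyPI distribution names of the listed libraries.

```python
from importlib import metadata

# Versions listed in this README, keyed by PyPI distribution name.
TESTED_VERSIONS = {
    "tensorflow": "2.13.0",
    "biopython": "1.83",
    "numpy": "1.24.3",
    "matplotlib": "3.8.3",
    "scikit-learn": "1.3.0",  # only needed for finetuning
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(tested=TESTED_VERSIONS, lookup=installed_version):
    """Map each package to (tested_version, installed_version, matches)."""
    report = {}
    for package, wanted in tested.items():
        found = lookup(package)
        report[package] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, found, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: tested {wanted}, installed {found or 'missing'} ({status})")
```

A "differs" line does not necessarily mean the tool will fail; it only flags a version other than the one the programs were developed with.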

Parameters:

EnhancerDetector uses five parameters:

--species: choose between human, mouse, and fly. This determines which species model is used for the evaluation.

--input: the path to the input FASTA file.

--cam: if given, Class Activation Maps are generated for the input sequences.

--outdir: the output directory in which both the evaluation results and the CAMs are placed.

--custom_model: the path to the custom finetuned model generated by the finetune program. This parameter overrides --species: only the custom model and the human indexer are used, and input sequences must be 400 bp long.
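The interaction between --species and --custom_model can be sketched with `argparse`. This is a hypothetical reconstruction, not the tool's actual source: the option names come from the list above, but the types, defaults, and the `resolve_model` helper are assumptions inferred from the usage examples.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the five documented options."""
    parser = argparse.ArgumentParser(prog="EnhancerDetector.py")
    parser.add_argument("--species", choices=["human", "mouse", "fly"],
                        help="which species model to use")
    parser.add_argument("--input", required=True,
                        help="path to the input FASTA file")
    # Assumed to be a plain on/off flag, as in the usage examples.
    parser.add_argument("--cam", action="store_true",
                        help="also generate class activation maps")
    # The default value here is an assumption, not documented behavior.
    parser.add_argument("--outdir", default="Output", help="output directory")
    parser.add_argument("--custom_model", default=None,
                        help="path to a finetuned .keras model")
    return parser


def resolve_model(args):
    """Return (model_source, sequence_length) following the documented override rule."""
    if args.custom_model:
        # A custom model takes precedence over --species and expects 400 bp input.
        return args.custom_model, 400
    length = 500 if args.species == "fly" else 400
    return args.species, length
```

With `--custom_model m.keras`, `resolve_model` returns `("m.keras", 400)` regardless of the value given for --species.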

Finetune_Network uses five parameters:

--enhancers: the path to the enhancer FASTA file.

--controls: the path to the control FASTA file.

--output_dir: the output directory in which the finetuned model is saved.

--batch_size: the size of the training batches. Adjust it to your machine to avoid running out of GPU or system memory. Default is 512.

--use_shuffle: shuffles the given enhancers and adds the shuffled sequences to the controls. Recommended, as it can help with training.
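The idea behind --use_shuffle can be illustrated as follows. This sketch assumes a simple per-nucleotide shuffle, which preserves each sequence's length and base composition; the shuffling method actually used by Finetune_Network.py is not described here and may differ. The function names are hypothetical.

```python
import random


def shuffle_sequence(sequence, rng):
    """Return a copy of `sequence` with its nucleotides in random order."""
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


def add_shuffled_controls(enhancers, controls, seed=0):
    """Return the controls followed by one shuffled copy of each enhancer."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return controls + [shuffle_sequence(seq, rng) for seq in enhancers]
```

Each shuffled sequence has the same length and nucleotide counts as its enhancer, so the 400 bp requirement still holds for the extended control set.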

To Run Tool:

  1. Clone EnhancerDetector and head to where it was cloned.

  2. Make sure EnhancerDetector is unzipped.

  3. Run EnhancerDetector.py

    python EnhancerDetector.py --species human --input human_sequences.fa --outdir Output

  4. Results will be saved in the Output/ folder as Model_Output.txt, which lists each input sequence and its predicted enhancer probability.

  5. If you want CAMs generated for the input sequences, run:

    python EnhancerDetector.py --species human --input human_sequences.fa --cam --outdir Output

  6. In the output folder, the generated CAM is called sequence_CAM.pdf

    Opening it shows a heatmap for the given sequence. The dark red regions mark the areas that most influenced EnhancerDetector's final decision. Please read the main paper for more details.

If you wish to run EnhancerDetector via the Jupyter notebook:

  1. Open EnhancerDetector.ipynb

  2. In the third cell, locate and edit the following parameters:

    similar_sequences_file = directory_of_input/sequences.fa

    network = f'{model_folder}/{Species_Folder}/model.keras'

    indexer_dir = f'{model_folder}/{Species_Folder}/indexer.pkl'

  3. If you want CAMs generated, locate output_cam_pdf in the second cell.

    Set this parameter to output_cam_pdf = True

  4. If you want to change the output directory, locate output_dir and set it to your output directory.

  5. For the fly model, set the use_fly parameter in the third cell to True; the notebook will then automatically use the fly models.

  6. Once you edit the parameters, run the entire notebook and the outputs will be generated in the output directory.

To Run our Tests:

  1. Look inside the Test_Input folder, which contains three FASTA files:

    input_human.fasta = Ten human sequences in FASTA format; the first five are likely enhancers and the last five are non-enhancers.

    input_mouse.fasta = Ten mouse sequences in FASTA format; the first five are likely enhancers and the last five are non-enhancers.

    input_fly.fasta = Ten fly sequences in FASTA format; the first five are likely enhancers and the last five are non-enhancers.

  2. In the main directory run EnhancerDetector.py

    For Human

    python EnhancerDetector.py --species human --input Test_Input/input_human.fa --outdir Output

    For Mouse

    python EnhancerDetector.py --species mouse --input Test_Input/input_mouse.fa --outdir Output

    For Fly

    python EnhancerDetector.py --species fly --input Test_Input/input_fly.fa --outdir Output

    If you want to generate a CAM output for each sequence, run:

    For Human

    python EnhancerDetector.py --species human --input Test_Input/input_human.fa --cam --outdir Output

    For Mouse

    python EnhancerDetector.py --species mouse --input Test_Input/input_mouse.fa --cam --outdir Output

    For Fly

    python EnhancerDetector.py --species fly --input Test_Input/input_fly.fa --cam --outdir Output

  3. The Output folder will contain the results and the CAMs for each given sequence.

  4. If you want to use the Jupyter notebook, open EnhancerDetector.ipynb

  5. By default, similar_sequences_file should already be set to the human test cases. To test the mouse or fly, change the parameters accordingly, and set use_fly to True if using the fly.

  6. Set output_cam_pdf to True if you want to generate the CAMs.

  7. Run the entire notebook and the outputs will be generated in the Output folder.
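The three test commands above can also be driven from a small Python script. The sketch below is a convenience wrapper and not part of EnhancerDetector. It writes each species to its own subfolder of the output directory, on the assumption that every run writes a Model_Output.txt into the directory it is given and would otherwise overwrite the previous one. The input paths are copied from the commands above and may need adjusting to the file names on disk.

```python
import os
import subprocess
import sys


def build_command(species, fasta_path, outdir="Output", cam=False):
    """Assemble the EnhancerDetector.py command line used in the test steps."""
    command = [sys.executable, "EnhancerDetector.py",
               "--species", species, "--input", fasta_path, "--outdir", outdir]
    if cam:
        command.append("--cam")
    return command


def run_all(inputs, outdir="Output", cam=False, dry_run=True):
    """Run (or, with dry_run=True, only print) one command per species."""
    for species, fasta_path in inputs.items():
        species_outdir = os.path.join(outdir, species)
        command = build_command(species, fasta_path, species_outdir, cam)
        if dry_run:
            print(" ".join(command))
        else:
            os.makedirs(species_outdir, exist_ok=True)
            subprocess.run(command, check=True)  # raises if the tool exits with an error


if __name__ == "__main__":
    # Paths as written in the commands above; adjust them to the files on disk.
    run_all({
        "human": "Test_Input/input_human.fa",
        "mouse": "Test_Input/input_mouse.fa",
        "fly": "Test_Input/input_fly.fa",
    })
```

By default the script only prints the commands; calling `run_all(..., dry_run=False)` from the main directory executes them.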

To Run the Finetuning:

When running the finetuning, it is recommended to use 20,000 enhancer sequences from the species and 40,000 control sequences from the species' genome. It is best to keep a 2:1 ratio of controls to enhancers. All sequences must be 400 bp in length. This program finetunes the human network on these new enhancers and outputs the saved finetuned network.
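Before starting a long training run, the recommended 2:1 ratio of controls to enhancers can be checked by counting the FASTA records in each file. This is a minimal sketch with hypothetical helper names; it counts header lines only and does not check sequence lengths.

```python
def count_records(path):
    """Count FASTA records by counting header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def control_ratio(enhancer_path, control_path):
    """Return the number of control sequences per enhancer sequence."""
    enhancers = count_records(enhancer_path)
    controls = count_records(control_path)
    if enhancers == 0:
        raise ValueError("enhancer file contains no FASTA records")
    return controls / enhancers
```

A result close to 2.0 matches the recommendation; for 20,000 enhancers and 40,000 controls it is exactly 2.0.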

  1. Clone EnhancerDetector and head to where it was cloned.

  2. Make sure EnhancerDetector is unzipped.

  3. Run Finetune_Network.py

    python Finetune_Network.py --enhancers enhancers.fa --controls controls.fa --output_dir Test/

  4. If you want to add shuffled enhancers to your controls, run:

    python Finetune_Network.py --enhancers enhancers.fa --controls controls.fa --output_dir Test/ --use_shuffle

  5. The finetuned network will be saved in the given output directory.

To Run our Finetuning Test:

  1. Look inside the Test_Input folder, then inside the Fine_Tune_Inputs folder, which contains two FASTA files:

    test_enhancers.fasta = Includes 20,000 mouse enhancer sequences of 400 bp length in fasta format.

    test_controls.fasta = Includes 40,000 mouse control sequences of 400 bp length in fasta format.

  2. In the main directory run Finetune_Network.py:

    python Finetune_Network.py --enhancers Test_Input/Fine_Tune_Inputs/test_enhancers.fasta --controls Test_Input/Fine_Tune_Inputs/test_controls.fasta --output_dir FineTune_Output/

    If you want to shuffle the enhancers:

    python Finetune_Network.py --enhancers Test_Input/Fine_Tune_Inputs/test_enhancers.fasta --controls Test_Input/Fine_Tune_Inputs/test_controls.fasta --output_dir FineTune_Output/ --use_shuffle

  3. Let it train; the finetuned model will be saved in the output directory under the name model_finetuned.keras.

To use the finetuned model with EnhancerDetector:

  1. Run EnhancerDetector.py with the --custom_model parameter:

    python EnhancerDetector.py --species mouse --input Test_Input/input_mouse.fa --outdir Output --custom_model FineTune_Output/model_finetuned.keras

    The program runs as normal. Note that --custom_model overrides --species, so any value can be given there. The provided test model was finetuned on the mouse mm10 dataset, so we give it input_mouse.fa, but you can replace this with sequences from the species you finetuned on.

    If you want a CAM of the given sequences then add the --cam parameter:

    python EnhancerDetector.py --species mouse --input Test_Input/input_mouse.fa --cam --outdir Output --custom_model FineTune_Output/model_finetuned.keras

  2. The output classification and CAMs will be saved in the given output directory.
