Tpot_ec_prediction

EC number prediction models created using the TPOT tool. The TPOT tool can be accessed here. TPOT uses genetic programming to arrive at optimized machine learning pipelines, which were validated and used to create the following models. The models were built hierarchically, and the chosen pipelines are adapted to each EC number digit (where there was enough data to study). They were developed for the Master's dissertation "A Study of Machine Learning for Artificial Intelligence-Based Enzyme Classification", part of the Master's in Computational Biology and Bioinformatics at NOVA University Lisbon, at NOVA ITQB.

To use the models, you need Python and Anaconda. From a terminal, follow these steps:

Clone the repository

git clone https://github.com/Ananas-bio/Tpot_ec_prediction.git

Create and activate a conda environment using the YAML file

conda env create -f environment.yml
conda activate tpot_ec

Run the models from the terminal, for example:

python ec_predict.py -i uniprot_test.fasta -l 3 -m c40

The -l and -m options are optional. -l sets how deep the prediction goes and defaults to 3 (prediction up to EC level 3). -m selects the model, either c40 or swiss, and defaults to c40.
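To make the option defaults concrete, here is a minimal sketch of a command-line interface with the same three options, written with Python's standard `argparse` module. It is an illustration only and is not taken from `ec_predict.py`: the names `build_parser`, `truncate_ec`, `input`, `level`, and `model` are hypothetical, and the real script may parse its arguments differently. The helper `truncate_ec` only illustrates what "up to level 3" means for an EC number; it does not perform any prediction.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -i, -l and -m options described above."""
    parser = argparse.ArgumentParser(
        prog="ec_predict.py",
        description="Sketch of an EC number prediction command line (assumed layout).",
    )
    # Input FASTA file with the protein sequences to classify (required).
    parser.add_argument("-i", dest="input", required=True, help="input FASTA file")
    # Prediction depth; the README states the default is 3.
    parser.add_argument("-l", dest="level", type=int, default=3,
                        help="predict up to this EC level (default: 3)")
    # Model choice; the README states c40 or swiss, with c40 as the default.
    parser.add_argument("-m", dest="model", choices=["c40", "swiss"], default="c40",
                        help="model to use (default: c40)")
    return parser


def truncate_ec(ec_number: str, level: int) -> str:
    """Keep only the first `level` digits of an EC number (illustration only)."""
    return ".".join(ec_number.split(".")[:level])


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real command line.
    args = build_parser().parse_args(["-i", "uniprot_test.fasta"])
    print(args.input, args.level, args.model)
    print(truncate_ec("1.1.1.1", args.level))
```

With only `-i` given, the parsed level is 3 and the model is c40, matching the defaults described above. Passing a model other than c40 or swiss makes `argparse` exit with a usage error.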

