generated from frapercan/python-poetry-template
Commit
Merge pull request #20 from CBBIO/fantasia
Fantasia
Showing 12 changed files with 199 additions and 121 deletions.
8 changes: 5 additions & 3 deletions
...eration/embedding/sequence_embeddings.rst → ...peration/embedding/sequence_embedding.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,8 @@ | ||
Embedding Modulefds | ||
================ | ||
Sequence Embeddings Module | ||
========================== | ||
|
||
|
||
.. automodule:: protein_metamorphisms_is.operation.embedding.sequence_embedding | ||
:members: | ||
:show-inheritance: | ||
:show-inheritance: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,84 @@ | ||
Thank you for the observation. I have updated the README with the missing citation: | ||
|
||
--- | ||
|
||
# FANTASIA: Functional Annotation System and Task Orchestration | ||
|
||
![FANTASIA Logo](img/FANTASIA_logo.png) | ||
|
||
# FANTASIA: Functional Annotation System and Task Orchestration | ||
FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) is a pipeline for annotating Gene Ontology (GO) terms for protein sequences using advanced protein language models like **ProtT5**, **ProstT5**, and **ESM2**. This system automates complex workflows, from sequence processing to functional annotation, providing a scalable and efficient solution for protein structure and functionality analysis. | ||
|
||
--- | ||
|
||
## Key Features | ||
|
||
- **Redundancy Filtering**: Removes identical sequences with **CD-HIT** and optionally excludes sequences based on length constraints. | ||
- **Embedding Generation**: Utilizes state-of-the-art models for protein sequence embeddings. | ||
- **GO Term Lookup**: Matches embeddings with a vector database to retrieve associated GO terms. | ||
- **Results**: Outputs annotations in timestamped CSV files for reproducibility. | ||
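The redundancy-filtering feature can be illustrated with a minimal sketch. It is a pure-Python stand-in for exact-duplicate removal plus optional length bounds, not the CD-HIT invocation the pipeline uses; the function name `filter_sequences`, its parameters, and the example records are all hypothetical.

```python
from typing import Dict, Optional


def filter_sequences(
    records: Dict[str, str],
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> Dict[str, str]:
    """Keep the first record of each distinct sequence within optional length bounds.

    Illustrative only: the real pipeline delegates redundancy removal to CD-HIT.
    """
    seen = set()
    kept: Dict[str, str] = {}
    for identifier, sequence in records.items():
        normalized = sequence.strip().upper()
        # Optional length constraints (assumed to be inclusive bounds).
        if min_length is not None and len(normalized) < min_length:
            continue
        if max_length is not None and len(normalized) > max_length:
            continue
        # Drop any sequence identical to one already kept.
        if normalized in seen:
            continue
        seen.add(normalized)
        kept[identifier] = normalized
    return kept


# Toy input; identifiers and sequences are made up for the example.
example = {
    "protA": "MKTAYIAKQR",
    "protB": "mktayiakqr",  # same sequence as protA, different case
    "protC": "MK",          # shorter than the minimum used below
    "protD": "MSTNPKPQRK",
}
filtered = filter_sequences(example, min_length=5)
```

Here `filtered` retains only the first copy of each distinct sequence that satisfies the length bound, which is the behavior the feature list describes for identical sequences.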
|
||
--- | ||
|
||
## Installation | ||
|
||
To install FANTASIA, ensure you have Python 3.8+ installed and run the following command: | ||
|
||
```bash | ||
pip install fantasia-pipeline | ||
``` | ||
|
||
For more details, visit the [documentation page](https://protein-metamorphisms-is.readthedocs.io/en/latest/pipelines/fantasia.html). | ||
|
||
--- | ||
|
||
## Quick Start | ||
|
||
### Prerequisites | ||
|
||
Ensure the **Information System** is properly configured before running FANTASIA. Detailed instructions are available in the [project documentation](../../README.md). | ||
|
||
### Running the Pipeline | ||
|
||
Execute the following command, specifying the path to the configuration file: | ||
|
||
```bash | ||
python main.py --config <path_to_config.yaml> | ||
``` | ||
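As a rough illustration of how an entry point could accept the `--config` argument shown above, the following sketch uses the standard-library `argparse` module. It is an assumption about the shape of such a script, not the contents of the project's `main.py`; `build_parser` and the example path `config.yaml` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single required --config option (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Run a pipeline from a YAML configuration file."
    )
    parser.add_argument(
        "--config",
        required=True,
        help="Path to the YAML configuration file.",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--config", "config.yaml"])
config_path = args.config
```

Making the option required means the script fails early with a usage message when no configuration path is given.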
|
||
### Pipeline Overview | ||
|
||
1. **Redundancy Filtering**: Removes identical sequences and optionally filters sequences based on length. | ||
2. **Embedding Generation**: Computes embeddings for sequences using supported models and stores them in HDF5 format. | ||
3. **GO Term Lookup**: Queries a vector database to find and annotate similar proteins. | ||
4. **Output**: Saves annotations in a structured CSV file. | ||
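The lookup and output steps above can be sketched in a few lines. This is a toy, in-memory version under stated assumptions: the reference set is a plain list instead of a vector database, similarity is cosine similarity (the README does not name the metric), and the function names, CSV columns, and timestamp format are hypothetical.

```python
import csv
import io
import math
from datetime import datetime
from typing import Iterable, List, Sequence, TextIO, Tuple

Reference = Tuple[Sequence[float], List[str]]  # (embedding, GO terms)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup_go_terms(query: Sequence[float], reference: List[Reference], k: int = 1) -> List[str]:
    """Collect GO terms from the k reference entries most similar to the query."""
    ranked = sorted(
        reference,
        key=lambda entry: cosine_similarity(query, entry[0]),
        reverse=True,
    )
    terms: List[str] = []
    for _, go_terms in ranked[:k]:
        for term in go_terms:
            if term not in terms:  # keep order, skip repeats
                terms.append(term)
    return terms


def write_annotations(rows: Iterable[Tuple[str, str]], handle: TextIO) -> None:
    """Write (protein_id, go_term) rows as CSV with a header (assumed columns)."""
    writer = csv.writer(handle)
    writer.writerow(["protein_id", "go_term"])
    for protein_id, term in rows:
        writer.writerow([protein_id, term])


def timestamped_name(prefix: str, now: datetime) -> str:
    """Build a file name such as results_YYYYMMDDHHMMSS.csv (assumed format)."""
    return f"{prefix}_{now.strftime('%Y%m%d%H%M%S')}.csv"


# Toy two-dimensional embeddings; real model embeddings have far more dimensions.
reference = [
    ([1.0, 0.0], ["GO:0005524"]),
    ([0.0, 1.0], ["GO:0003677"]),
]
terms = lookup_go_terms([0.9, 0.1], reference, k=1)
buffer = io.StringIO()
write_annotations([("query_1", term) for term in terms], buffer)
output_name = timestamped_name("results", datetime(2024, 1, 2, 3, 4, 5))
```

Writing to an in-memory buffer keeps the example free of side effects; a real run would open a file under the timestamped name instead.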
|
||
FANTASIA is a system designed for functional annotation and large-scale task orchestration, providing an efficient, automated solution for complex workflows in the structural and functional analysis of proteins. | ||
--- | ||
|
||
## Documentation | ||
|
||
For complete details on pipeline configuration, parameters, and deployment, visit the [FANTASIA Documentation](https://protein-metamorphisms-is.readthedocs.io/en/latest/pipelines/fantasia.html). | ||
|
||
--- | ||
|
||
## Prerequisites | ||
## Citation | ||
|
||
If you use FANTASIA in your work, please cite the following: | ||
|
||
To run the FANTASIA pipeline, the **Information System** must be correctly deployed and configured. This system provides the databases and services required for task execution, result storage, and consistent management of the pipeline's operations. | ||
1. Martínez-Redondo, G. I., Barrios, I., Vázquez-Valls, M., Rojas, A. M., & Fernández, R. (2024). Illuminating the functional landscape of the dark proteome across the Animal Tree of Life. | ||
https://doi.org/10.1101/2024.02.28.582465 | ||
|
||
2. Barrios-Núñez, I., Martínez-Redondo, G. I., Medina-Burgos, P., Cases, I., Fernández, R., & Rojas, A. M. (2024). Decoding proteome functional information in model organisms using protein language models. | ||
https://doi.org/10.1101/2024.02.14.580341 | ||
|
||
--- | ||
|
||
## Next steps | ||
## Contact Information | ||
|
||
- Francisco Miguel Pérez Canales: [email protected] | ||
- Gemma I. Martínez-Redondo: [email protected] | ||
- Ana M. Rojas: [email protected] | ||
- Rosa Fernández: [email protected] | ||
|
||
--- | ||
|
||
1. **Information System deployment**: Follow the [main documentation](../../README.md) to set up the information system and its associated services beforehand. | ||
2. **Pipeline execution**: Once the information system is deployed, you can run the pipeline by following the instructions detailed below. | ||
If you need further adjustments, let me know! |