This script generates question-and-answer (QA) datasets for antimicrobial peptides (AMPs) from either interaction data or function data downloaded from UniProt. It can handle large datasets and uses a pretrained language model to generate function-based questions.
- Protein Interaction QA: Generate questions from a dataset describing protein interactions.
- Protein Function QA: Utilize a pretrained model to generate function-based questions.
- Customizable Input/Output: Specify input files and output directories through command-line arguments.
Ensure you have the following installed:
- Python 3.7 or higher
- torch
- transformers
This script is tested on Linux and Windows operating systems.
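The prerequisites above can be checked before running anything. The following is a minimal sketch, not part of the script itself: the helper name `missing_prerequisites` is hypothetical, and it only tests whether the listed packages are importable, not which versions are installed.

```python
import importlib.util
import sys

# Requirements taken from the list above.
REQUIRED_PACKAGES = ("torch", "transformers")
MIN_PYTHON = (3, 7)


def missing_prerequisites(packages=REQUIRED_PACKAGES, min_python=MIN_PYTHON):
    """Return a list of problems found; the list is empty if everything is present."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or higher is required")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the function returns an empty list, the environment meets the stated requirements.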
- Clone the repository (if the script is hosted on a Git repository):
    git clone https://your-repository-link.git
    cd your-repository-directory
- Set up a Python virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # Use `venv\Scripts\activate` on Windows
- Install required packages:
    pip install torch transformers
To use the script, you will need to provide the type of questions, the path to the input JSON file, and the output directory where the results will be stored.
- -t, --type: Specify the type of questions (i for interactions, f for functions)
- -i, --input: Path to the input JSON file containing protein data
- -o, --output: Path to the directory where the results will be saved
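For illustration, the three options above could be declared with `argparse` roughly as follows. This is a sketch under assumptions rather than the script's actual code: the function name `build_parser`, the help strings, and the choice to make every option required are all guesses based on the description above.

```python
import argparse


def build_parser():
    """Build a parser for the three documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Generate QA datasets for AMPs.")
    parser.add_argument(
        "-t", "--type", choices=["i", "f"], required=True,
        help="type of questions: i for interactions, f for functions",
    )
    parser.add_argument(
        "-i", "--input", required=True,
        help="path to the input JSON file containing protein data",
    )
    parser.add_argument(
        "-o", "--output", required=True,
        help="path to the directory where the results will be saved",
    )
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(
    ["-t", "i", "-i", "protein_interactions.json", "-o", "out"]
)
print(args.type, args.input, args.output)
```

Restricting `--type` with `choices` makes the parser reject any value other than `i` or `f` before the script does any work.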
- Generating Interaction Questions:
    python script_name.py -t i -i your/path/to/protein_interactions.json -o your/path/to/output
- Generating Function Questions:
    python script_name.py -t f -i your/path/to/protein_functions.json -o your/path/to/output
The output will be JSON files containing the QA datasets:
- interaction_questions.json for interaction-based questions.
- function_questions.json for function-based questions.
These files will be saved in the specified output directory.
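The mapping from question type to output file can be sketched as below. Only the two file names come from this document. The helper names `output_path` and `save_questions` are hypothetical, and the record layout (a list of objects with `question` and `answer` keys) is an assumption, since the actual JSON schema is not described here.

```python
import json
from pathlib import Path

# File names as documented above, keyed by the value passed to -t/--type.
OUTPUT_FILENAMES = {
    "i": "interaction_questions.json",
    "f": "function_questions.json",
}


def output_path(question_type, output_dir):
    """Return the path of the result file for the given question type."""
    return Path(output_dir) / OUTPUT_FILENAMES[question_type]


def save_questions(records, question_type, output_dir):
    """Write records as JSON into output_dir, creating the directory if needed."""
    path = output_path(question_type, output_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2, ensure_ascii=False)
    return path
```

Reading a result back is then a matter of calling `json.load` on the returned path.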