SigPhi-Med is a lightweight vision-language model designed for biomedical applications. It leverages compact architectures while maintaining strong performance in visual question answering (VQA) and related multimodal tasks. This repository provides code for training, evaluation, and model deployment.
To set up the environment, follow the setup instructions from TinyLLaVA Factory and install the dependencies listed in the `requirements.txt` file.
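As a minimal sketch of such a setup, assuming a conda-based workflow (the environment name and Python version below are illustrative, not prescribed by this repository):

```bash
# Minimal environment sketch -- the environment name and Python version
# are illustrative; follow TinyLLaVA Factory's instructions for specifics.
conda create -n sigphi-med python=3.10 -y
conda activate sigphi-med

# Install the dependencies pinned by this repository.
pip install -r requirements.txt
```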
SigPhi-Med is trained and evaluated on the following biomedical multimodal datasets:
To train SigPhi-Med, modify the training script as needed:
- Edit the configuration in `scripts/train/train_phi.sh` (see the sketch after these steps).
- Run the training script:

```bash
sh scripts/train/train_phi.sh
```
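Which knobs `scripts/train/train_phi.sh` exposes depends on the script itself; as a sketch only, TinyLLaVA-style training scripts usually collect paths and hyperparameters in shell variables near the top. Every name and value below is hypothetical, so open the actual script to confirm before editing:

```bash
# Hypothetical configuration block inside scripts/train/train_phi.sh.
# Variable names and defaults are illustrative only.
DATA_PATH=/path/to/train_annotations.json   # multimodal training annotations
IMAGE_FOLDER=/path/to/images                # root directory of training images
OUTPUT_DIR=./checkpoints/sigphi-med         # where checkpoints are written
LEARNING_RATE=2e-5                          # tune for your dataset size
BATCH_SIZE=8                                # per-GPU batch size; lower if memory is tight
```

Adjusting values like these before launching is typically all that step one requires; the script then forwards them to the underlying training entry point.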
To evaluate the model on biomedical VQA tasks, use:
```bash
sh scripts/eval/VQA.sh
```
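The evaluation script typically needs to know which checkpoint to score and where the benchmark files live; the variables below are hypothetical placeholders, not the script's actual names:

```bash
# Hypothetical values to set inside scripts/eval/VQA.sh before running it.
MODEL_PATH=./checkpoints/sigphi-med     # trained checkpoint to evaluate
QUESTION_FILE=/path/to/vqa_test.json    # benchmark questions
ANSWERS_DIR=./eval_results              # where predictions are written
```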
We appreciate the contributions of the following projects:

- TinyLLaVA Factory