This project implements a machine learning pipeline for generating police sketches from textual descriptions using fine-tuned Stable Diffusion and CLIP models.
```bash
# Create virtual environment
python -m venv .venv

# Activate virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
- Download the CUHK Face Sketch FERET Dataset:

  ```bash
  python download_data.py
  ```

  This will download the dataset to the `data/` directory.
- Generate text descriptions using OpenAI's API:

  ```bash
  cd chatgpt_descriptions
  python gen_descriptions.py
  ```

  This script uses OpenAI's GPT model to generate detailed descriptions for each sketch. The descriptions are saved in `data/descriptions/`.
Note: These steps have already been completed, and the data is included in the repository.
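For reference, a minimal sketch of how per-sketch descriptions could be requested from the OpenAI API is shown below. The model name, prompt wording, and file paths are illustrative assumptions, and whether the actual script sends the sketch image itself is defined in `chatgpt_descriptions/gen_descriptions.py`, not here.

```python
# Hypothetical sketch only; the real prompts and logic live in gen_descriptions.py.
import base64
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_sketch(image_path: Path) -> str:
    """Ask a vision-capable GPT model for a detailed facial description of one sketch."""
    image_b64 = base64.b64encode(image_path.read_bytes()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this police sketch in detail: face shape, hair, "
                         "eyes, nose, mouth, and distinguishing features."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

for sketch in sorted(Path("data/sketches").glob("*.png")):
    out = Path("data/descriptions") / f"{sketch.stem}.txt"
    out.write_text(describe_sketch(sketch))
```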
```bash
cd ..
python sd_fine_tune.py
```
This script performs an ablation study on different LoRA configurations:

- Self-attention only (`ablation_study_self_only/`)
- Cross-attention only (`ablation_study_cross_only/`)
- Both attention types (`ablation_study_both/`)
Key Finding: The study concluded that applying LoRA to both self- and cross-attention layers produces the most sketch-like and consistent results. This configuration was used for the final model.
Note: This study has been completed, and results are available in the respective directories.
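As a rough illustration of how the three variants can be expressed, the sketch below uses `peft` to attach LoRA adapters to the UNet's self-attention (`attn1`) or cross-attention (`attn2`) projections. The base checkpoint ID, rank, and module lists are assumptions; the actual configuration and training loop live in `sd_fine_tune.py`.

```python
# Illustrative LoRA targeting for the three ablation variants (not the exact script).
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

# Assumed base checkpoint; the repository may use a different one.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# In the diffusers UNet, attn1.* are self-attention projections and
# attn2.* are cross-attention projections.
TARGETS = {
    "self_only":  ["attn1.to_q", "attn1.to_k", "attn1.to_v", "attn1.to_out.0"],
    "cross_only": ["attn2.to_q", "attn2.to_k", "attn2.to_v", "attn2.to_out.0"],
    "both":       ["to_q", "to_k", "to_v", "to_out.0"],  # matches attn1 and attn2
}

def add_lora(unet, variant: str, rank: int = 8):
    """Wrap the UNet with LoRA adapters on the chosen attention projections."""
    config = LoraConfig(
        r=rank,
        lora_alpha=rank,
        target_modules=TARGETS[variant],
        init_lora_weights="gaussian",
    )
    return get_peft_model(unet, config)

unet = add_lora(pipe.unet, "both")  # the configuration chosen for the final model
```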
```bash
python clip_fine_tune.py
```
Fine-tunes the CLIP model for better text-image alignment specific to police sketches. The final checkpoint is available at `clip_checkpoint_epoch_20.pt`.
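One common way to fine-tune CLIP on paired data is a symmetric contrastive (InfoNCE) objective over sketch/description pairs. The sketch below shows what a single training step could look like, assuming the `openai/clip-vit-base-patch32` checkpoint and the Hugging Face `transformers` API; the actual objective, schedule, and data loading are defined in `clip_fine_tune.py`.

```python
# Minimal sketch of one contrastive training step on (sketch, description) pairs.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)  # assumed base
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(images, texts):
    """Pull matching sketch/description pairs together, push mismatches apart."""
    inputs = processor(text=texts, images=images, return_tensors="pt",
                       padding=True, truncation=True).to(device)
    outputs = model(**inputs)
    logits = outputs.logits_per_image                  # (batch, batch) similarities
    labels = torch.arange(len(images), device=device)  # diagonal entries are matches
    loss = (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2   # symmetric InfoNCE loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```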
- Test base Stable Diffusion:

  ```bash
  python basic_sd_test.py
  ```

  Generates sample images using the base model for comparison.

- Test integrated pipeline:

  ```bash
  python clip_sd_pipeline.py
  ```

  Tests the combination of fine-tuned CLIP and Stable Diffusion models.
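The exact integration is defined in `clip_sd_pipeline.py`. Purely as an illustration of one way a fine-tuned CLIP model can steer img2img generation, the sketch below re-ranks several img2img candidates by CLIP score; the base model ID, file paths, checkpoint format, and the re-ranking strategy itself are assumptions, not necessarily what this repository does.

```python
# Illustrative combination of fine-tuned CLIP + img2img Stable Diffusion (re-ranking).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
# Assumes the checkpoint stores a plain state_dict for this CLIP architecture.
clip.load_state_dict(torch.load("clip_checkpoint_epoch_20.pt", map_location=device))
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "pencil sketch of a middle-aged man with a square jaw and short hair"
init_image = Image.open("data/sketches/example.png").convert("RGB")  # hypothetical file

# Generate several candidates, keep the one the fine-tuned CLIP scores highest.
candidates = pipe(prompt=prompt, image=init_image, strength=0.6,
                  num_images_per_prompt=4).images
inputs = processor(text=[prompt], images=candidates,
                   return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    scores = clip(**inputs).logits_per_image.squeeze(1)  # one score per candidate
best = candidates[int(scores.argmax())]
best.save("generated_sketches/best_candidate.png")  # hypothetical output name
```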
Run the test suite in the `tests/` directory:
Each test is provided as a standalone Jupyter Notebook. To run a test, simply execute the cells in the corresponding notebook:
- `img2clip_finetuned.ipynb`: Executes the fine-tuned CLIP-to-img2img Stable Diffusion pipeline on a single image.
- `img2imgclip.ipynb`: Executes the CLIP-to-img2img Stable Diffusion pipeline on a single image.
- `img2imgtest.ipynb`: Executes the baseline img2img Stable Diffusion pipeline on a single image.
- `iteration_img2clip.ipynb`: Runs the CLIP-to-img2img Stable Diffusion pipeline with iterative steps and plots metrics (SSIM, PSNR, CLIP Score, LPIPS) across iterations.
- `iteration_stable.ipynb`: Runs the baseline img2img Stable Diffusion pipeline with iterative steps and plots metrics across iterations.
- `iteration_finetuned_img2clip.ipynb`: Runs the fine-tuned CLIP-to-img2img Stable Diffusion pipeline with iterative steps and plots metrics across iterations.
- `final_metrics`: Runs all iterative tests and computes performance comparisons across models.
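For reference, the per-iteration metrics listed above (SSIM, PSNR, CLIP Score, LPIPS) can be computed with `torchmetrics`; the sketch below assumes that package is installed and that images are `(1, 3, H, W)` tensors in `[0, 1]`. The notebooks contain the actual evaluation and plotting code.

```python
# Sketch of how the four image metrics could be computed with torchmetrics.
import torch
from torchmetrics.image import StructuralSimilarityIndexMeasure, PeakSignalNoiseRatio
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity
from torchmetrics.multimodal.clip_score import CLIPScore

ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
psnr = PeakSignalNoiseRatio(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="vgg")
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

def evaluate(generated: torch.Tensor, reference: torch.Tensor, prompt: str) -> dict:
    """Both image tensors are (1, 3, H, W) with values in [0, 1]."""
    return {
        "ssim": ssim(generated, reference).item(),
        "psnr": psnr(generated, reference).item(),
        # LPIPS expects inputs scaled to [-1, 1]
        "lpips": lpips(generated * 2 - 1, reference * 2 - 1).item(),
        # CLIPScore expects uint8 images in [0, 255]
        "clip_score": clip_score((generated * 255).to(torch.uint8), prompt).item(),
    }
```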
The following files were created during initial development but were not used in the final implementation:
- `app/` directory: Contains a Flask web application that was initially planned for deployment
- `run.py`: Web application entry point
- `tables.py`: Database schema definitions
- `test_self_attention.py`: Standalone test for self-attention mechanism
```
.
├── app/                  # Unused web application
├── data/                 # Dataset and descriptions
│   ├── sketches/         # CUHK Face Sketch FERET Dataset
│   └── descriptions/     # Generated text descriptions
├── tests/                # Test files
├── ablation_study_*/     # Ablation study results
├── generated_sketches/   # Model outputs
└── requirements.txt      # Project dependencies
```
- Ablation study results: `ablation_study_*/`
- Generated samples: `generated_sketches/`
- Training metrics: `training_plots/`
- CLIP checkpoint: `clip_checkpoint_epoch_20.pt`
See `requirements.txt` for the complete list of dependencies. Key requirements:
- PyTorch
- Diffusers
- Transformers
- OpenAI API (for description generation)
- NumPy
- Matplotlib