
---
runme:
  id: 01HSEEWGP8NWV3ZH52D49ZNGHM
  version: v3
---

# CPSC542 Assignment 2

## Student Information

- Student Name: Devyn Miller
- Student ID: 2409539

## Collaboration

- In-class collaborator: Hayden Fargo (we did not work on the same code, only bounced ideas off one another)

## Resources

### Data Source

- Data source: the `oxford_iiit_pet` dataset, loaded via `tfds.load('oxford_iiit_pet:3.*.*', with_info=True)` (see the sketch below)
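
A minimal loading sketch; the split names and feature keys are those published by TFDS for this dataset:

```python
import tensorflow_datasets as tfds

# Load the dataset together with its metadata (split sizes, feature specs).
dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)
train_ds, test_ds = dataset['train'], dataset['test']

# Each example is a dict whose features include 'image' and 'segmentation_mask'.
print(info.splits)
```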

### Code Repository

## Project Organization and Pipeline Overview

This project builds an image segmentation pipeline that applies deep learning and computer vision techniques to images from the Oxford-IIIT Pet Dataset, with the goal of accurately distinguishing pets from their backgrounds across varied settings and poses. The pipeline is built with the TensorFlow and Keras libraries. Through careful organization and documentation, I aimed to create a transparent and reproducible workflow that addresses the challenges of pet image segmentation.

## Project Structure

- `README.md`: Project overview, setup instructions, and additional notes.
- `src/`: Source code for the project.
  - `preprocessing.py`: Functions for data loading and preprocessing.
  - `augmentation.py`: Data augmentation techniques.
  - `model.py`: The U-Net model architecture.
  - `training.py`: The model training process.
  - `metrics.py`: Evaluation metrics and performance analysis.
  - `grad_cam.py`: Grad-CAM visualizations for model interpretation.
- `figures/`: Static figures and plots.
  - Various PNG images from the initial exploratory data analysis.
  - `model_structures/`: PNG images of the model architectures I experimented with.
- `main.ipynb`: Main notebook with the project walkthrough, including EDA and results.
- `eda.ipynb`: Additional EDA not shown in `main.ipynb`.
- `model_history.json`: The model's training history, saved for analysis.
- `models/`: The trained model is hosted externally due to size constraints.

## Data Preprocessing

I started by preprocessing the data to make it suitable for training a deep learning model. This involved loading the dataset using TensorFlow Datasets (TFDS), normalizing the pixel values of the images to the range [0, 1], and resizing both the images and their corresponding segmentation masks to a uniform dimension of 128x128 pixels. The preprocessing steps are encapsulated in the `preprocess` function within `src/preprocessing.py`.
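
A minimal sketch of those steps, assuming the common TFDS segmentation recipe; the actual `preprocess` in `src/preprocessing.py` may differ in details such as the mask label shift:

```python
import tensorflow as tf

IMG_SIZE = 128

def preprocess(datapoint):
    # Resize image and mask to a uniform 128x128.
    image = tf.image.resize(datapoint['image'], (IMG_SIZE, IMG_SIZE))
    mask = tf.image.resize(datapoint['segmentation_mask'],
                           (IMG_SIZE, IMG_SIZE),
                           method='nearest')  # nearest keeps labels discrete
    # Normalize pixel values to [0, 1].
    image = tf.cast(image, tf.float32) / 255.0
    # TFDS masks are labeled {1, 2, 3}; shifting to {0, 1, 2} is the usual
    # convention for sparse losses (an assumption about the project's setup).
    mask -= 1
    return image, mask

# train_ds / test_ds come from the loading sketch above.
train = train_ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
test = test_ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
```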

## Data Augmentation

To improve the model's generalization capability, I implemented data augmentation techniques, including random horizontal flipping of the images and masks. This augmentation is performed on the fly during training to introduce variability in the training data without increasing its size. The augmentation logic is defined in the `augment` function in `src/augmentation.py`.
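
A minimal sketch of that augmentation, assuming the image and mask are flipped together so they stay aligned; the batch size here is illustrative:

```python
import tensorflow as tf

def augment(image, mask):
    # Flip image and mask together half the time so they stay aligned.
    if tf.random.uniform(()) > 0.5:
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    return image, mask

# Applied per element during training, so the dataset size is unchanged.
train_batches = (train.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
                      .batch(64)
                      .prefetch(tf.data.AUTOTUNE))
```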

## Model Architecture

For the segmentation task, I utilized a U-Net architecture, known for its effectiveness in image segmentation tasks. The U-Net model comprises a pretrained MobileNetV2 as the encoder and a series of upsample blocks as the decoder, facilitating precise localization. The model architecture is defined in `src/model.py`.
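
A sketch of such a model, following the well-known TensorFlow segmentation tutorial's pairing of a frozen MobileNetV2 encoder with transposed-convolution upsample blocks; the exact skip layers and filter counts in `src/model.py` may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

def upsample_block(filters):
    # Transposed convolution doubles the spatial resolution.
    return tf.keras.Sequential([
        layers.Conv2DTranspose(filters, 3, strides=2, padding='same'),
        layers.BatchNormalization(),
        layers.ReLU(),
    ])

def build_unet(output_channels=3, input_shape=(128, 128, 3)):
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False)
    # Feature maps at decreasing resolutions used as skip connections.
    skip_names = ['block_1_expand_relu',   # 64x64
                  'block_3_expand_relu',   # 32x32
                  'block_6_expand_relu',   # 16x16
                  'block_13_expand_relu',  # 8x8
                  'block_16_project']      # 4x4 (bottleneck)
    skips = [base.get_layer(name).output for name in skip_names]
    encoder = tf.keras.Model(inputs=base.input, outputs=skips)
    encoder.trainable = False  # keep the pretrained encoder frozen

    inputs = layers.Input(shape=input_shape)
    *skips, x = encoder(inputs)
    # Decoder: upsample and concatenate with the matching skip connection.
    for filters, skip in zip([512, 256, 128, 64], reversed(skips)):
        x = upsample_block(filters)(x)
        x = layers.Concatenate()([x, skip])
    # Final upsample back to 128x128 with one logit per class.
    outputs = layers.Conv2DTranspose(
        output_channels, 3, strides=2, padding='same')(x)
    return tf.keras.Model(inputs, outputs)
```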

## Training

The model is compiled and trained using the Adam optimizer and Sparse Categorical Crossentropy loss function, suitable for multi-class segmentation tasks. Training involves feeding the preprocessed and augmented images to the model, with early stopping implemented to prevent overfitting. The training process is managed by the `train_model` function in `src/training.py`.
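
A minimal sketch of that setup, reusing names from the sketches above; the epoch count and patience are illustrative, not the project's actual values:

```python
import tensorflow as tf

model = build_unet(output_channels=3)
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

# Stop when validation loss plateaus, to prevent overfitting.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True)

test_batches = test.batch(64)
history = model.fit(train_batches,
                    validation_data=test_batches,
                    epochs=20,
                    callbacks=[early_stop])
```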

## Evaluation and Visualization

Post-training, the model's performance is evaluated using metrics such as accuracy, precision, recall, F1 score, and Intersection over Union (IoU). Additionally, I implemented Grad-CAM visualizations to interpret the model's predictions, highlighting the regions of the image that contributed most to the segmentation decision. The evaluation and visualization steps are detailed in `src/metrics.py` and `src/grad_cam.py`.
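
For illustration, a generic Grad-CAM sketch adapted to segmentation by averaging a class's logits over all pixels; the conv layer name is left as a parameter because the right choice depends on the architecture, and `src/grad_cam.py` may implement this differently:

```python
import tensorflow as tf

def grad_cam(model, image, class_index, conv_layer_name):
    # Model mapping the input to (target conv activations, predictions).
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[tf.newaxis, ...])
        # For segmentation, average the class logit over all pixels.
        score = tf.reduce_mean(preds[..., class_index])
    grads = tape.gradient(score, conv_out)
    # Channel weights: global-average-pooled gradients.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    cam = tf.nn.relu(cam)  # keep only positive influence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```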

## Conclusion

This project demonstrates a structured approach to solving an image segmentation problem using deep learning. By carefully preprocessing the data, augmenting it to enhance model robustness, and employing a powerful U-Net architecture, I was able to achieve precise segmentation of pets from their backgrounds. The project's modular design ensures each component is easily understandable and modifiable for future enhancements or adaptations to similar tasks.

## Additional Notes

Please read the notes below.

1. The U-Net model was too large to push to GitHub and can be found here.
2. All of the plots and figures (EDA, Grad-CAM, etc.) for this project can be found in the Jupyter notebook `main.ipynb`. The report contains only the metrics table and graphs showing model performance.
3. As described in my writeup, I modularized the project from an initial Jupyter notebook that contained everything. The PNG images saved in the `figures/` folder are from that initial notebook, while those in `main.ipynb` were generated by running the modularized project.
4. I forgot to include part of the EDA in my pipeline, but it is fairly simple and can be found in the Jupyter notebook `eda.ipynb`.
