Object Detection and Scene Description in a Supermarket

This is a course project for the postgraduate-level course Computer Vision and Cognitive Systems, taught at DIEF, UniMoRe.

Datasets

  • For the Object Detection task, we use the SKU110K dataset.
  • For product classification, and for the embeddings used in product retrieval, we use the GroceryStoreDataset.

Training and Experimentation

To train the Faster R-CNN model for object detection:

sbatch frcnn.slurm

To train the DenseNet-121 model for product classification and for the embeddings used in product retrieval:

sbatch clf.slurm

Implementation and Inference

Object Detection and Scene Description

  • The complete pipeline performs the following steps:
    • Classical scene image preprocessing (histogram equalization)
    • Inference of both models: Faster R-CNN and DenseNet-121 (commented out)
    • Shelf numbering: K-Means with silhouette analysis
    • Dominant colour recognition (commented out)
    • Zero-shot product detection using the CLIP (Contrastive Language-Image Pre-training) model
    • Spatial description through geometrical templating
    • Concise scene description using GPT-3.5 Turbo through the OpenAI API

Set your OpenAI API key, then run inference:

export OPENAI_API_KEY=entergeneratedAPIKey

sbatch inference.slurm
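
The histogram equalization step listed above can be illustrated with a small, dependency-free sketch. This is not the project's code: the function name `equalize_histogram` and the list-of-rows image format are assumptions made for illustration, and a real pipeline would more likely call a library routine such as OpenCV's `cv2.equalizeHist` on a grayscale array.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as rows of ints in [0, levels).

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Count how often each intensity occurs.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    if total == 0:
        # Empty image: nothing to do.
        return [list(row) for row in image]

    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return [list(row) for row in image]

    # Remap intensities so the output CDF is approximately linear.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [round((c - cdf_min) * scale) if c >= cdf_min else 0 for c in cdf]
    return [[lookup[value] for value in row] for row in image]


# A low-contrast 2x2 patch is stretched over the full intensity range.
print(equalize_histogram([[100, 101], [102, 103]]))  # [[0, 85], [170, 255]]
```

The remapping spreads intensities that are bunched together, which can make product edges on dim or unevenly lit shelves easier for a detector to pick up.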

Figure: pipeline
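
As a rough illustration of the shelf-numbering step (K-Means with silhouette analysis), the sketch below clusters the vertical centres of detected boxes and keeps the cluster count with the highest mean silhouette score. It assumes that shelves are separated along the y axis and that only the box centres are used; the repository's script may use different features. All function names here are hypothetical, and in practice scikit-learn's `KMeans` and `silhouette_score` would usually replace the hand-written helpers.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster 1-D values into k groups; returns one label per value."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the centres over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous centre.
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def silhouette_score(values, labels):
    """Mean silhouette coefficient with absolute difference as the distance."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(abs(v - o) for o in own) / (len(own) - 1)
        b = min(sum(abs(v - o) for o in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def number_shelves(y_centres, max_shelves=8):
    """Return a 1-based shelf number (top to bottom) for each box centre."""
    best_labels, best_score = [0] * len(y_centres), -1.0
    # The silhouette score is only defined for 2 <= k <= n - 1.
    for k in range(2, min(max_shelves, len(y_centres) - 1) + 1):
        labels = kmeans_1d(y_centres, k)
        score = silhouette_score(y_centres, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    # Renumber clusters so that shelf 1 has the smallest mean y (image top).
    groups = {}
    for y, lab in zip(y_centres, best_labels):
        groups.setdefault(lab, []).append(y)
    order = sorted(groups, key=lambda lab: sum(groups[lab]) / len(groups[lab]))
    rank = {lab: i + 1 for i, lab in enumerate(order)}
    return [rank[lab] for lab in best_labels]


# Made-up box centres forming three horizontal rows.
ys = [100, 105, 98, 300, 310, 295, 500, 505, 498]
print(number_shelves(ys))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Choosing the number of clusters by silhouette score avoids hard-coding the shelf count, which varies from image to image.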

Retrieval Mechanism

Retrieval was initially prototyped in Google Colab: https://colab.research.google.com/drive/1HXn3XRod3_6CHOes7aB0bJltz-IJagRP?usp=sharing

sbatch retrival.slurm

Figure: retrieval
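
A minimal sketch of embedding-based retrieval follows, assuming each product image has already been mapped to an embedding vector (for example by the trained DenseNet-121) and that cosine similarity is the ranking metric. The metric, the function names, and the toy gallery are assumptions for illustration only; the repository's script may rank candidates differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, top_k=3):
    """Rank gallery items by similarity to the query embedding.

    gallery maps a (hypothetical) product name to its embedding.
    Returns up to top_k (name, score) pairs, best match first.
    """
    scored = [(name, cosine_similarity(query, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 2-D embeddings; real embeddings would have many more dimensions.
gallery = {"a": [2.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for name, score in retrieve([1.0, 0.0], gallery, top_k=2):
    print(name, round(score, 3))
```

Because cosine similarity ignores vector length, two embeddings that point in the same direction match perfectly even if their magnitudes differ.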

(Additional modifications can be made by editing the Python scripts referenced in the corresponding Slurm files.)