We present an implementation of the MambaVision model within the Single Shot MultiBox Detector (SSD) framework for improved object detection. MambaVision combines feature extraction with attention mechanisms, which improves its representation of complex visual patterns. We integrate MambaVision into the SSD architecture and optimize the detection pipeline for real-time applications while maintaining high accuracy across several datasets. Our experimental results show a significant increase in mean average precision (mAP), demonstrating the benefit of combining MambaVision's feature extraction with SSD's fast detection framework.
The following example is based on PyTorch 2.4.1 with CUDA 12.4. For other versions, please refer to the official PyTorch website.
# create environment
conda env create -f environment.yml
# activate environment
conda activate ssd
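After activating the environment, it can help to confirm that the installed PyTorch build matches the versions mentioned above. The following is a minimal sketch, not part of this repository: the helper name `describe_environment` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib


def describe_environment():
    """Report the installed PyTorch version and CUDA status (hypothetical helper)."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"torch": None, "cuda": None, "cuda_available": False}
    return {
        "torch": torch.__version__,  # version string of the installed build
        "cuda": torch.version.cuda,  # CUDA version the build targets, None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, the installed build or the GPU driver probably does not match, and the PyTorch installation instructions for your CUDA version are the place to look.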
COCO, VOC2007, VOC2012, CityScapes and FoggyCityScapes datasets are available.
Each download script accepts an optional target directory; if none is given, the dataset is downloaded into ~/data/.
sh data/scripts/COCO2014.sh # <directory>
sh data/scripts/VOC2007.sh # <directory>
sh data/scripts/VOC2012.sh # <directory>
- Download the Cityscapes and Foggy Cityscapes datasets from the link. In particular, we use leftImg8bit_trainvaltest.zip for Cityscapes and leftImg8bit_trainvaltest_foggy.zip for Foggy Cityscapes.
- Unzip them so that the directory looks like this:
data/cityscapes
├── gtFine
├── leftImg8bit
├── leftImg8bit_foggy
└── ...
Then run
python utils/prepare_cityscapes_to_voc.py
This automatically generates the datasets in VOC format:
data/cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
data/foggy_cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
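Before training, the generated folders can be checked against the layout above. This is a small sketch under the assumption that the three subdirectories listed are the ones that matter; `missing_voc_dirs` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path

# Subdirectories shown in the layout above.
EXPECTED_SUBDIRS = ("Annotations", "ImageSets", "JPEGImages")


def missing_voc_dirs(root):
    """Return the expected VOC-style subdirectories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    for root in ("data/cityscapes_in_voc", "data/foggy_cityscapes_in_voc"):
        missing = missing_voc_dirs(root)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{root}: {status}")
```

An empty list means all three folders exist; it does not verify that the annotations themselves are complete.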
CUDA_VISIBLE_DEVICES=$GPU_ID \
python train.py \
--dataset VOC \
--dataset_root <dataset_root> \
--end_epoch 100 \
--lr 5e-3
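The same command can also be launched from Python, for example when scripting several runs. The sketch below only assembles the flags shown above; `build_train_command` and `launch` are hypothetical helpers, and their defaults simply mirror the example values.

```python
import os
import subprocess
import sys


def build_train_command(dataset_root, dataset="VOC", end_epoch=100, lr=5e-3):
    """Assemble the train.py invocation using only the flags shown above."""
    return [
        sys.executable, "train.py",
        "--dataset", dataset,
        "--dataset_root", str(dataset_root),
        "--end_epoch", str(end_epoch),
        "--lr", str(lr),
    ]


def launch(dataset_root, gpu_id="0", **kwargs):
    """Run training on the GPU selected through CUDA_VISIBLE_DEVICES."""
    # Copy the current environment and restrict the visible GPUs for the child process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    command = build_train_command(dataset_root, **kwargs)
    return subprocess.run(command, env=env, check=True)
```

Calling `launch("<dataset_root>", gpu_id="0")` from the repository root would then correspond to the shell command above, with `check=True` raising an error if training exits with a non-zero status.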
My implementation borrows many parts from ssd.pytorch, MambaVision, and I3Net.