
RetSeg3D: Retention-based 3D Semantic Segmentation for Autonomous Driving

This is the official PyTorch implementation of the paper RetSeg3D: Retention-based 3D Semantic Segmentation for Autonomous Driving, by Gopi Krishna Erabati and Helder Araujo.

G. K. Erabati and H. Araujo, "RetSeg3D: Retention-based 3D Semantic Segmentation for Autonomous Driving," in Computer Vision and Image Understanding (CVIU), 2024. https://doi.org/10.1016/j.cviu.2024.104231

Contents

  1. Overview
  2. Results
  3. Requirements, Installation and Usage
    1. Prerequisites
    2. Installation
    3. Training
    4. Testing
  4. Acknowledgements
  5. Reference

Overview

LiDAR semantic segmentation is one of the crucial tasks for scene understanding in autonomous driving. Recent trends suggest that voxel- or fusion-based methods obtain improved performance. However, fusion-based methods are computationally expensive, while voxel-based methods uniformly employ local operators (e.g., 3D SparseConv) without considering the varying-density property of LiDAR point clouds, which results in inferior performance, specifically on faraway sparse points, due to the limited receptive field. To tackle this issue, we propose a novel retention block to capture long-range dependencies and maintain the receptive field of faraway sparse points, and design RetSeg3D, a retention-based 3D semantic segmentation model for autonomous driving. Instead of the vanilla attention mechanism for modeling long-range dependencies, inspired by RetNet, we design a cubic window multi-scale retentive self-attention (CW-MSRetSA) module with a bidirectional and 3D explicit decay mechanism that introduces 3D spatial-distance-related prior information into the model to improve not only the receptive field but also the model capacity. Our novel retention block maintains the receptive field, which significantly improves the performance on faraway sparse points. We conduct extensive experiments and analysis on three large-scale datasets: SemanticKITTI, nuScenes and Waymo. Our method outperforms existing methods not only on faraway sparse points but also on close- and medium-distance points, and runs efficiently in real time at 52.1 FPS.

[Figure: RetSeg3D architecture overview]
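
For intuition, the sketch below illustrates the core idea behind retention with a 3D explicit decay: pairwise scores between voxels are modulated by a per-head decay factor that shrinks with 3D spatial distance, so nearby voxels contribute more while distant ones remain reachable. This is a minimal, self-contained illustration under assumed shapes and hyper-parameters, not the paper's CW-MSRetSA implementation.

```python
# Illustrative retention with a bidirectional 3D explicit decay.
# Shapes, decay rates and the L1 distance metric are assumptions for this sketch,
# not the exact CW-MSRetSA design from the paper.
import torch

def decay_matrix(coords: torch.Tensor, gamma: float) -> torch.Tensor:
    """Bidirectional decay D[n, m] = gamma ** ||p_n - p_m||_1 from voxel coordinates."""
    dist = torch.cdist(coords.float(), coords.float(), p=1)  # (N, N) pairwise distances
    return gamma ** dist  # closer voxels are decayed less

def retention(q, k, v, coords, gammas):
    """Retention scores per head: (Q K^T) modulated by a spatial decay, no softmax."""
    outputs = []
    for h, gamma in enumerate(gammas):  # multi-scale: one decay rate per head
        scores = q[h] @ k[h].transpose(-1, -2) / q.shape[-1] ** 0.5  # (N, N)
        scores = scores * decay_matrix(coords, gamma)  # inject the 3D spatial prior
        outputs.append(scores @ v[h])
    return torch.stack(outputs)  # (H, N, C)

# Toy usage: 8 voxels inside one cubic window, 2 heads with different decay rates
coords = torch.randint(0, 4, (8, 3))
q = k = v = torch.randn(2, 8, 16)
print(retention(q, k, v, coords, gammas=[0.9, 0.75]).shape)  # torch.Size([2, 8, 16])
```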

Results

Predictions on Waymo dataset


Predictions on SemanticKITTI, Waymo and nuScenes datasets


Quantitative Results (mIoU)

RetSeg3D      | SemanticKITTI              | nuScenes        | Waymo
mIoU          | 70.3                       | 76.9            | 70.1
Config        | retseg3d_semantickitti.py  | retseg3d_nus.py | retseg3d_waymo.py
Model weights | weights                    | weights         | —

We cannot distribute the model weights trained on the Waymo dataset due to the Waymo license terms.
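
For reference, mIoU is the mean over classes of per-class intersection-over-union, IoU_c = TP_c / (TP_c + FP_c + FN_c). A minimal sketch of the computation from a confusion matrix (the 3-class example values are illustrative):

```python
import torch

def miou(confusion: torch.Tensor) -> torch.Tensor:
    """Mean IoU from a (C, C) confusion matrix with rows = ground truth, cols = prediction."""
    tp = confusion.diag()
    fp = confusion.sum(dim=0) - tp
    fn = confusion.sum(dim=1) - tp
    iou = tp / (tp + fp + fn).clamp(min=1)  # clamp guards against empty classes
    return iou.mean()

# Toy 3-class example
conf = torch.tensor([[50., 2., 3.], [4., 40., 1.], [2., 5., 30.]])
print(f"mIoU: {miou(conf):.3f}")
```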

Requirements, Installation and Usage

Prerequisites

The code is tested with PyTorch 2.0.1, torchvision 0.15.2, torchaudio 2.0.2, mmcv 2.1.0 and mmengine 0.10.1, i.e., the exact versions pinned in the Installation commands below.

Installation

mkvirtualenv retseg3d

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -U openmim
mim install mmcv==2.1.0
mim install mmengine==0.10.1

pip install -r requirements.txt
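
After installation, a quick sanity check can confirm the pinned versions are in place (a minimal sketch; the expected values are the ones installed above):

```python
# Sanity check for the dependencies pinned above
import torch, torchvision, mmcv, mmengine

print("torch:", torch.__version__)              # expect 2.0.1
print("torchvision:", torchvision.__version__)  # expect 0.15.2
print("mmcv:", mmcv.__version__)                # expect 2.1.0
print("mmengine:", mmengine.__version__)        # expect 0.10.1
print("CUDA available:", torch.cuda.is_available())
```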

Data

Follow MMDetection3D-1.3.0 to prepare the SemanticKITTI and nuScenes datasets. Follow Pointcept for Waymo data preprocessing, then run python tools/create_waymo_semantic_info.py /path/to/waymo/preprocess/dir to generate the .pkl files required for the config.

Warning: Please strictly follow the MMDetection3D-1.3.0 code to prepare the data, because other versions of MMDetection3D use a different coordinate convention after refactoring.
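
To sanity-check the generated info files, you can load a .pkl with pickle; a minimal sketch (the file name waymo_infos_train.pkl and the list-of-dicts layout are assumptions, adjust to the actual output of the script):

```python
import pickle

# Hypothetical output path; adjust to the actual file written by
# tools/create_waymo_semantic_info.py
with open("/path/to/waymo/preprocess/dir/waymo_infos_train.pkl", "rb") as f:
    infos = pickle.load(f)

print(type(infos))
# If the infos are a list of per-sample dicts, peek at the first entry's keys
if isinstance(infos, list) and infos:
    print(infos[0].keys())
```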

Clone the repository

git clone https://github.com/gopi-erabati/RetSeg3D.git
cd RetSeg3D

Training

SemanticKITTI dataset

  • Single GPU training
    1. Add the present working directory to PYTHONPATH: export PYTHONPATH=$(pwd):$PYTHONPATH
    2. python tools/train.py configs/retseg3d_semantickitti.py --work-dir {WORK_DIR}
  • Multi GPU training: tools/dist_train.sh configs/retseg3d_semantickitti.py {GPU_NUM} --work-dir {WORK_DIR}
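
tools/train.py takes an mmengine-style config (the same applies to the nuScenes and Waymo configs below). If you want to inspect or override settings programmatically before launching, mmengine's Config API can load the file; a minimal sketch (the work_dir override is just an example):

```python
from mmengine.config import Config

# Load the same file that is passed to tools/train.py
cfg = Config.fromfile("configs/retseg3d_semantickitti.py")

# Config behaves like a dict; list its top-level keys
print(list(cfg.keys()))

# Overrides can be applied in code, e.g. setting the work directory
cfg.work_dir = "./work_dirs/retseg3d_semantickitti"
```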

nuScenes dataset

  • Single GPU training
    1. Add the present working directory to PYTHONPATH: export PYTHONPATH=$(pwd):$PYTHONPATH
    2. python tools/train.py configs/retseg3d_nus.py --work-dir {WORK_DIR}
  • Multi GPU training: tools/dist_train.sh configs/retseg3d_nus.py {GPU_NUM} --work-dir {WORK_DIR}

Waymo dataset

  • Single GPU training
    1. Add the present working directory to PYTHONPATH: export PYTHONPATH=$(pwd):$PYTHONPATH
    2. python tools/train.py configs/retseg3d_waymo.py --work-dir {WORK_DIR}
  • Multi GPU training: tools/dist_train.sh configs/retseg3d_waymo.py {GPU_NUM} --work-dir {WORK_DIR}

Testing

SemanticKITTI dataset

  • Single GPU testing
    1. Add the present working directory to PYTHONPATH: export PYTHONPATH=$(pwd):$PYTHONPATH
    2. python tools/test.py configs/retseg3d_semantickitti.py /path/to/ckpt --work-dir {WORK_DIR}
  • Multi GPU testing: ./tools/dist_test.sh configs/retseg3d_semantickitti.py /path/to/ckpt {GPU_NUM} --work-dir {WORK_DIR}

nuScenes dataset

  • Single GPU testing
    1. Add the present working directory to PYTHONPATH: export PYTHONPATH=$(pwd):$PYTHONPATH
    2. python tools/test.py configs/retseg3d_nus.py /path/to/ckpt --work-dir {WORK_DIR}
  • Multi GPU testing: ./tools/dist_test.sh configs/retseg3d_nus.py /path/to/ckpt {GPU_NUM} --work-dir {WORK_DIR}

Waymo dataset

  • Single GPU testing
    1. Add the present working directory to PYTHONPATH: export PYTHONPATH=$(pwd):$PYTHONPATH
    2. python tools/test.py configs/retseg3d_waymo.py /path/to/ckpt --work-dir {WORK_DIR}
  • Multi GPU testing: ./tools/dist_test.sh configs/retseg3d_waymo.py /path/to/ckpt {GPU_NUM} --work-dir {WORK_DIR}
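
tools/test.py is the supported entry point; for programmatic evaluation, an mmengine Runner can run the same test loop. A minimal sketch, assuming the config registers this repo's model and dataset (run from the repo root with PYTHONPATH set as above):

```python
from mmengine.config import Config
from mmengine.runner import Runner

# Equivalent in spirit to: python tools/test.py <config> /path/to/ckpt --work-dir <dir>
cfg = Config.fromfile("configs/retseg3d_semantickitti.py")
cfg.load_from = "/path/to/ckpt"              # checkpoint to evaluate
cfg.work_dir = "./work_dirs/retseg3d_test"   # where logs and results are written

runner = Runner.from_cfg(cfg)
runner.test()  # runs the test loop and reports the configured metrics (e.g., mIoU)
```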

Acknowledgements

We sincerely thank the contributors for their open-source code: MMDetection3D and Pointcept.

Reference

@article{ERABATI2024104231,
title = {RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving},
journal = {Computer Vision and Image Understanding},
pages = {104231},
year = {2024},
issn = {1077-3142},
doi = {10.1016/j.cviu.2024.104231},
url = {https://www.sciencedirect.com/science/article/pii/S1077314224003126},
author = {Gopi Krishna Erabati and Helder Araujo},
}
