This is the official PyTorch implementation of the paper RetSeg3D: Retention-based 3D Semantic Segmentation for Autonomous Driving, by Gopi Krishna Erabati and Helder Araujo.
G. K. Erabati and H. Araujo, "RetSeg3D: Retention-based 3D Semantic Segmentation for Autonomous Driving," in Computer Vision and Image Understanding (CVIU), 2024. https://doi.org/10.1016/j.cviu.2024.104231
LiDAR semantic segmentation is one of the crucial tasks for scene understanding in autonomous driving. Recent trends suggest that voxel- or fusion-based methods obtain improved performance. However, fusion-based methods are computationally expensive, while voxel-based methods uniformly employ local operators (e.g., 3D SparseConv) without considering the varying-density property of LiDAR point clouds, which results in inferior performance, specifically on faraway sparse points, due to their limited receptive field. To tackle this issue, we propose a novel retention block to capture long-range dependencies and maintain the receptive field of faraway sparse points, and we design RetSeg3D, a retention-based 3D semantic segmentation model for autonomous driving. Instead of the vanilla attention mechanism for modeling long-range dependencies, inspired by RetNet, we design a cubic window multi-scale retentive self-attention (CW-MSRetSA) module with a bidirectional and 3D explicit decay mechanism that introduces 3D spatial-distance prior information into the model, improving not only the receptive field but also the model capacity. Our novel retention block maintains the receptive field, which significantly improves performance on faraway sparse points. We conduct extensive experiments and analysis on three large-scale datasets: SemanticKITTI, nuScenes and Waymo. Our method outperforms existing methods not only on faraway sparse points but also on close and medium-distance points, and it runs efficiently in real time at 52.1 FPS.
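To make the retention idea concrete, below is a minimal, hedged PyTorch sketch of retentive attention with a bidirectional, distance-based explicit decay over the voxels of a single cubic window. It illustrates the mechanism only and is not the paper's CW-MSRetSA implementation: the tensor shapes, the Manhattan-distance decay and the decay base `gamma` are assumptions chosen for clarity.

```python
# Minimal sketch of retention-style attention with an explicit 3D spatial decay,
# illustrating the idea behind CW-MSRetSA (not the paper's exact implementation).
# Shapes, the Manhattan-distance decay and gamma are illustrative assumptions.
import torch


def retention_with_3d_decay(q, k, v, coords, gamma=0.9):
    """q, k, v: (N, C) features of N voxels inside one cubic window.
    coords: (N, 3) voxel coordinates; gamma: decay base in (0, 1)."""
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5  # (N, N) pairwise similarity
    # Bidirectional explicit decay: weight each pair by gamma^distance, so nearby
    # voxels contribute more while distant ones are softly down-weighted.
    dist = torch.cdist(coords, coords, p=1)                # (N, N) Manhattan distances
    decay = gamma ** dist
    return (scores * decay) @ v                            # (N, C) updated features


# Toy usage: 16 voxels with 32-dim features.
feats = torch.randn(16, 32)
coords = torch.randint(0, 8, (16, 3)).float()
out = retention_with_3d_decay(feats, feats, feats, coords)
print(out.shape)  # torch.Size([16, 32])
```

Unlike vanilla attention, no softmax is applied; the pairwise scores are instead modulated by gamma raised to the spatial distance, so the contribution of distant voxels decays smoothly while remaining non-zero, which preserves a large receptive field.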
| RetSeg3D | SemanticKITTI | nuScenes | Waymo |
| --- | --- | --- | --- |
| mIoU | 70.3 | 76.9 | 70.1 |
| Config | retseg3d_semantickitti.py | retseg3d_nus.py | retseg3d_waymo.py |
| Model | weights | weights | - |
We cannot distribute the model weights for the Waymo dataset due to the Waymo license terms.
The code is tested on the following configuration:
- Ubuntu 20.04.6 LTS
- CUDA==11.7
- Python==3.8.10
- PyTorch==2.0.1
- mmcv==2.1.0
- mmengine==0.10.1
- mmdet==3.2.0
- mmdet3d==1.3.0
```bash
mkvirtualenv retseg3d
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -U openmim
mim install mmcv==2.1.0
mim install mmengine==0.10.1
pip install -r requirements.txt
```
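After installation, you can quickly verify that the pinned versions resolved correctly; this is a convenience snippet, not part of the repo:

```python
# Sanity-check the pinned dependency versions after installation.
import torch
import mmcv
import mmengine
import mmdet
import mmdet3d

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)          # expected 2.1.0
print("mmengine:", mmengine.__version__)  # expected 0.10.1
print("mmdet:", mmdet.__version__)        # expected 3.2.0
print("mmdet3d:", mmdet3d.__version__)    # expected 1.3.0
```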
Follow MMDetection3D-1.3.0 to prepare the SemanticKITTI and nuScenes datasets. Follow Pointcept for Waymo data preprocessing and then run

```bash
python tools/create_waymo_semantic_info.py /path/to/waymo/preprocess/dir
```

to generate the `.pkl` files required for the config.
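To confirm the info files were generated, you can inspect one from Python. The filename below is hypothetical, and the exact structure depends on what `create_waymo_semantic_info.py` writes:

```python
# Peek inside a generated info .pkl (the filename here is hypothetical;
# adjust it to whatever create_waymo_semantic_info.py actually produces).
import pickle

with open("/path/to/waymo/preprocess/dir/waymo_semantic_infos_train.pkl", "rb") as f:
    infos = pickle.load(f)

# mmdet3d-style info files are typically a dict or a list of per-frame dicts.
if isinstance(infos, dict):
    print("top-level keys:", list(infos.keys()))
else:
    print("frames:", len(infos), "| first entry keys:", list(infos[0].keys()))
```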
Warning: Please strictly follow the MMDetection3D-1.3.0 code to prepare the data, because other versions of MMDetection3D use a different coordinate convention after refactoring.
```bash
git clone https://github.com/gopi-erabati/RetSeg3D.git
cd RetSeg3D
```
Training

SemanticKITTI

- Single GPU training: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python tools/train.py configs/retseg3d_semantickitti.py --work-dir {WORK_DIR}
```

- Multi GPU training: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
tools/dist_train.sh configs/retseg3d_semantickitti.py {GPU_NUM} --work-dir {WORK_DIR}
```
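If you prefer launching training from Python rather than the shell, the configs can also be consumed directly with the MMEngine Runner. A minimal sketch, assuming the repo root is on PYTHONPATH so its custom modules register; this mirrors what `tools/train.py` does, minus argument handling, the work directory name is illustrative, and the same pattern applies to the nuScenes and Waymo configs:

```python
# Hedged sketch: programmatic training with the MMEngine Runner.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile("configs/retseg3d_semantickitti.py")
cfg.work_dir = "work_dirs/retseg3d_semantickitti"  # illustrative work dir

runner = Runner.from_cfg(cfg)  # builds model, datasets and loops from the config
runner.train()
```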
nuScenes

- Single GPU training: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python tools/train.py configs/retseg3d_nus.py --work-dir {WORK_DIR}
```

- Multi GPU training: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
tools/dist_train.sh configs/retseg3d_nus.py {GPU_NUM} --work-dir {WORK_DIR}
```
Waymo

- Single GPU training: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python tools/train.py configs/retseg3d_waymo.py --work-dir {WORK_DIR}
```

- Multi GPU training: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
tools/dist_train.sh configs/retseg3d_waymo.py {GPU_NUM} --work-dir {WORK_DIR}
```
Testing

SemanticKITTI

- Single GPU testing: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python tools/test.py configs/retseg3d_semantickitti.py /path/to/ckpt --work-dir {WORK_DIR}
```

- Multi GPU testing: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
./tools/dist_test.sh configs/retseg3d_semantickitti.py /path/to/ckpt {GPU_NUM} --work-dir {WORK_DIR}
```
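Evaluation can likewise be driven from Python. A minimal sketch, with the checkpoint path and work directory as placeholders; swap the config for nuScenes or Waymo as needed:

```python
# Hedged sketch: programmatic evaluation with the MMEngine Runner.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile("configs/retseg3d_semantickitti.py")
cfg.work_dir = "work_dirs/retseg3d_semantickitti_test"  # illustrative work dir
cfg.load_from = "/path/to/ckpt"  # checkpoint from the model table above

runner = Runner.from_cfg(cfg)
runner.test()  # runs the test loop and reports the segmentation metrics
```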
nuScenes

- Single GPU testing: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python tools/test.py configs/retseg3d_nus.py /path/to/ckpt --work-dir {WORK_DIR}
```

- Multi GPU testing: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
./tools/dist_test.sh configs/retseg3d_nus.py /path/to/ckpt {GPU_NUM} --work-dir {WORK_DIR}
```
Waymo

- Single GPU testing: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
python tools/test.py configs/retseg3d_waymo.py /path/to/ckpt --work-dir {WORK_DIR}
```

- Multi GPU testing: add the present working directory to PYTHONPATH, then run

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
./tools/dist_test.sh configs/retseg3d_waymo.py /path/to/ckpt {GPU_NUM} --work-dir {WORK_DIR}
```
We sincerely thank the contributors for their open-source code: MMDetection3D and Pointcept.
```bibtex
@article{ERABATI2024104231,
  title   = {RetSeg3D: Retention-based 3D semantic segmentation for autonomous driving},
  journal = {Computer Vision and Image Understanding},
  pages   = {104231},
  year    = {2024},
  issn    = {1077-3142},
  doi     = {10.1016/j.cviu.2024.104231},
  url     = {https://www.sciencedirect.com/science/article/pii/S1077314224003126},
  author  = {Gopi Krishna Erabati and Helder Araujo},
}
```