DEST: Depth Estimation with Simplified Transformer
John Yang, Le An, Anurag Dixit, Jinkyu Koo, Su Inn Park
CVPR Workshop on Transformers For Vision, 2022
DEST uses a simplified, GPU-friendly design of the attention block in the transformer. Compared to state-of-the-art methods, our model reduces model size and computation by over 80% while being more accurate and faster. The proposed model was validated on both depth estimation and semantic segmentation tasks. This repository contains the official PyTorch model implementation and training configuration, which can be adapted to your training workflow.
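For reference, a reduction of "over 80%" means the DEST figure is less than one fifth of the baseline's. The helper below is a minimal sketch of that arithmetic; the function name and the example numbers are hypothetical placeholders, not measurements reported in the paper.

```python
def percent_reduction(baseline: float, proposed: float) -> float:
    """Return how much smaller `proposed` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Placeholder numbers for illustration only (not results from the paper):
# a 20M-parameter model compared against a 100M-parameter baseline.
print(percent_reduction(100e6, 20e6))
```

The same formula applies to parameter counts and to FLOPs; anything above 80 corresponds to the claim above.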
For depth estimation, we use the same setup as PackNet-sfm. For environment preparation, data download, and training/evaluation scripts, please refer to the original repository.
Run the following commands
git clone
cd packnet-sfm
cp path/to/DEST/configs/train_kitti_dest.yaml configs/
cp path/to/DEST/models/* packnet_sfm/models/
cp path/to/DEST/networks/ packnet_sfm/networks/depth/
mkdir packnet_sfm/networks/DEST
cp path/to/DEST/networks/DEST/*.py packnet_sfm/networks/DEST/
to place DEST and its config file within the PackNet-sfm implementation, as shown below:
├ configs
│ ...
│ └ train_kitti_dest.yaml
├ packnet_sfm
│ ...
│ ├ models
│ │ ...
│ │ ├
│ │ ├
│ │ └
│ ├ networks
│ │ ...
│ │ ├ depth
│ │ │ ...
│ │ │ └
│ │ └ DEST
│ │ ├
│ │ ├
│ │ ├
│ │ └
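Once the files are copied, a quick sanity check can confirm that the directories and config file named in the commands above are in place. The script below is a minimal sketch (the function name is hypothetical) meant to be run from the packnet-sfm root; it only checks paths that are spelled out in these instructions, plus the presence of at least one copied `*.py` module in the DEST package directory.

```python
from pathlib import Path

# Paths named in the copy commands above; individual file names that are
# not spelled out there are intentionally not checked.
EXPECTED_PATHS = [
    "configs/train_kitti_dest.yaml",
    "packnet_sfm/models",
    "packnet_sfm/networks/depth",
    "packnet_sfm/networks/DEST",
]


def missing_dest_paths(repo_root):
    """Return the expected DEST paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    missing = [p for p in EXPECTED_PATHS if not (root / p).exists()]
    dest_dir = root / "packnet_sfm/networks/DEST"
    # The DEST package directory should contain the copied *.py modules.
    if dest_dir.is_dir() and not any(dest_dir.glob("*.py")):
        missing.append("packnet_sfm/networks/DEST/*.py")
    return missing


if __name__ == "__main__":
    missing = missing_dest_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("DEST files are in place.")
```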
Our work requires the timm library, so please add the following line to docker/Dockerfile:
RUN pip install timm
Before building the Docker image, you also need to adjust the Python, cuDNN, and NCCL versions (among others) in the Dockerfile to match your machine. Note that the minimum supported Python version is 3.7. Base images can be found on Docker Hub.
After configuring the Dockerfile, follow the instructions to build your Docker image.
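A small script like the hypothetical `check_env.py` below can be run inside the built container to confirm the two requirements stated above: Python 3.7 or newer and an importable timm package. It is a sketch for convenience, not part of the DEST or PackNet-sfm code.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum supported Python version stated above


def check_environment():
    """Return a list of problems with the current interpreter (empty if none)."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(
            "Python %d.%d found, need >= %d.%d" % (sys.version_info[:2] + MIN_PYTHON)
        )
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("timm") is None:
        problems.append("timm is not installed (add `RUN pip install timm` to the Dockerfile)")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```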
Also, due to an issue in the PackNet repository during evaluation, you need to edit lines L295 and L302 of the file packnet-sfm/packnet_sfm/models/
Change the lines
[L295] depth = inv2depth(inv_depths[0])
[L301] inv_depth_pp = post_process_inv_depth(
[L302] inv_depths[0], inv_depths_flipped[0], method='mean')
to
[L295] depth = inv2depth(inv_depths)
[L301] inv_depth_pp = post_process_inv_depth(
[L302] inv_depths, inv_depths_flipped, method='mean')
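The two edits can also be applied programmatically. The sketch below (hypothetical helper names) performs plain substring replacements of exactly the snippets listed above, touching only the first occurrence of each, and is idempotent, so re-running it is harmless. Pass `patch_file` the path of the file named above; the sketch assumes each snippet occurs once in that file, so review the resulting diff.

```python
from pathlib import Path

# Exact before/after snippets from the evaluation fix described above.
REPLACEMENTS = [
    ("depth = inv2depth(inv_depths[0])", "depth = inv2depth(inv_depths)"),
    (
        "inv_depths[0], inv_depths_flipped[0], method='mean')",
        "inv_depths, inv_depths_flipped, method='mean')",
    ),
]


def patch_source(text):
    """Apply the fix to a source string; return (new_text, number_of_edits_applied)."""
    applied = 0
    for old, new in REPLACEMENTS:
        if old in text:
            text = text.replace(old, new, 1)  # first occurrence only
            applied += 1
    return text, applied


def patch_file(path):
    """Patch the file in place and return how many of the edits were applied."""
    file = Path(path)
    new_text, applied = patch_source(file.read_text())
    if applied:
        file.write_text(new_text)
    return applied
```

A return value of 0 means the file was already patched (or does not contain the expected lines).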
To train DEST from scratch on KITTI dataset, run the following command:
python scripts/ configs/train_kitti_dest.yaml
To evaluate a DEST model on the KITTI dataset, run the following:
python scripts/ --checkpoint <DEST.ckpt> [--config <config.yaml>]
You can also run inference directly on a single image or a folder:
python scripts/ --checkpoint <DEST.ckpt> --input <image or folder> --output <image or folder> [--image_shape <input shape (h,w)>]
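If you prefer launching these runs from Python, the argument lists for the evaluation and inference commands can be assembled as below. This is a sketch with hypothetical helper names; the script paths stay parameters because they are not spelled out here, and it assumes `--image_shape` accepts height and width as two separate integers.

```python
import shlex


def eval_command(script, checkpoint, config=None):
    """Arguments for evaluating a DEST checkpoint, with an optional config override."""
    cmd = ["python", script, "--checkpoint", checkpoint]
    if config is not None:
        cmd += ["--config", config]
    return cmd


def inference_command(script, checkpoint, input_path, output_path, image_shape=None):
    """Arguments for inference on a single image or a folder."""
    cmd = ["python", script, "--checkpoint", checkpoint,
           "--input", input_path, "--output", output_path]
    if image_shape is not None:
        height, width = image_shape  # input shape given as (h, w)
        cmd += ["--image_shape", str(height), str(width)]
    return cmd


def to_shell(cmd):
    """Render an argument list as a copy-pasteable shell command (works on Python 3.7)."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

The lists can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting issues with paths containing spaces; `to_shell` is only for printing.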
For semantic segmentation, our implementation can be readily integrated into the OpenMMLab Semantic Segmentation Toolbox and Benchmark (MMSegmentation) for training and evaluation.
Please refer to their instructions for installation and dataset preparation. Our DEST is trained and evaluated on the Cityscapes dataset.
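MMSegmentation configs are plain Python files built from nested dicts, where each component is selected by its registered `type` name. The fragment below is a minimal, hypothetical sketch of the model section of such a config; the type names `SimplifiedTransformer` and `DestHead` are the ones registered with MMSegmentation later in this section, `num_classes=19` reflects the 19 Cityscapes evaluation classes, and all other constructor arguments are omitted and should be taken from the provided files in DEST/semseg/.

```python
# Hypothetical sketch of the model section of an MMSegmentation config for DEST.
# Only the `type` names and the Cityscapes class count are grounded; the real
# configs in DEST/semseg/ also set channels, pretrained weights, losses, etc.
model = dict(
    type="EncoderDecoder",  # MMSegmentation's standard encoder-decoder segmentor
    backbone=dict(type="SimplifiedTransformer"),
    decode_head=dict(
        type="DestHead",
        num_classes=19,  # Cityscapes is evaluated on 19 semantic classes
    ),
)
```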
To train by following the MMSegmentation instructions, use the files located in DEST/semseg/ and relocate them within the MMSegmentation repository by running the following commands:
git clone # first, clone the MMSegmentation repository
cd mmsegmentation
mkdir configs/dest/
cp path/to/DEST/semseg/ configs/_base_/models/
cp path/to/DEST/semseg/ configs/_base_/schedules/
cp path/to/DEST/semseg/ configs/_base_/datasets/
cp path/to/DEST/semseg/dest_simpatt-* configs/dest/
cp path/to/DEST/semseg/ mmseg/models/backbones/
cp path/to/DEST/semseg/ mmseg/models/decode_heads/
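The `mkdir configs/dest/` plus `cp path/to/DEST/semseg/dest_simpatt-* configs/dest/` pair can be mirrored in Python as below. This is a minimal sketch with a hypothetical helper name; unlike a plain `mkdir`, it does not fail if the destination already exists, so it can be re-run safely.

```python
import glob
import os
import shutil


def copy_matching(src_dir, pattern, dst_dir):
    """Copy files in `src_dir` matching a shell-style `pattern` into `dst_dir`.

    Creates `dst_dir` if needed and returns the sorted names of copied files.
    """
    os.makedirs(dst_dir, exist_ok=True)  # like mkdir, but no error if it exists
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        if os.path.isfile(path):
            shutil.copy2(path, dst_dir)  # copies file contents and metadata
            copied.append(os.path.basename(path))
    return copied
```

For example, `copy_matching("path/to/DEST/semseg", "dest_simpatt-*", "configs/dest")` run from the mmsegmentation root corresponds to the config-copy step above.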
Next, register DEST in the MMSegmentation library:
echo 'from .simplified_attention_mmseg import SimplifiedTransformer' >> mmseg/models/backbones/
echo 'from .dest_head import DestHead' >> mmseg/models/decode_heads/
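Running the `echo ... >>` commands twice appends duplicate import lines. The sketch below (hypothetical helper name) appends each import only if an identical line is not already present; pass it the same package files the `echo` commands target, whose names are not reproduced here.

```python
from pathlib import Path


def append_line_once(path, line):
    """Append `line` to the file at `path` unless an identical line already exists.

    Returns True if the line was appended, False if it was already present.
    """
    file = Path(path)
    text = file.read_text() if file.exists() else ""
    if line in text.splitlines():
        return False
    with file.open("a") as f:
        # Avoid gluing the new line onto a last line that lacks a trailing newline.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True
```

Usage would be, for example, `append_line_once(backbones_init, "from .simplified_attention_mmseg import SimplifiedTransformer")`, where `backbones_init` is a placeholder for the backbones package file.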
Then, you can start training/evaluating with a desired configuration of DEST.
Example: training DEST-B1 on the Cityscapes dataset:
# Single-gpu training
python tools/ configs/dest/
# Multi-gpu training
./tools/ configs/dest/ <GPU_NUM>
After training, you can evaluate the trained model (e.g., DEST-B1):
# Single-gpu testing
python tools/ configs/dest/ /path/to/checkpoint_file
# Multi-gpu testing
./tools/ configs/dest/ /path/to/checkpoint_file <GPU_NUM>
# Multi-gpu, multi-scale testing
tools/ configs/dest/ /path/to/checkpoint_file <GPU_NUM> --aug-test
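Conceptually, `--aug-test` enables MMSegmentation's test-time augmentation: the model runs on several rescaled (and optionally flipped) copies of each image, the per-class scores are averaged over those copies, and the per-pixel argmax gives the final label. The pure-Python toy below (hypothetical function name) shows only that fusion step, assuming the score maps have already been resized back to a common resolution; it is not MMSegmentation's implementation, which operates on tensors.

```python
def average_and_argmax(score_maps):
    """Fuse per-class score maps from several test-time augmentations.

    `score_maps` holds one H x W x C nested list per augmentation, all at the
    same H x W. Scores are averaged across augmentations, then the index of
    the highest-scoring class is returned for each pixel.
    """
    n = len(score_maps)
    height, width = len(score_maps[0]), len(score_maps[0][0])
    num_classes = len(score_maps[0][0][0])
    labels = []
    for i in range(height):
        row = []
        for j in range(width):
            mean = [sum(m[i][j][c] for m in score_maps) / n for c in range(num_classes)]
            row.append(max(range(num_classes), key=mean.__getitem__))
        labels.append(row)
    return labels
```

Averaging can flip a pixel's label relative to any single scale, which is why multi-scale testing usually gives slightly different (often better) scores than single-scale testing.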
The provided code can be used for research or other non-commercial purposes. For details please check the LICENSE file.
title={Depth Estimation with Simplified Transformer},
author={Yang, John and An, Le and Dixit, Anurag and Koo, Jinkyu and Park, Su Inn},
journal={arXiv preprint arXiv:2204.13791},