TRIDENT: Efficient Triple-Task Learning of Dehazing, Depth, and Uncertainty Estimation for Underwater 3-D Robot Visual Perception
IEEE Sensors Journal 2024
This repository contains the official implementation of the paper "TRIDENT: Efficient Triple-Task Learning of Dehazing, Depth, and Uncertainty Estimation for Underwater 3-D Robot Visual Perception".
In this paper, we introduce a novel learning-based sensing system that tackles multidimensional vision tasks in underwater environments; concretely, it addresses image enhancement, depth estimation, and uncertainty estimation for 3-D visual systems. We also propose TRIDENT, a fast and lightweight model consisting of three parallelized decoders and a single backbone for efficient feature sharing, designed to be trained to express complex parameterization. In experimental evaluations on several standard datasets, we demonstrate that TRIDENT significantly outperforms existing methods on image enhancement and depth estimation. Despite performing three tasks, our model is more efficient than the others in both memory footprint and inference time. Finally, our joint learning approach demonstrates robustness in feature matching and extends seamlessly from 2-D to 3-D vision tasks.
- Run the demo locally (requires a GPU and `nvidia-docker2`, see the Installation Guide).
- Optionally, we provide instructions for using Docker in multiple ways (but we recommend `docker compose`, see the Installation Guide).
- The code requires `python>=3.8`, as well as `pytorch>=1.7` and `torchvision>=0.8`. We do not provide instructions for installing the PyTorch and TorchVision dependencies; please use `nvidia-docker2`. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
- This code was tested on:
- Ubuntu 22.04 LTS, Python 3.10.12, CUDA 11.7, GeForce RTX 3090 (pip)
- Ubuntu 22.04 LTS, Python 3.8.6, CUDA 12.0, RTX A6000 (pip)
- Ubuntu 20.04 LTS, Python 3.10.12, CUDA 12.1, GeForce RTX 3080ti (pip)
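If you plan to run outside Docker, you can quickly confirm that your environment meets the version requirements above; a minimal sketch (the script name `env_check.py` is just illustrative):

```python
# env_check.py -- confirm the minimum versions listed above and CUDA visibility (illustrative sketch)
import sys

import torch
import torchvision

print("python     :", sys.version.split()[0])    # requires >= 3.8
print("pytorch    :", torch.__version__)         # requires >= 1.7
print("torchvision:", torchvision.__version__)   # requires >= 0.8
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```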
- Prepare Repository & Checkpoints
- Prepare Dataset
- Prepare Docker Image and Run the Docker Container
- Training for TRIDENT on Joint-ID Dataset
- Testing for TRIDENT on Joint-ID Dataset
- Testing for TRIDENT on Standard or Custom Dataset
- Clone the repository (requires git):

  ```bash
  git clone https://github.com/sparolab/TRIDENT.git
  cd TRIDENT
  ```

- Let's call the path where TRIDENT's repository is located `${TRIDENT_root}`.

- You don't need to download checkpoint files; they are already in `${TRIDENT_root}/TRIDENT/ckpt`. Because the model is so lightweight, the checkpoints are included directly in the git repository.
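As a quick sanity check that the bundled checkpoints are in place, you can list the contents of the `ckpt` folder; a minimal sketch (the exact checkpoint file names are not assumed here):

```python
# list_ckpt.py -- list the bundled checkpoint files (illustrative sketch; run from ${TRIDENT_root}/TRIDENT)
from pathlib import Path

ckpt_dir = Path("ckpt")
if not ckpt_dir.is_dir():
    raise SystemExit(f"checkpoint folder not found: {ckpt_dir.resolve()}")

for f in sorted(ckpt_dir.iterdir()):
    print(f"{f.name}  ({f.stat().st_size / 1e6:.1f} MB)")
```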
- TRIDENT uses the dataset from the Joint-ID paper. Please download `Joint_ID_Dataset.zip`.

- Next, unzip `Joint_ID_Dataset.zip`, calling the path it was downloaded to `${dataset_root_path}`:

  ```bash
  sudo unzip ${dataset_root_path}/Joint_ID_Dataset.zip
  # ${dataset_root_path} requires at least 2.3 GB of free space.
  # ${dataset_root_path} is an absolute path, not a relative path.
  ```
- After unzipping, you should see the following file structure in the `Joint_ID_Dataset` folder:

  ```
  📦 Joint_ID_Dataset
  ┣ 📂 train
  ┃ ┣ 📂 LR                          # GT for training dataset
  ┃ ┃ ┣ 📂 01_Warehouse
  ┃ ┃ ┃ ┣ 📂 color                   # enhanced images
  ┃ ┃ ┃ ┃ ┣ in_00_160126_155728_c.png
  ┃ ┃ ┃ ┃ ...
  ┃ ┃ ┃ ┗ 📂 depth_filled            # depth images
  ┃ ┃ ┃   ┣ in_00_160126_155728_depth_filled.png
  ┃ ┃ ┃   ...
  ┃ ┃ ...
  ┃ ┗ 📂 synthetic                   # synthetic distorted dataset
  ┃   ┣ 📂 LR@[email protected]
  ┃   ┣ ...
  ┗ 📂 test                          # 'test' folder has the same structure as 'train'
  ...
  ```
- For additional details about the dataset, see the project page.
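Before mounting the dataset into Docker, you can roughly verify that it unpacked as expected; a minimal sketch based on the tree above (the image counts and scene folder names other than `01_Warehouse` are not assumed):

```python
# verify_dataset.py -- rough sanity check of the unpacked Joint_ID_Dataset layout (illustrative sketch)
from pathlib import Path

dataset_root = Path("/path/to/dataset_root")  # replace with your ${dataset_root_path}
root = dataset_root / "Joint_ID_Dataset"

for split in ("train", "test"):
    n_color = len(list((root / split / "LR").glob("*/color/*.png")))
    n_depth = len(list((root / split / "LR").glob("*/depth_filled/*.png")))
    n_synth = len(list((root / split / "synthetic").rglob("*.png")))
    print(f"{split}: {n_color} color / {n_depth} depth_filled / {n_synth} synthetic PNGs")
```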
To run a Docker container, we first need to create a Docker image. There are two ways to create the image and run the container.
- Use `docker pull` and `nvidia-docker run`:

  ```bash
  # download the docker image
  docker pull ygm7422/official_trident:latest

  # run the docker container
  nvidia-docker run \
      --privileged \
      --rm \
      --gpus all -it \
      --name trident \
      --ipc=host \
      --shm-size=256M \
      --net host \
      -v /tmp/.X11-unix:/tmp/.X11-unix \
      -e DISPLAY=unix$DISPLAY \
      -v /root/.Xauthority:/root/.Xauthority \
      --env="QT_X11_NO_MITSHM=1" \
      -v ${dataset_root_path}/Joint_ID_Dataset:/root/workspace/dataset_root \
      -v ${TRIDENT_root}/TRIDENT:/root/workspace \
      ygm7422/official_trident:latest
  ```
- Use `docker compose` (this builds the Docker image and runs the container simultaneously):

  ```bash
  cd ${TRIDENT_root}/TRIDENT

  # build docker image and run container simultaneously
  bash run_docker.sh up gpu ${dataset_root_path}/Joint_ID_Dataset

  # enter the container
  docker exec -it TRIDENT bash
  ```
Regardless of whether you use method 1 or 2, you should end up with a running Docker container named TRIDENT.
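Once inside the container (either method), it is worth confirming that the two volume mounts from the commands above are visible and that the GPU is reachable; a minimal sketch:

```python
# mount_check.py -- run inside the TRIDENT container to confirm the volume mounts and GPU (illustrative sketch)
from pathlib import Path

import torch

checks = {
    "code mount    /root/workspace":              Path("/root/workspace"),
    "dataset mount /root/workspace/dataset_root": Path("/root/workspace/dataset_root"),
}
for name, path in checks.items():
    print(f"{name}: {'OK' if path.is_dir() else 'MISSING'}")

print("CUDA available:", torch.cuda.is_available())
```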
- First, move to the `/root/workspace` folder inside the Docker container. Then, run the following command to start the training:

  ```bash
  # move to workspace
  cd /root/workspace

  # start two-task (image enhancement & depth estimation) training on the Joint-ID Dataset
  python run.py local_configs/arg_joint_train_trident.txt
  ```
- The model's checkpoints and log files are saved in the `/root/workspace/save` folder.
- If you want to change the default variable settings for training, see the Inference settings below.
- If you want to use the uncertainty module, run the following command to start its training (you should have already finished the first training stage):

  ```bash
  # move to workspace
  cd /root/workspace

  # start uncertainty module training on the Joint-ID Dataset
  python run.py local_configs/arg_triple_train_trident.txt
  ```
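To keep track of training progress, you can look for the most recent checkpoint under `/root/workspace/save`; a minimal sketch (the internal layout of the `save` folder and the `.pth` extension are assumptions, not documented here):

```python
# latest_ckpt.py -- find the newest checkpoint under /root/workspace/save (illustrative sketch;
# the folder layout and the .pth extension are assumptions)
from pathlib import Path

save_dir = Path("/root/workspace/save")
ckpts = sorted(save_dir.rglob("*.pth"), key=lambda p: p.stat().st_mtime)

if ckpts:
    print("latest checkpoint:", ckpts[-1])
else:
    print("no .pth files found yet under", save_dir)
```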
- First, move to the `/root/workspace` folder inside the Docker container. Then, run one of the following commands to start the testing:

  ```bash
  # move to workspace
  cd /root/workspace

  # start testing on the Joint-ID Dataset (two-task model)
  python run.py local_configs/arg_joint_test_trident.txt
  # or start testing on the Joint-ID Dataset (three-task model)
  python run.py local_configs/arg_triple_test_trident.txt
  ```
- The test images and results are saved in the `result_joint.TRIDENT_two_task` or `result_joint.TRIDENT_three_task` folder.
- If you want to change the default variable settings for testing, see the Inference settings below.
- Set the dataset-related variables in the `local_configs/cfg/TRIDENT_two_task.py` file. Enter your input image path in the `sample_test_data_path` variable, as shown below:

  ```python
  ...
  # If you want to adjust the image size, adjust the `image_size` below.
  image_size = dict(input_height=288, input_width=512)
  ...
  # Dataset
  dataset = dict(
      train_data_path='dataset_root/train/synthetic',
      ...
      # sample_test_data_path='${your standard or custom dataset path}',
      sample_test_data_path='demo',
      video_txt_file=''
  )
  ...
  ```
- First, move to the `/root/workspace` folder inside the Docker container. Then, run one of the following commands to start the testing:

  ```bash
  # move to workspace
  cd /root/workspace

  # start testing on standard or custom datasets (two-task model)
  python run.py local_configs/arg_joint_samples_test.txt
  # or start testing on standard or custom datasets (three-task model)
  python run.py local_configs/arg_triple_samples_test.txt
  ```
- The test images and results are saved in the `sample_eval_result_joint.TRIDENT_two_task` or `sample_eval_result_joint.TRIDENT_three_task` folder.
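Before running on your own images, it can help to confirm they are readable and to note their resolutions relative to the configured `image_size`; a minimal sketch, assuming Pillow is available and the images sit directly inside the folder given in `sample_test_data_path`:

```python
# inspect_samples.py -- list image files and resolutions in a custom test folder (illustrative sketch)
from pathlib import Path

from PIL import Image  # assumes Pillow is installed

sample_dir = Path("demo")  # same value as `sample_test_data_path`
exts = {".png", ".jpg", ".jpeg"}

for img_path in sorted(p for p in sample_dir.iterdir() if p.suffix.lower() in exts):
    with Image.open(img_path) as im:
        print(f"{img_path.name}: {im.width}x{im.height}")
# The configured input size is 512x288 (width x height); enable auto_crop or adjust
# image_size (see Inference settings below) if your images differ.
```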
We set the hyperparameters in `local_configs/cfg/joint.diml.joint_id.py`.
- `depth_range`: the range of depth we want to estimate.
- `image_size`: the size of the input image data. If you set this variable, make sure to set `auto_crop` to False in `train_dataloader_cfg`, `eval_dataloader_cfg`, `test_dataloader_cfg`, or `sample_test_cfg` below. If you do not want to set `image_size`, set `auto_crop` to True; with `auto_crop` enabled, images are fed to the model at their original size.
- `train_parm`: hyperparameters to set for training.
- `test_parm`: hyperparameters to set for testing.
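To illustrate the `image_size` / `auto_crop` relationship described above, here is a sketch of the two mutually exclusive settings (only the relevant fields are shown; the other contents of `train_dataloader_cfg` are omitted and its exact structure is inferred from the description above):

```python
# Option A (illustrative sketch): fixed input resolution -- set image_size and disable auto_crop
image_size = dict(input_height=288, input_width=512)
train_dataloader_cfg = dict(
    # ... other dataloader fields ...
    auto_crop=False,   # inputs are resized to image_size before entering the model
)

# Option B (illustrative sketch): original resolution -- omit image_size and enable auto_crop
train_dataloader_cfg = dict(
    # ... other dataloader fields ...
    auto_crop=True,    # inputs are fed to the model at their original size
)
```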
Please cite our paper:
```bibtex
@article{yang2024trident,
  title={TRIDENT: Efficient Triple-Task Learning of Dehazing, Depth and Uncertainty Estimation for Underwater 3D Robot Visual Perception},
  author={Yang, Geonmo and Cho, Younggun},
  journal={IEEE Sensors Journal},
  year={2024},
  publisher={IEEE}
}
```
Geonmo Yang: [email protected]
Project Link: https://sites.google.com/view/underwater-trident/home
For academic usage, the code is released under the GNU General Public License, Version 3.0 (as defined in the LICENSE file). For any commercial purpose, please contact the authors.