
Config Wiki

antocad edited this page Feb 17, 2022 · 2 revisions

Welcome to the Config wiki! Please read this page carefully to understand how to run training and testing with our code.

Our config file is divided into three parts:

  • General: the general parameters of the model and of the training procedure.
  • Dataset: all the information needed to load the dataset(s) correctly.
  • wandb: Weights & Biases (wandb) parameters for visualizing the results (recommended).
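As a minimal sketch, assuming the config is stored as a JSON file whose top-level keys are exactly `General`, `Dataset` and `wandb` (the key spelling and the helper name `load_config` are assumptions, not taken from the project's code), the three parts can be loaded and checked like this:

```python
import json
from pathlib import Path

# Top-level sections, named after the three parts listed above
# (assumption: the JSON file uses exactly these keys).
REQUIRED_SECTIONS = ("General", "Dataset", "wandb")


def load_config(path):
    """Load a JSON config file and check that the three sections are present."""
    with open(Path(path), "r", encoding="utf-8") as f:
        config = json.load(f)
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise KeyError(f"config is missing section(s): {', '.join(missing)}")
    return config
```

For example, `load_config("config.json")["General"]["lr_backbone"]` would return the backbone learning rate, where `config.json` stands for whatever file name you use.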

General parameters

| Variable Name | Explanation | Possible values |
|---|---|---|
| device | Torch device to use during training | cpu or cuda |
| type | Head(s) of the model. PS: this parameter changes the architecture of the model, so if you want to use pre-trained weights obtained with a depth-only model, you must set this parameter to depth (and likewise for the other values) | full to extract both the depth map and the segmentation mask, depth to extract only the depth map, segmentation to extract only the segmentation mask |
| model_timm | Pre-trained Vision Transformer used as the encoder | vit_base_patch16_384 for ViT Base, vit_large_patch16_384 for ViT Large |
| emb_dim | Dimension of the embeddings produced by the ViT encoder. Refer to our report or the original paper for more details | Recommended: 768 for ViT Base, 1024 for ViT Large |
| hooks | Indices of the encoder layers whose outputs are hooked (cf. architecture) | Recommended: [2, 5, 8, 11] for ViT Base, [5, 11, 17, 23] for ViT Large |
| read | Readout module type; refer to our report for more details | projection, sum or ignore |
| resample_dim | Dimension of the decoder embeddings | Recommended: 256 |
| optim | Optimizer to use (for both groups of trainable parameters: backbone and scratch) | sgd or adam |
| lr_backbone | Learning rate for the backbone | Any float > 0 and < 1; Recommended: 1e-5 (with Adam), kept small since the backbone uses pre-trained weights |
| lr_scratch | Learning rate for the decoder and the bi-head module | Any float > 0 and < 1; Recommended: 3e-4 (with Adam) |
| loss_depth | Loss function for training the depth module | ssi for the scale- and shift-invariant loss, or mse for the classic MSE loss |
| loss_segmentation | Loss function for training the segmentation module | ce for the cross-entropy loss |
| momentum | Momentum of the SGD optimizer (only used if optim=sgd) | Any float > 0 and < 1 |
| epochs | Number of training epochs | Any integer > 0 |
| batch_size | Batch size for training | Any integer > 0 |
| seed | Random seed for reproducibility | Any integer > 0 |
| patch_size (not used) | Patch size of the ViT backbone | Any patch size supported by the backbone model you choose |
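A minimal validation sketch, assuming the `General` section has already been loaded into a Python dict. The allowed values and recommendations are copied from the table above; the function name `check_general` is hypothetical and not part of the project's code:

```python
# Allowed values, copied from the "Possible values" column above.
ALLOWED = {
    "device": {"cpu", "cuda"},
    "type": {"full", "depth", "segmentation"},
    "read": {"projection", "sum", "ignore"},
    "optim": {"sgd", "adam"},
    "loss_depth": {"ssi", "mse"},
    "loss_segmentation": {"ce"},
}

# Recommended emb_dim / hooks for each backbone, as listed in the table.
RECOMMENDED = {
    "vit_base_patch16_384": {"emb_dim": 768, "hooks": [2, 5, 8, 11]},
    "vit_large_patch16_384": {"emb_dim": 1024, "hooks": [5, 11, 17, 23]},
}


def _in_unit_interval(value):
    """True for a number strictly between 0 and 1."""
    return isinstance(value, (int, float)) and 0 < value < 1


def check_general(general):
    """Return a list of human-readable problems found in the General section."""
    problems = []
    for key, allowed in ALLOWED.items():
        if general.get(key) not in allowed:
            problems.append(f"{key}={general.get(key)!r} not in {sorted(allowed)}")
    for key in ("lr_backbone", "lr_scratch"):
        if not _in_unit_interval(general.get(key)):
            problems.append(f"{key} must be a float in (0, 1)")
    # momentum is only read when the optimizer is SGD.
    if general.get("optim") == "sgd" and not _in_unit_interval(general.get("momentum")):
        problems.append("momentum must be a float in (0, 1) when optim='sgd'")
    for key in ("epochs", "batch_size"):
        value = general.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer")
    # Flag emb_dim / hooks values that differ from the recommendation.
    rec = RECOMMENDED.get(general.get("model_timm"))
    if rec is not None:
        for key, value in rec.items():
            if general.get(key) != value:
                problems.append(
                    f"{key}={general.get(key)!r}; recommended {value!r} "
                    f"for {general['model_timm']}"
                )
    return problems
```

An empty list means no problem was detected; the emb_dim and hooks checks only compare against the recommended values and do not prove that other settings are wrong.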

Dataset parameters


| Variable Name | Explanation | Possible values |
|---|---|---|
| path_dataset | Folder where your datasets are located | Path to a folder |
| list_datasets | List of the dataset folders (inside path_dataset) to use for training | Example: ["inria", "nyuv2"] means that you have the two folders path_dataset/inria and path_dataset/nyuv2. Each of them must contain three sub-folders, "depths", "images" and "segmentations", holding all the files with the extensions defined below. Corresponding images, depths and segmentations must have the same file name (up to the extension) in each sub-folder! |
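Before training, it can help to check that the three sub-folders really contain matching file names. The helper below is a hypothetical sketch (not part of the repository) that matches files by name without extension, since the extensions of images, depths and segmentations may differ:

```python
from pathlib import Path

# Sub-folder names required in every dataset folder (from the table above).
SUBFOLDERS = ("images", "depths", "segmentations")


def matching_samples(path_dataset, list_datasets):
    """Map each dataset name to the sorted file stems found in all three sub-folders."""
    result = {}
    for name in list_datasets:
        root = Path(path_dataset) / name
        stems_per_folder = []
        for sub in SUBFOLDERS:
            folder = root / sub
            if not folder.is_dir():
                raise FileNotFoundError(f"missing folder: {folder}")
            # Compare names without extension (e.g. "0001.jpg" vs "0001.png").
            stems_per_folder.append({p.stem for p in folder.iterdir() if p.is_file()})
        common = set.intersection(*stems_per_folder)
        unmatched = set.union(*stems_per_folder) - common
        if unmatched:
            print(f"[{name}] {len(unmatched)} file(s) without a counterpart, "
                  f"e.g. {sorted(unmatched)[:3]}")
        result[name] = sorted(common)
    return result
```

Files reported as unmatched are the ones that break the "same name in each directory" rule above.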


Extensions: specify the file extensions of the input images and of their ground truths.

| Variable Name | Explanation | Possible values |
|---|---|---|
| ext_images | Extension of the input images | .jpg or .png |
| ext_segmentations | Extension of the associated segmentation masks | .jpg or .png |
| ext_depths | Extension of the depth maps | .jpg or .png |
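A short sketch of how a sample name and the three extension values combine into file paths, assuming the folder layout described for list_datasets; `sample_paths` is a hypothetical helper, not the project's loader:

```python
from pathlib import Path

# Extensions listed as possible values in the table above.
ALLOWED_EXTENSIONS = {".jpg", ".png"}


def sample_paths(dataset_dir, stem, ext_images, ext_depths, ext_segmentations):
    """Return the (image, depth, segmentation) paths of one sample.

    `stem` is the shared file name without extension; the three extensions
    are the ext_images, ext_depths and ext_segmentations values of the config.
    """
    for ext in (ext_images, ext_depths, ext_segmentations):
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"unsupported extension {ext!r}; expected .jpg or .png")
    root = Path(dataset_dir)
    return (
        root / "images" / f"{stem}{ext_images}",
        root / "depths" / f"{stem}{ext_depths}",
        root / "segmentations" / f"{stem}{ext_segmentations}",
    )
```

Note that JPEG compression is lossy, so .png is usually the safer choice for segmentation masks, whose colors must match the class colors exactly.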


Splits: the proportion of the data assigned to each split (train, val and test). The three values must sum to 1.

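| Variable Name | Explanation | Possible values |
|---|---|---|
| split_train | Proportion of the data used for the training set | Any float > 0 and < 1 |
| split_val | Proportion of the data used for the validation set | Any float > 0 and < 1 |
| split_test | Proportion of the data used for the test set | Any float > 0 and < 1 |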
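A sketch of how three proportions can be turned into concrete splits, assuming a seeded shuffle and that any rounding remainder goes to the test set. `split_samples` is a hypothetical helper, and the project's own splitting logic may differ:

```python
import math
import random


def split_samples(samples, split_train, split_val, split_test, seed=0):
    """Shuffle samples and cut them into train/val/test lists by proportion."""
    total = split_train + split_val + split_test
    # Floating-point sums such as 0.6 + 0.2 + 0.2 may not be exactly 1.0.
    if not math.isclose(total, 1.0):
        raise ValueError(f"split proportions sum to {total}, expected 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * split_train)
    n_val = int(len(shuffled) * split_val)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to test
    return train, val, test
```

With 8 samples and proportions 0.5 / 0.25 / 0.25, this yields splits of 4, 2 and 2 samples.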


Transforms: describe the transformations to apply to the input image and to its ground truths.

| Variable Name | Explanation | Possible values |
|---|---|---|
| resize | Size to which the input image is resized | Any integer > 0 |
| p_flip | Probability of flipping the input image vertically | Any float > 0 and < 1 |
| p_crop | Probability of applying a center crop to the input image | Any float > 0 and < 1 |
| p_rot | Probability of applying a slight random rotation to the input image | Any float > 0 and < 1 |
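Because the transformations apply to the input and to its ground truths, the random decision should be drawn once per sample and reused for all three, otherwise the depth map and mask no longer line up with the image. The sketch below illustrates this with images as nested lists to stay dependency-free; the function names are hypothetical, and only the flip is implemented:

```python
import random


def sample_augmentations(p_flip, p_crop, p_rot, rng=random):
    """Decide once per sample which augmentations to apply."""
    return {
        "flip": rng.random() < p_flip,
        "crop": rng.random() < p_crop,
        "rot": rng.random() < p_rot,
    }


def vflip(grid):
    """Flip a 2D grid (list of rows) vertically by reversing the row order."""
    return grid[::-1]


def augment(image, depth, segmentation, decisions):
    """Apply the sampled flip identically to the input and its ground truths.

    Crop and rotation would follow the same pattern: one decision,
    applied to all three arrays.
    """
    if decisions["flip"]:
        image, depth, segmentation = vflip(image), vflip(depth), vflip(segmentation)
    return image, depth, segmentation
```

In practice, you would call `sample_augmentations` once per loaded sample and pass the result to `augment`.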


Classes: specify a dictionary mapping the colors used in the segmentation masks to the corresponding classes. The background class is created automatically, with "0" as the key and {"name": "background", "color": [0,0,0]} as the value. Every other class must be added manually, giving its name and the RGB color used for it in the masks.
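A minimal sketch of how such a dictionary can be used to turn an RGB mask into class indices. It assumes new classes use the same {"name", "color"} shape as the background entry and integer-like string keys; the helper names and the example color [255, 0, 0] are hypothetical:

```python
def build_classes(user_classes):
    """Add the automatic background class to a user-defined class dictionary."""
    if "0" in user_classes:
        raise ValueError('key "0" is reserved for the background class')
    classes = {"0": {"name": "background", "color": [0, 0, 0]}}
    classes.update(user_classes)
    return classes


def mask_to_indices(mask, classes):
    """Convert an RGB mask (rows of (r, g, b) pixels) into class indices."""
    color_to_index = {tuple(c["color"]): int(k) for k, c in classes.items()}
    out = []
    for row in mask:
        out_row = []
        for pixel in row:
            try:
                out_row.append(color_to_index[tuple(pixel)])
            except KeyError:
                raise ValueError(f"unknown mask color {tuple(pixel)}") from None
        out.append(out_row)
    return out
```

For example, with `{"1": {"name": "person", "color": [255, 0, 0]}}`, a mask row `[(0, 0, 0), (255, 0, 0)]` maps to `[0, 1]`; any color missing from the dictionary raises an error instead of being silently mislabeled.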

Wandb parameters

To visualize the training procedure properly, we recommend creating a wandb account before running the script.

| Variable Name | Explanation | Possible values |
|---|---|---|
| enable | Enable wandb monitoring; we recommend enabling it for a better training experience | true or false |
| username | Your wandb username | |
| images_to_show | Number of images shown on wandb at the end of each epoch to visualize the results | Any integer < 10 |
| im_h | Height of the images used for visualization | |
| im_w | Width of the images used for visualization | |
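A small sketch of checking the wandb section before starting a run, assuming im_h and im_w are positive pixel sizes and images_to_show is at least 1 (both assumptions). `wandb_settings` is a hypothetical helper; it does not call the wandb library itself:

```python
def wandb_settings(wandb_cfg):
    """Validate the wandb section; return None when monitoring is disabled."""
    if not wandb_cfg.get("enable", False):
        return None
    n = wandb_cfg.get("images_to_show")
    # Assumption: at least one image, and fewer than 10 as the table requires.
    if not isinstance(n, int) or not 0 < n < 10:
        raise ValueError("images_to_show must be an integer between 1 and 9")
    for key in ("im_h", "im_w"):
        value = wandb_cfg.get(key)
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer")
    if not wandb_cfg.get("username"):
        raise ValueError("username is required when enable is true")
    return wandb_cfg["username"], n, (wandb_cfg["im_h"], wandb_cfg["im_w"])
```

Returning None for a disabled section lets the training script skip all wandb logging with a single check.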