Config Wiki
Welcome to the Config wiki! Please read this document carefully to understand how to run a training or a test using our code.
Our config file is divided into 3 parts:
- General: covers the general parameters of the model and of the training procedure
- Dataset: contains all the information necessary for correct dataset loading
- wandb: Wandb parameters for visualizing the results (recommended)
Variable Name | Explanation | Possible values |
---|---|---|
`device` | Torch device to use during training | `cpu` or `cuda` |
`type` | Refers to the head of the model | `full` to extract both the depth map and the segmentation mask, `depth` to extract only the depth map, `segmentation` to extract only the segmentation mask. Note: this parameter changes the architecture of the model, so if you want to use pre-trained weights obtained on a depth-only model, you need to set this parameter accordingly |
`model_timm` | Pre-trained Vision Transformer for the encoder | `vit_base_patch16_384` for ViT Base, `vit_large_patch16_384` for ViT Large |
`emb_dim` | Dimension of the embeddings generated by the decoder. Refer to our report or the original paper for details | Recommended: 768 for ViT Base, 1024 for ViT Large |
`hooks` | Refers to the layers that will be hooked (cf. architecture) | Recommended: `[2, 5, 8, 11]` for ViT Base, `[5, 11, 17, 23]` for ViT Large |
`read` | Readout module type; refer to our report for details | `projection`, `sum` or `ignore` |
`resample_dim` | Refers to the dimension of the decoder embeddings | Recommended: 256 |
`optim` | Optimizer to use (for both groups of trainable parameters) | `sgd` or `adam` |
`lr_backbone` | Learning rate to use for the backbone | Any float > 0 and < 1; recommended: 1e-5 (with Adam), since we use pre-trained weights |
`lr_scratch` | Learning rate to use for the decoder and the bi-head module | Any float > 0 and < 1; recommended: 3e-4 (with Adam) |
`loss_depth` | Loss function to use for training the depth module | `ssi` for the scale-and-shift-invariant loss or `mse` for the classic MSE loss |
`loss_segmentation` | Loss function to use for the segmentation module | `ce` for the cross-entropy loss |
`momentum` | Momentum to use for the SGD optimizer (only used if `optim` = `sgd`) | Any float > 0 and < 1 |
`epochs` | Number of epochs for training | Any integer > 0 |
`batch_size` | Batch size for training | Any integer > 0 |
`path_model` | | |
`path_predicted_images` | | |
`seed` | Random seed for reproducibility | Any integer > 0 |
`patch_size` (unused) | Patch size to use for the ViT backbone | Any patch size supported by the backbone model you choose |
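Put together, here is a minimal sketch of what the General section could look like. The key names come from the table above; the exact file layout and the concrete values for `epochs`, `batch_size`, `momentum` and `seed` are illustrative placeholders, not the repository's defaults:

```json
{
  "General": {
    "device": "cuda",
    "type": "full",
    "model_timm": "vit_base_patch16_384",
    "emb_dim": 768,
    "hooks": [2, 5, 8, 11],
    "read": "projection",
    "resample_dim": 256,
    "optim": "adam",
    "lr_backbone": 1e-5,
    "lr_scratch": 3e-4,
    "loss_depth": "ssi",
    "loss_segmentation": "ce",
    "momentum": 0.9,
    "epochs": 100,
    "batch_size": 8,
    "seed": 42
  }
}
```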
The Dataset section first describes where your datasets are located.

Variable Name | Explanation | Possible values |
---|---|---|
`path_dataset` | Folder where your datasets are located | Path to a folder |
`list_datasets` | List of folders (inside `path_dataset`), one per dataset to use in the training (see the sketch after this table) | Example: `["inria", "nyuv2"]` means that you have these 2 paths: `path_dataset/inria` and `path_dataset/nyuv2`; each of them contains 3 folders (`depths`, `images`, and `segmentations`), and each of these folders contains all images with the extensions defined below. The corresponding images/depths/segmentations must have the same name in each directory! |
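For example, with the two datasets from the table, the path entries could read as follows (the `datasets/` root folder and the nesting of the keys are assumptions for illustration):

```json
{
  "Dataset": {
    "paths": {
      "path_dataset": "datasets/",
      "list_datasets": ["inria", "nyuv2"]
    }
  }
}
```

With this configuration, the loader would expect folders such as `datasets/inria/images/`, `datasets/inria/depths/` and `datasets/inria/segmentations/`, with matching file names across the three folders.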
Specifies the file extensions of the input images and of their ground truths.
Variable Name | Explanation | Possible values |
---|---|---|
`ext_images` | Extension of the input images | `.jpg` or `.png` |
`ext_segmentations` | Extension of the associated segmentation masks | `.jpg` or `.png` |
`ext_depths` | Extension of the depth maps | `.jpg` or `.png` |
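A sketch of the corresponding block, assuming the same nested layout as above (mixing extensions per type, e.g. JPEG inputs with PNG ground truths, is just an illustration):

```json
{
  "Dataset": {
    "extensions": {
      "ext_images": ".jpg",
      "ext_segmentations": ".png",
      "ext_depths": ".png"
    }
  }
}
```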
These values correspond to the proportion of each split (train, val and test). The sum of the 3 values must equal 1.
Variable Name | Explanation | Possible values |
---|---|---|
`split_train` | Split of the training set | Any float > 0 and < 1 |
`split_val` | Split of the validation set | Any float > 0 and < 1 |
`split_test` | Split of the testing set | Any float > 0 and < 1 |
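For example, a common 60/20/20 split whose values sum to 1 (illustrative values, same assumed nesting as above):

```json
{
  "Dataset": {
    "splits": {
      "split_train": 0.6,
      "split_val": 0.2,
      "split_test": 0.2
    }
  }
}
```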
Describes the properties of the transformations that need to be applied to the input and to the ground truths.
Variable Name | Explanation | Possible values |
---|---|---|
`resize` | Size to which the input image is resized | Any integer value |
`p_flip` | Probability of vertically flipping the input image | Any float > 0 and < 1 |
`p_crop` | Probability of randomly center cropping the input image | Any float > 0 and < 1 |
`p_rot` | Probability of randomly applying a slight rotation to the input image | Any float > 0 and < 1 |
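A sketch with plausible values (384 matches the input resolution of the recommended `*_384` backbones; the probabilities are placeholders, same assumed nesting as above):

```json
{
  "Dataset": {
    "transforms": {
      "resize": 384,
      "p_flip": 0.5,
      "p_crop": 0.3,
      "p_rot": 0.2
    }
  }
}
```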
In this section, you need to specify a dictionary containing the mapping between colors and their corresponding classes. The background class is automatically created, with `"0"` as the key and `{"name": "background", "color": [0, 0, 0]}` as the value. You have to add new classes manually by explicitly giving the class name and the RGB color of each new class, as in the sketch below.
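For example, to add a hypothetical `building` class on top of the automatically created background entry (the class name and the red color are illustrative; the nesting of the keys is an assumption):

```json
{
  "Dataset": {
    "classes": {
      "0": {"name": "background", "color": [0, 0, 0]},
      "1": {"name": "building", "color": [255, 0, 0]}
    }
  }
}
```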
To correctly visualize the training procedure, we recommend creating an account on wandb before running the script.
Variable Name | Explanation | Possible values |
---|---|---|
`enable` | Enables wandb monitoring; we recommend enabling it for a better training experience | `true` or `false` |
`username` | Your username on wandb | |
`images_to_show` | Number of images to show at the end of each epoch, to visualize the results on wandb | Any integer < 10 |
`im_h` | Height of the images during visualization | |
`im_w` | Width of the images during visualization | |
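Finally, a sketch of the wandb section (the username and the image dimensions are placeholders):

```json
{
  "wandb": {
    "enable": true,
    "username": "your-wandb-username",
    "images_to_show": 5,
    "im_h": 384,
    "im_w": 384
  }
}
```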