Config Wiki
Welcome to the Config wiki! Please read this document carefully to understand how to run a training or a test using our code.
Our config file is divided into 3 parts:
- General: covers the general parameters of the model and of the training procedure
- Dataset: contains all the information necessary for correct dataset loading
- wandb: Wandb parameters for visualizing the results (recommended)
Variable Name | Explanation | Possible values |
---|---|---|
`device` | Torch device to use during training | `cpu` or `cuda` |
`type` | Refers to the head of the model | `full` to extract both the depth map and the segmentation mask, `depth` to extract only the depth map, `segmentation` to extract only the segmentation mask. Note: this parameter changes the architecture of the model, so if you want to use pre-trained weights obtained on a depth-only model, you need to set this parameter accordingly |
`model_timm` | Pre-trained Vision Transformer for the encoder | `vit_base_patch16_384` for ViT Base, `vit_large_patch16_384` for ViT Large |
`emb_dim` | Dimension of the embeddings generated by the decoder. Refer to our report or the original paper for details | Recommended: 768 for ViT Base, 1024 for ViT Large |
`hooks` | Refers to the layers that will be hooked (cf. architecture) | Recommended: `[2, 5, 8, 11]` for ViT Base, `[5, 11, 17, 23]` for ViT Large |
`read` | Readout module type; refer to our report for details | `projection`, `sum` or `ignore` |
`resample_dim` | Refers to the dimension of the decoder embeddings | Recommended: 256 |
`optim` | Optimizer to use (for both groups of trainable parameters) | `sgd` or `adam` |
`lr_backbone` | Learning rate to use for the backbone | Any float > 0 and < 1; recommended: 1e-5 (with Adam), since we use pre-trained weights |
`lr_scratch` | Learning rate to use for the decoder and the bi-head module | Any float > 0 and < 1; recommended: 3e-4 (with Adam) |
`loss_depth` | Loss function to use for training the depth module | `ssi` for the scale-and-shift-invariant loss or `mse` for the classic MSE loss |
`loss_segmentation` | Loss function to use for the segmentation module | `ce` for the cross-entropy loss |
`momentum` | Momentum to use for the SGD optimizer (only used if `optim` = `sgd`) | Any float > 0 and < 1 |
`epochs` | Number of epochs for training | Any integer > 0 |
`batch_size` | Batch size for training | Any integer > 0 |
`path_model` | | |
`path_predicted_images` | | |
`seed` | Random seed for reproducibility | Any integer > 0 |
`patch_size` (unused) | Patch size to use for the ViT backbone | Any patch size supported by the backbone model you choose |
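Put together, here is a minimal sketch of what the General section could look like. The key names come from the table above; the exact file layout and the concrete values for `epochs`, `batch_size`, `momentum` and `seed` are illustrative placeholders, not the repository's defaults:

```json
{
  "General": {
    "device": "cuda",
    "type": "full",
    "model_timm": "vit_base_patch16_384",
    "emb_dim": 768,
    "hooks": [2, 5, 8, 11],
    "read": "projection",
    "resample_dim": 256,
    "optim": "adam",
    "lr_backbone": 1e-5,
    "lr_scratch": 3e-4,
    "loss_depth": "ssi",
    "loss_segmentation": "ce",
    "momentum": 0.9,
    "epochs": 100,
    "batch_size": 8,
    "seed": 42
  }
}
```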
The Dataset section first describes where your datasets are located.

Variable Name | Explanation | Possible values |
---|---|---|
`path_dataset` | Folder where your datasets are located | Path to a folder |
`list_datasets` | List of folders (inside `path_dataset`), one per dataset to use in the training (see the sketch after this table) | Example: `["inria", "nyuv2"]` means that you have these 2 paths: `path_dataset/inria` and `path_dataset/nyuv2`; each of them contains 3 folders (`depths`, `images`, and `segmentations`), and each of these folders contains all images with the extensions defined below. The corresponding images/depths/segmentations must have the same name in each directory! |
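For example, with the two datasets from the table, the path entries could read as follows (the `datasets/` root folder and the nesting of the keys are assumptions for illustration):

```json
{
  "Dataset": {
    "paths": {
      "path_dataset": "datasets/",
      "list_datasets": ["inria", "nyuv2"]
    }
  }
}
```

With this configuration, the loader would expect folders such as `datasets/inria/images/`, `datasets/inria/depths/` and `datasets/inria/segmentations/`, with matching file names across the three folders.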
Specifies the file extensions of the input images and of their ground truths.
Variable Name | Explanation | Possible values |
---|---|---|
`ext_images` | Extension of the input images | `.jpg` or `.png` |
`ext_segmentations` | Extension of the associated segmentation masks | `.jpg` or `.png` |
`ext_depths` | Extension of the depth maps | `.jpg` or `.png` |
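A sketch of the corresponding block, assuming the same nested layout as above (mixing extensions per type, e.g. JPEG inputs with PNG ground truths, is just an illustration):

```json
{
  "Dataset": {
    "extensions": {
      "ext_images": ".jpg",
      "ext_segmentations": ".png",
      "ext_depths": ".png"
    }
  }
}
```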
These values correspond to the proportion of each split (train, val and test). The sum of the 3 values must equal 1.
Variable Name | Explanation | Possible values |
---|---|---|
`split_train` | Split of the training set | Any float > 0 and < 1 |
`split_val` | Split of the validation set | Any float > 0 and < 1 |
`split_test` | Split of the testing set | Any float > 0 and < 1 |
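For example, a common 60/20/20 split whose values sum to 1 (illustrative values, same assumed nesting as above):

```json
{
  "Dataset": {
    "splits": {
      "split_train": 0.6,
      "split_val": 0.2,
      "split_test": 0.2
    }
  }
}
```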
Describes the properties of the transformations that need to be applied to the input and to the ground truths.
Variable Name | Explanation | Possible values |
---|---|---|
`resize` | Size to which the input image is resized | Any integer value |
`p_flip` | Probability of vertically flipping the input image | Any float > 0 and < 1 |
`p_crop` | Probability of randomly center cropping the input image | Any float > 0 and < 1 |
`p_rot` | Probability of randomly applying a slight rotation to the input image | Any float > 0 and < 1 |
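A sketch with plausible values (384 matches the input resolution of the recommended `*_384` backbones; the probabilities are placeholders, same assumed nesting as above):

```json
{
  "Dataset": {
    "transforms": {
      "resize": 384,
      "p_flip": 0.5,
      "p_crop": 0.3,
      "p_rot": 0.2
    }
  }
}
```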
In this section, you need to specify a dictionary containing the mapping between colors and their corresponding classes. The background class is automatically created, with `"0"` as the key and `{"name": "background", "color": [0, 0, 0]}` as the value. You have to add new classes manually by explicitly giving the class name and the RGB color of each new class, as in the sketch below.
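For example, to add a hypothetical `building` class on top of the automatically created background entry (the class name and the red color are illustrative; the nesting of the keys is an assumption):

```json
{
  "Dataset": {
    "classes": {
      "0": {"name": "background", "color": [0, 0, 0]},
      "1": {"name": "building", "color": [255, 0, 0]}
    }
  }
}
```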
To correctly visualize the training procedure, we recommend creating an account on wandb before running the script.
Variable Name | Explanation | Possible values |
---|---|---|
`enable` | Enables wandb monitoring; we recommend enabling it for a better training experience | `true` or `false` |
`username` | Your username on wandb | |
`images_to_show` | Number of images to show at the end of each epoch, to visualize the results on wandb | Any integer < 10 |
`im_h` | Height of the images during visualization | |
`im_w` | Width of the images during visualization | |
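Finally, a sketch of the wandb section (the username and the image dimensions are placeholders):

```json
{
  "wandb": {
    "enable": true,
    "username": "your-wandb-username",
    "images_to_show": 5,
    "im_h": 384,
    "im_w": 384
  }
}
```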