Hi, I was checking the log file hrnet_w48_contrast_lr1x_hrnet_contrast_t0.1.log. The epoch and iteration counts seem to be computed as if training were running on a single GPU, even though this is a 4-GPU job.
Basically, for a 4-GPU job with a batch size of 8 per GPU, one epoch over the 2,975-image Cityscapes training set should take 93 iterations if the dataloader is distributed across all GPUs. In the log, however, one epoch takes 4 times as many iterations. This raises the question of whether the dataloader is actually distributed across multiple GPUs, and how iterations and epochs are counted. The answer affects how the learning rate schedule and warm-up iterations should be configured.
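For reference, here is a minimal sketch (not the repository's actual training code) of the arithmetic I have in mind, assuming a standard PyTorch `DistributedSampler` setup; `train_dataset` and `rank` are hypothetical placeholders:

```python
import math
from torch.utils.data import DataLoader, DistributedSampler

num_samples = 2975        # Cityscapes fine-annotated training images
world_size = 4            # number of GPUs / processes
batch_size_per_gpu = 8

# With a DistributedSampler, each rank sees roughly num_samples / world_size samples.
samples_per_rank = math.ceil(num_samples / world_size)               # 744
iters_per_epoch = math.ceil(samples_per_rank / batch_size_per_gpu)   # 93

# Without a distributed sampler, every rank iterates over the full dataset,
# which is 4x as many iterations per epoch.
iters_without_sampler = math.ceil(num_samples / batch_size_per_gpu)  # 372

# Hypothetical loader setup; with the sampler, len(loader) == 93 on each rank.
# sampler = DistributedSampler(train_dataset, num_replicas=world_size, rank=rank, shuffle=True)
# loader = DataLoader(train_dataset, batch_size=batch_size_per_gpu, sampler=sampler)
```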
Thanks.