Hi, I was checking the log file hrnet_w48_contrast_lr1x_hrnet_contrast_t0.1.log. The epoch and iteration counts seem to be computed as if training were running on a single GPU, even though this is a 4-GPU job.
Basically, for a 4-GPU job with a batch size of 8 per GPU, one epoch over the 2,975-image Cityscapes training set should take 93 iterations if the dataloader is distributed across all GPUs. In the log, however, one epoch takes 4 times as many iterations. This raises the question of whether the dataloader is actually distributed across multiple GPUs, and how iterations and epochs are counted. The answer affects how the learning rate schedule and warm-up iterations should be configured.
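For reference, here is a minimal sketch (not the repository's actual training code) of the arithmetic I have in mind, assuming a standard PyTorch `DistributedSampler` setup; `train_dataset` and `rank` are hypothetical placeholders:

```python
import math
from torch.utils.data import DataLoader, DistributedSampler

num_samples = 2975        # Cityscapes fine-annotated training images
world_size = 4            # number of GPUs / processes
batch_size_per_gpu = 8

# With a DistributedSampler, each rank sees roughly num_samples / world_size samples.
samples_per_rank = math.ceil(num_samples / world_size)               # 744
iters_per_epoch = math.ceil(samples_per_rank / batch_size_per_gpu)   # 93

# Without a distributed sampler, every rank iterates over the full dataset,
# which is 4x as many iterations per epoch.
iters_without_sampler = math.ceil(num_samples / batch_size_per_gpu)  # 372

# Hypothetical loader setup; with the sampler, len(loader) == 93 on each rank.
# sampler = DistributedSampler(train_dataset, num_replicas=world_size, rank=rank, shuffle=True)
# loader = DataLoader(train_dataset, batch_size=batch_size_per_gpu, sampler=sampler)
```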
Thanks.