Hello! Thanks for the great work! We are trying to add a mini-reproduction of this model as an example for a PyTorch performance study and optimization effort (pytorch/benchmark#582). The purpose is performance profiling only, so we are not interested in the correctness numbers (for now); we run just 1 batch and 1 epoch on a mini-coco2017 dataset containing only 100 images. When training the model on a single Nvidia V100 GPU (32 GB memory) with the default training batch size of 32, we encountered a CUDA OOM failure.
May I ask if this is the expected behaviour? When training with a single GPU, what is the minimal GPU memory requirement?
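For context, here is roughly how we measure the peak memory of a single training step. This is only a sketch, not our actual benchmark harness; `model`, `images`, and `targets` are placeholders for the EffDet training wrapper and one mini-coco2017 batch.

```python
import torch

def peak_memory_one_step(model, images, targets, device="cuda"):
    # Run a single forward/backward/optimizer step and report peak GPU memory in GiB.
    model = model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    torch.cuda.reset_peak_memory_stats(device)

    output = model(images.to(device), targets)
    # Training wrappers often return a dict with a 'loss' entry; plain models may
    # return the loss tensor directly.
    loss = output["loss"] if isinstance(output, dict) else output
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    return torch.cuda.max_memory_allocated(device) / 1024 ** 3
```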
@xuzhao9 There isn't really enough information here for me to say whether this is expected, but these models are very memory hungry. I train on 2 RTX Titan or 3090 cards (24 GB each) and am often in the 12-24 per-card batch size range for the lower-end models. Always train with AMP. Memory use also goes up quickly as you increase the model size, so stick to the D0 model for tests.
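A minimal sketch of what an AMP training step looks like with `torch.cuda.amp` (placeholder names, not the exact loop used in this repo); autocast plus loss scaling cuts activation memory significantly on a V100:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def amp_train_step(model, images, targets, optimizer, device="cuda"):
    # Forward pass under autocast so most ops run in float16, then scale the loss
    # before backward to avoid gradient underflow in half precision.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        output = model(images.to(device), targets)
        loss = output["loss"] if isinstance(output, dict) else output
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```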