Out of memory error during training despite having enough memory #21

montensorrt · 2021-09-07T06:55:15Z

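Hello, and thank you for sharing your dense object localization algorithm. I recently ran into a problem while trying to reproduce it, described below.
I am running the training process on Windows 10. After preparing the data and the models, training fails as follows:
Whichever model I run, the forward pass always fails at the first conv: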
VGG16_FPN.py:
    def forward(self, x):
        f = []
        x = self.layer1(x)
seg_hrnet.py:
    def forward(self, x):
        residual = x
        out = self.conv1(x)
It always reports an out-of-memory error:
RuntimeError: CUDA out of memory. Tried to allocate 4.50 GiB (GPU 0; 12.00 GiB total capacity; 886.66 MiB already allocated; 5.14 GiB free; 4.94 GiB reserved in total by PyTorch)
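However, it only tries to allocate 4.5 GiB (Tried to allocate 4.50 GiB), while my GPU still has about 5 GiB available apart from what PyTorch has taken (5.14 GiB free; 4.94 GiB reserved in total by PyTorch).
I cannot figure out what is going wrong. Which platform did you train the original code on, Ubuntu? Or do you have any ideas for solving this problem?
Looking forward to your reply. Thank you very much!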
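
To compare the figures in the error message above side by side, they can be pulled out with a small parser. This is only a sketch: `parse_oom_message` is a hypothetical helper, not part of PyTorch or of this repository, and it assumes the message wording shown in this issue.

```python
import re

# Multipliers for the binary units that appear in the message.
_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def parse_oom_message(msg):
    """Return the sizes quoted in a CUDA OOM message, in bytes, keyed by label."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
        "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
        "free": r"([\d.]+) (KiB|MiB|GiB) free",
        "reserved": r"([\d.]+) (KiB|MiB|GiB) reserved",
    }
    sizes = {}
    for label, pattern in patterns.items():
        match = re.search(pattern, msg)
        if match:  # skip fields that are missing from the message
            sizes[label] = float(match.group(1)) * _UNITS[match.group(2)]
    return sizes


msg = ("RuntimeError: CUDA out of memory. Tried to allocate 4.50 GiB "
       "(GPU 0; 12.00 GiB total capacity; 886.66 MiB already allocated; "
       "5.14 GiB free; 4.94 GiB reserved in total by PyTorch)")
sizes = parse_oom_message(msg)
for label, num_bytes in sizes.items():
    print(f"{label:>9}: {num_bytes / 2**30:.2f} GiB")
```

The output puts the single 4.50 GiB request next to the 5.14 GiB reported as free, which is the gap the question is about.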

taohan10200 · 2021-09-07T13:00:39Z

Tips: The training process takes ~50 hours on NWPU datasets with two TITAN RTX (48 GB memory).

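Hello, we trained on Ubuntu. With the default batch size of 8 in this code, a large amount of GPU memory is required. If you only have a single 12 GB GPU, we suggest reducing the batch size or the image crop size.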
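
As a rough illustration of why both suggestions help, the size of a single dense float32 activation tensor of shape (N, C, H, W) can be estimated by hand. The helper `activation_gib` and the shapes below are assumptions made for this sketch, not values taken from the repository's config. The 64 channels match the width of the first VGG16 conv layers, and the crop sizes are purely hypothetical.

```python
def activation_gib(batch_size, channels, height, width, bytes_per_element=4):
    """Size in GiB of one dense activation tensor of shape (N, C, H, W)."""
    return batch_size * channels * height * width * bytes_per_element / 2**30


# Hypothetical shapes for illustration only: 64 channels, float32,
# square crops at the same resolution as the input.
for batch_size, crop in [(8, 1536), (4, 1536), (8, 768), (2, 768)]:
    size = activation_gib(batch_size, 64, crop, crop)
    print(f"batch={batch_size}, crop={crop}x{crop}: {size:.2f} GiB")
```

The estimate grows linearly with the batch size and quadratically with the crop side length, so halving the crop side saves more than halving the batch. Under these assumed shapes, one tensor for a batch of 8 at 1536x1536 would already be 4.50 GiB, the same order as the allocation in the error message, although the shapes actually used in training may differ.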

montensorrt changed the title on Sep 7, 2021 (a stray character was removed from the original Chinese title; the meaning is unchanged)
