Hello, I have a problem where I run out of memory when running python train.py --threed_match_dir ~/dataset/threedmatch/ --batch_size 1.
At first I ran out of memory before even finishing the first epoch, so I changed the batch size to 1 (a batch size of 2 was still too much). After a few thousand iterations I started getting "out of memory" errors like:
INFO - 2021-02-22 12:51:28,348 - trainer - Train Epoch: 1 [1440/7317], Current Loss: 1.157e+00 Pos: 0.365 Neg: 0.792 Data time: 0.0536, Train time: 0.5614, Iter time: 0.6150
Traceback (most recent call last):
File "train.py", line 84, in <module>
main(config)
File "train.py", line 63, in main
trainer.train()
File "/home/f/repos/FCGF/lib/trainer.py", line 132, in train
self._train_epoch(epoch)
File "/home/f/repos/FCGF/lib/trainer.py", line 492, in _train_epoch
self.config.batch_size)
File "/home/f/repos/FCGF/lib/trainer.py", line 427, in contrastive_hardest_negative_loss
D01 = pdist(posF0, subF1, dist_type='L2')
File "/home/f/repos/FCGF/lib/metrics.py", line 24, in pdist
D2 = torch.sum((A.unsqueeze(1) - B.unsqueeze(0)).pow(2), 2)
RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 3.82 GiB total capacity; 744.27 MiB already allocated; 43.38 MiB free; 814.00 MiB reserved in total by PyTorch)
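The failing line in metrics.py computes pairwise distances by broadcasting, which materializes the full (N, M, D) difference tensor before reducing, so peak memory grows with N·M·D rather than N·M. A minimal sketch (not FCGF's actual code, just an illustration with made-up tensor sizes) of an equivalent computation via the quadratic expansion, which only ever allocates the (N, M) result:

```python
import torch

def pdist_broadcast(A, B):
    # The approach from the traceback: allocates an (N, M, D)
    # intermediate, so peak memory scales with N * M * D.
    return torch.sum((A.unsqueeze(1) - B.unsqueeze(0)).pow(2), 2).sqrt()

def pdist_lowmem(A, B):
    # Quadratic expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b:
    # only an (N, M) matrix is allocated, never the (N, M, D) tensor.
    sq = A.pow(2).sum(1, keepdim=True) + B.pow(2).sum(1) - 2.0 * A @ B.t()
    return sq.clamp_min(0).sqrt()  # clamp guards against small negatives

A = torch.randn(128, 32)
B = torch.randn(256, 32)
assert torch.allclose(pdist_broadcast(A, B), pdist_lowmem(A, B), atol=1e-4)
```

PyTorch also ships `torch.cdist(A, B)` for exactly this, so swapping it in (or chunking A) might reduce the peak allocation at that line; I haven't verified whether FCGF's loss relies on the broadcast form elsewhere.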
Currently my system takes up 500 MiB of VRAM on my GTX 1650 (4 GB) and PyTorch uses the rest. I'm running PyTorch 1.7 in a Python 3.7 conda environment. I tried compiling MinkowskiEngine with CUDA 11.2, and I'm currently running CUDA 10.2, but both gave the same error.
On a side note: isn't it bad to train with a batch size of only 1? Wouldn't that cause poor convergence?
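If batch size 1 does hurt convergence, one generic workaround (not FCGF-specific, just a sketch with a toy model) is gradient accumulation: run several batch-size-1 forward/backward passes before each optimizer step, so the accumulated gradient matches that of a larger batch without the larger peak memory:

```python
import torch

# Toy stand-ins for the real model/data; the pattern is what matters.
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4  # effective batch size = accum_steps * batch_size

data = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(8)]

opt.zero_grad()
for i, (x, y) in enumerate(data):
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale by accum_steps so the summed gradients equal the mean
    # gradient over the effective batch.
    (loss / accum_steps).backward()
    if (i + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```

This matches plain mini-batch SGD for losses that average over samples; batch-dependent pieces such as hard-negative mining within a batch would still only see one pair at a time.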