Out of memory with batch_size 1 and 4GB VRAM #49

Open

fjodborg opened this issue Feb 22, 2021 · 0 comments
Hello, I have a problem where I run out of memory when running `python train.py --threed_match_dir ~/dataset/threedmatch/ --batch_size 1`.
At first I ran out of memory before even starting the first epoch, so I changed the batch_size to 1 (a batch_size of 2 was still too much). Partway through the first epoch (around iteration 1440) I started getting "out of memory" errors like:

```
INFO - 2021-02-22 12:51:28,348 - trainer - Train Epoch: 1 [1440/7317], Current Loss: 1.157e+00 Pos: 0.365 Neg: 0.792  Data time: 0.0536, Train time: 0.5614, Iter time: 0.6150
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    main(config)
  File "train.py", line 63, in main
    trainer.train()
  File "/home/f/repos/FCGF/lib/trainer.py", line 132, in train
    self._train_epoch(epoch)
  File "/home/f/repos/FCGF/lib/trainer.py", line 492, in _train_epoch
    self.config.batch_size)
  File "/home/f/repos/FCGF/lib/trainer.py", line 427, in contrastive_hardest_negative_loss
    D01 = pdist(posF0, subF1, dist_type='L2')
  File "/home/f/repos/FCGF/lib/metrics.py", line 24, in pdist
    D2 = torch.sum((A.unsqueeze(1) - B.unsqueeze(0)).pow(2), 2)
RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 3.82 GiB total capacity; 744.27 MiB already allocated; 43.38 MiB free; 814.00 MiB reserved in total by PyTorch)
```
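For reference on where the memory goes: the failing line in lib/metrics.py broadcasts A against B, so it materializes the full (N, M, C) difference tensor before summing over the feature dimension, which means peak memory scales with N·M·C regardless of batch_size. Below is a rough sketch of a lower-memory way to get the same squared distances by processing A in row chunks, so no (N, M, C) intermediate is ever built. The name `pdist_chunked` and the chunk size are illustrative only, not part of FCGF, and this skips the `dist_type` handling the real `pdist` has:

```python
import torch

def pdist_chunked(A, B, chunk=256):
    """Squared L2 distances between rows of A (N, C) and rows of B (M, C),
    computed in row chunks of A so only (chunk, M) blocks live at once."""
    rows = []
    for start in range(0, A.shape[0], chunk):
        a = A[start:start + chunk]  # (n, C) with n <= chunk
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, no (n, M, C) intermediate
        d2 = (a.pow(2).sum(1, keepdim=True)
              + B.pow(2).sum(1)
              - 2.0 * a @ B.t()).clamp_(min=0)
        rows.append(d2)
    return torch.cat(rows, 0)  # (N, M)
```

torch.cdist(A, B) (which returns distances rather than squared distances) would serve a similar purpose if its memory behaviour is acceptable on this GPU.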

Currently my system takes up about 500 MiB of VRAM on my GTX 1650 (4 GB) and the rest is used by PyTorch. I'm running PyTorch 1.7 in a Python 3.7 conda environment. I tried building MinkowskiEngine against CUDA 11.2, and I'm currently running CUDA 10.2, but both gave the same error.
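To see how the rest of the card splits between live tensors and the caching allocator's reserved pool at the point of failure, a generic snippet like this could be dropped in right before the failing call (these are standard torch.cuda calls, nothing FCGF-specific):

```python
import torch

# What PyTorch itself holds on GPU 0: live tensors ("allocated") vs.
# memory the caching allocator keeps around for reuse ("reserved").
dev = torch.device("cuda:0")
print(torch.cuda.get_device_name(dev))
print(f"allocated: {torch.cuda.memory_allocated(dev) / 2**20:.1f} MiB")
print(f"reserved : {torch.cuda.memory_reserved(dev) / 2**20:.1f} MiB")
print(torch.cuda.memory_summary(dev, abbreviated=True))
```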

On a side note: isn't it bad to train with a batch size of only 1? Wouldn't that cause poor convergence?
