batch_size=1 still RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.94 GiB total capacity; 3.31 GiB already allocated; 21.50 MiB free; 3.38 GiB reserved in total by PyTorch) #1

Open
andrewis88 opened this issue Jun 30, 2023 · 1 comment

Comments

@andrewis88

(molo) andrew@andrew:~/MoLo-master$ python runs/run.py --cfg configs/projects/MoLo/ucf101/MoLo_UCF101_1shot_v1.yaml
/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/aliyunsdkcore/auth/algorithm/sha_hmac256.py:20: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
from cryptography.hazmat.backends import default_backend
/home/andrew/MoLo-master/models/base/few_shot.py:56: UserWarning: PyTorch version 1.7.1 or higher is recommended
warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Loading config from configs/projects/MoLo/ucf101/MoLo_UCF101_1shot_v1.yaml.
[06/30 15:32:31][INFO] train_net_few_shot: 473: Train with config:
[06/30 15:32:31][INFO] train_net_few_shot: 474: {
........
[06/30 15:32:52][INFO] utils.misc: 156: Params: 89,616,923
[06/30 15:32:52][INFO] utils.misc: 157: Mem: 0.33546018600463867 MB
[06/30 15:32:52][INFO] utils.misc: 164: nvidia-smi
Fri Jun 30 15:32:52 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 20%   42C    P0    N/A /  75W |    856MiB /  4096MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1081      G   /usr/lib/xorg/Xorg                 75MiB |
|    0   N/A  N/A      1388      G   /usr/bin/gnome-shell               55MiB |
|    0   N/A  N/A      2795      G   ...RendererForSitePerProcess        1MiB |
|    0   N/A  N/A     96398      C   python                            719MiB |
+-----------------------------------------------------------------------------+
[06/30 15:32:52][INFO] models.utils.optimizer: 83: Optimized parameters constructed. Parameters without weight decay: []
[06/30 15:32:52][INFO] datasets.base.ssv2_few_shot: 110: Reading video list from file: train_few_shot.txt
[06/30 15:32:52][INFO] datasets.base.ssv2_few_shot: 145: Loading HMDB_few_shot dataset list for split 'train'...
[06/30 15:32:52][INFO] datasets.base.ssv2_few_shot: 57: loaded 9154 videos from train dataset: HMDB_few_shot !
[06/30 15:32:52][INFO] datasets.base.ssv2_few_shot: 171: Dataset HMDB_few_shot split train loaded. Length 9154.
[06/30 15:32:52][INFO] datasets.base.ssv2_few_shot: 110: Reading video list from file: test_few_shot.txt
[06/30 15:32:52][INFO] datasets.base.ssv2_few_shot: 145: Loading HMDB_few_shot dataset list for split 'test'...
[06/30 15:32:52][INFO] datasets.base.ssv2_few_shot: 57: loaded 2745 videos from test dataset: HMDB_few_shot !
[06/30 15:32:52][INFO] datasets.base.ssv2_few_shot: 171: Dataset HMDB_few_shot split test loaded. Length 2745.
[06/30 15:32:52][INFO] train_net_few_shot: 511: Mixup/cutmix disabled.
[06/30 15:32:52][INFO] train_net_few_shot: 523: Start epoch: 1
[06/30 15:32:52][INFO] train_net_few_shot: 55: Norm training: True
/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/functional.py:2973: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/functional.py:2973: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/functional.py:2973: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/functional.py:2973: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
File "runs/run.py", line 103, in <module>
main()
File "runs/run.py", line 97, in main
launch_task(cfg=run[0], init_method=run[0].get_args().init_method, func=run[1])
File "/home/andrew/MoLo-master/utils/launcher.py", line 36, in launch_task
func(cfg=cfg)
File "/home/andrew/MoLo-master/runs/train_net_few_shot.py", line 531, in train_few_shot
train_loader, model, model_ema, optimizer, train_meter, cur_epoch, mixup_fn, cfg, writer, val_meter, val_loader
File "/home/andrew/MoLo-master/runs/train_net_few_shot.py", line 104, in train_epoch
model_dict = model(task_dict)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/andrew/MoLo-master/models/base/models.py", line 44, in forward
x = self.head(x)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/andrew/MoLo-master/models/base/few_shot.py", line 2552, in forward
support_features, target_features, class_logits, support_features_motion, target_features_motion, feature_motion_recons = self.get_feats(support_images, target_images, support_labels)
File "/home/andrew/MoLo-master/models/base/few_shot.py", line 2464, in get_feats
support_features = self.pre_reduce(self.backbone(support_images)).squeeze() # [40, 2048, 7, 7] (5 way - 1 shot - 5 query)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torchvision/models/resnet.py", line 109, in forward
out = self.bn2(out)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 106, in forward
exponential_average_factor, self.eps)
File "/home/andrew/anaconda3/envs/molo/lib/python3.6/site-packages/torch/nn/functional.py", line 1923, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.94 GiB total capacity; 3.31 GiB already allocated; 21.50 MiB free; 3.38 GiB reserved in total by PyTorch)
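
The figures in the error message can be unpacked with a little arithmetic. The sketch below is only an illustration, not part of MoLo: `parse_oom`, `OOM_MESSAGE` and `PARAM_COUNT` are hypothetical names introduced here, the message text and the parameter count are copied from the log above, and the weight estimate assumes 4-byte (fp32) parameters, which the log does not confirm.

```python
import re

# Error text copied from the traceback above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.94 GiB total "
    "capacity; 3.31 GiB already allocated; 21.50 MiB free; 3.38 GiB reserved "
    "in total by PyTorch)"
)

# Parameter count copied from the "Params:" line of the log.
PARAM_COUNT = 89616923

UNIT_TO_MIB = {"KiB": 1.0 / 1024, "MiB": 1.0, "GiB": 1024.0}
SIZE = r"([\d.]+) (KiB|MiB|GiB)"


def parse_oom(message):
    """Return the sizes named in a PyTorch OOM message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate " + SIZE,
        "total": SIZE + r" total capacity",
        "allocated": SIZE + r" already allocated",
        "free": SIZE + r" free",
        "reserved": SIZE + r" reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError("field not found in message: " + name)
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom(OOM_MESSAGE)

# Memory PyTorch holds in its caching allocator but has not handed to tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]

# Memory on the card that is neither reserved by PyTorch nor reported as free.
outside_pytorch = sizes["total"] - sizes["reserved"] - sizes["free"]

# Assumption: every parameter is stored as a 4-byte float.
weights_mib = PARAM_COUNT * 4 / 2 ** 20

for label, value in [
    ("requested", sizes["requested"]),
    ("free", sizes["free"]),
    ("reserved by PyTorch", sizes["reserved"]),
    ("cached but unused", cached_unused),
    ("outside PyTorch", outside_pytorch),
    ("fp32 weights (estimate)", weights_mib),
]:
    print("{:<26s}{:10.2f} MiB".format(label, value))
```

Read this way, roughly 552 MiB of the card is neither reserved by PyTorch nor free, which would be consistent with the desktop processes listed by nvidia-smi plus other per-process overhead. The weights alone would come to about 342 MiB under the fp32 assumption, so most of the 3.31 GiB allocated at the point of failure is presumably activations from the backbone forward pass rather than parameters.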

@xiiliao8465

Did you solve it?


2 participants