I am running the code on a single machine with an A100 GPU (80 GB of memory), and I encountered the following error:
Traceback (most recent call last):
File "main_fft_pretrain.py", line 302, in
main(args)
File "main_fft_pretrain.py", line 270, in main
train_stats = train_one_epoch(
File "/data0/zhiyong/code/github/mae/engine_pretrain.py", line 48, in train_one_epoch
loss, _, _ = model(samples, mask_ratio=args.mask_ratio)
File "/home/zhiyongzhang/anaconda3/envs/mae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/data0/zhiyong/code/github/mae/models_fft_2.py", line 641, in forward
latent, mask, ids_restore = self.forward_encoder(imgs, mask_ratio)
File "/data0/zhiyong/code/github/mae/models_fft_2.py", line 545, in forward_encoder
x_combined = blk(x_combined)
File "/home/zhiyongzhang/anaconda3/envs/mae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhiyongzhang/anaconda3/envs/mae/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 165, in forward
x = x + self.drop_path1(self.ls1(self.attn(self.norm1(x))))
File "/home/zhiyongzhang/anaconda3/envs/mae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in call_impl
return forward_call(*input, **kwargs)
File "/home/zhiyongzhang/anaconda3/envs/mae/lib/python3.8/site-packages/timm/models/vision_transformer.py", line 99, in forward
attn = attn.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 8.90 GiB (GPU 0; 79.21 GiB total capacity; 60.10 GiB already allocated; 7.09 GiB free; 60.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
My args:
python main_fft_pretrain.py --world_size 2 --batch_size 4 --model mae_vit_fft_base_patch16 --norm_pix_loss --mask_ratio 0.75 --epochs 800 --warmup_epochs 40 --blr 1.5e-4 --weight_decay 0.05 --data_path /data0/zhiyong/data/imagenetResize
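For reference, the error message itself suggests tuning the caching allocator to reduce fragmentation via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of that (the max_split_size_mb value of 128 is an assumption, not a tested setting) is to prefix the same launch command with the allocator config:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python main_fft_pretrain.py --world_size 2 --batch_size 4 --model mae_vit_fft_base_patch16 --norm_pix_loss --mask_ratio 0.75 --epochs 800 --warmup_epochs 40 --blr 1.5e-4 --weight_decay 0.05 --data_path /data0/zhiyong/data/imagenetResize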