torch.OutOfMemoryError: CUDA out of memory with Runpod A100 #160

nanaj96 · 2025-01-23T17:30:15Z

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 42.00 MiB. GPU 0 has a total capacity of 44.52 GiB of which 2.00 MiB is free. Process 280468 has 5.57 GiB memory in use. Process 280474 has 5.35 GiB memory in use. Process 280470 has 5.53 GiB memory in use. Process 280467 has 5.57 GiB memory in use. Process 280473 has 5.75 GiB memory in use. Process 280472 has 5.49 GiB memory in use. Process 280469 has 5.65 GiB memory in use. Process 280471 has 5.57 GiB memory in use. Of the allocated memory 4.92 GiB is allocated by PyTorch, and 134.41 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
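For anyone reading the message above, it may help to tally the per-process figures it reports. The sketch below parses the error text and sums the memory held by each listed process. `summarize_oom` is a hypothetical helper written for this illustration, not part of PyTorch, and the regular expressions assume the message wording shown in this report.

```python
import re

# Error text copied from the report above, trimmed to the parts being parsed.
ERROR_TEXT = (
    "GPU 0 has a total capacity of 44.52 GiB of which 2.00 MiB is free. "
    "Process 280468 has 5.57 GiB memory in use. "
    "Process 280474 has 5.35 GiB memory in use. "
    "Process 280470 has 5.53 GiB memory in use. "
    "Process 280467 has 5.57 GiB memory in use. "
    "Process 280473 has 5.75 GiB memory in use. "
    "Process 280472 has 5.49 GiB memory in use. "
    "Process 280469 has 5.65 GiB memory in use. "
    "Process 280471 has 5.57 GiB memory in use."
)


def summarize_oom(message: str) -> dict:
    """Hypothetical helper: tally per-process GPU memory from an OOM message."""
    capacity = re.search(r"total capacity of ([\d.]+) GiB", message)
    if capacity is None:
        raise ValueError("no 'total capacity' figure found in message")
    # Map each reported process ID to the GiB it holds on this GPU.
    per_process = {
        int(pid): float(gib)
        for pid, gib in re.findall(
            r"Process (\d+) has ([\d.]+) GiB memory in use", message
        )
    }
    return {
        "total_gib": float(capacity.group(1)),
        "process_count": len(per_process),
        "used_gib": round(sum(per_process.values()), 2),
        "per_process": per_process,
    }


summary = summarize_oom(ERROR_TEXT)
print(
    f"{summary['process_count']} processes hold {summary['used_gib']} GiB "
    f"of {summary['total_gib']} GiB on GPU 0"
)
```

By this tally, eight separate processes together hold nearly all of GPU 0's capacity, while only 134.41 MiB is reserved but unallocated. That suggests, though nothing in this thread confirms it, that the pressure comes from several processes sharing one device rather than from fragmentation, so `expandable_segments:True` on its own may not be enough.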

FurkanGozukara · 2025-01-25T00:03:56Z

Diffusers with optimization works.

I have a tutorial: https://youtu.be/GjENQfHF4W8

nanaj96 · 2025-01-25T19:17:15Z

> Diffusers with optimization works
>
> I have tutorial : https://youtu.be/GjENQfHF4W8

Is this a multi-GPU training config?
Sana_1600M_img1024.yaml

FurkanGozukara · 2025-01-26T16:41:19Z

@nanaj96 If you are trying to train, that is different. I replied about inference.

nanaj96 · 2025-01-26T16:49:17Z

> @nanaj96 if you are trying training that is different i replied for inference

Yeah, I'm trying to train.

nanaj96 · 2025-01-26T16:50:27Z

@FurkanGozukara Do you know anything about this?

FurkanGozukara · 2025-01-26T21:29:10Z

> @FurkanGozukara do you know about this?

I haven't trained yet, so I don't know, sorry.
