
Data accumulation while computing in GPU #1094

Closed
ravikakaiya opened this issue Jun 16, 2022 · 1 comment
Labels
bug / fix · help wanted · v0.8.x

Comments

@ravikakaiya

🐛 Bug

To Reproduce

Create a DataLoader.
Pass images batch-wise to the GPU.
Compute SSIM on the GPU.
After each batch the GPU memory allocation keeps increasing, eventually resulting in a CUDA out-of-memory error (see the sketch after this list).
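A minimal sketch of the setup described above (the random-tensor dataset, image shapes, and batch size are assumptions for illustration, not taken from the issue):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchmetrics import StructuralSimilarityIndexMeasure

# Hypothetical stand-in data: pairs of predicted and target images
dataset = TensorDataset(torch.rand(256, 3, 128, 128), torch.rand(256, 3, 128, 128))
loader = DataLoader(dataset, batch_size=8)

ssim = StructuralSimilarityIndexMeasure().to("cuda")

for preds, target in loader:
    preds, target = preds.to("cuda"), target.to("cuda")
    ssim(preds, target)  # each update stores preds/target on the GPU
    print(f"{torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
```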

Expected behavior

GPU memory usage should stay roughly constant across batches. Instead, the run eventually fails with:
RuntimeError: CUDA out of memory. Tried to allocate 210.00 MiB (GPU 0; 15.78 GiB total capacity; 14.39 GiB already allocated; 138.50 MiB free; 14.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Environment

TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): 0.8.2
Python & PyTorch version: Python 3.9.12, PyTorch 1.11.0
Any other relevant information such as OS (e.g., Linux): Linux

ravikakaiya added the bug / fix and help wanted labels on Jun 16, 2022
@SkafteNicki (Member)

Hi @ravikakaiya,
This is warned about when using the metric: https://github.com/Lightning-AI/metrics/blob/203ab6b13cad0219b484f3e47c34b6e7c8831af1/src/torchmetrics/image/ssim.py#L86-L90
We need to store preds and target for computing over all batches.

If you only want to compute the value on the current batch, you could use the functional implementation
https://torchmetrics.readthedocs.io/en/stable/image/structural_similarity.html#functional-interface
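
For example, the functional version returns the score for one batch and keeps no state between calls (a sketch; the loop and variable names follow the example above):

```python
from torchmetrics.functional import structural_similarity_index_measure

for preds, target in loader:
    preds, target = preds.to("cuda"), target.to("cuda")
    batch_ssim = structural_similarity_index_measure(preds, target)
    # batch_ssim is an ordinary tensor; nothing is accumulated across batches
```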

Alternatively, if you are using the modular implementation, you can call metric.reset() after calling metric.compute() to reset the internal accumulation buffer.
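
A sketch of that pattern with the modular metric (reusing the names from the reproduction sketch above):

```python
ssim = StructuralSimilarityIndexMeasure().to("cuda")

for preds, target in loader:
    preds, target = preds.to("cuda"), target.to("cuda")
    ssim.update(preds, target)

score = ssim.compute()  # aggregated value over all batches seen so far
ssim.reset()            # clears the stored preds/target and frees the GPU memory
```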

Closing issue.
