GPU memory does not get freed up properly after each batch #108

Open
felix0097 opened this issue Mar 10, 2023 · 5 comments
Labels: bug (Something isn't working), P1
Milestone: Merlin 23.07

@felix0097

Describe the issue:

The dataloader accumulates GPU memory across batches unless gc.collect() is called manually after each batch (or, e.g., after every 5th batch). See the example below: manually calling garbage collection saves around 7 GiB of peak GPU memory (11 GiB vs. 18 GiB). Is there a way to free GPU memory more reliably after each batch?

Minimal Complete Verifiable Example:

Create example data:

import pandas as pd
import numpy as np

n_samples = 20480

df = pd.DataFrame({
    'x': [np.random.uniform(size=(19357, )).astype('f4') for _ in range(n_samples)],
    'y': np.random.choice(range(100), size=n_samples).astype('i8')
})

df.to_parquet('test.parquet', row_group_size=1024, engine='pyarrow')

Check memory usage:

import merlin.io
from merlin.dataloader.torch import Loader
from merlin.schema import ColumnSchema, Schema

import gc
from pynvml import nvmlInit, nvmlDeviceGetMemoryInfo, nvmlDeviceGetHandleByIndex

nvmlInit()  # NVML must be initialized before querying device handles


dataset = merlin.io.Dataset(
    'test.parquet', 
    engine='parquet', 
    part_size='180MB',
    schema=Schema([
        ColumnSchema(
            'x', dtype='float32', 
            is_list=True, is_ragged=False, 
            properties={'value_count': {'max': 19357}}
        ),
        ColumnSchema('y', dtype='int64')
    ])
)
print(dataset.partition_lens[:10])  # --> [2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048]


def benchmark(dataset, batch_size=4096, n_samples=1_000_000, call_gc=False):
    handle = nvmlDeviceGetHandleByIndex(0)
    max_memory = nvmlDeviceGetMemoryInfo(handle).used

    num_iter = n_samples // batch_size
    loader = Loader(dataset, batch_size=batch_size, shuffle=True, drop_last=True).epochs(100)

    # Track the peak GPU memory usage observed while iterating over the loader,
    # optionally triggering garbage collection after every batch.
    for i, (batch, _) in enumerate(loader):
        x, y = batch['x'], batch['y']
        max_memory = max(max_memory, nvmlDeviceGetMemoryInfo(handle).used)
        if call_gc:
            gc.collect()
        if i == num_iter:
            break

    loader.stop()
    gc.collect()

    return max_memory

Without manually calling garbage collection

max_mem = benchmark(dataset, batch_size=4096, n_samples=300_000, call_gc=False)
print('Max GPU memory usage:', max_mem // 1024**2 , 'MiB') # --> Gives: Max GPU memory usage: 18435 MiB

With manually calling garbage collection

max_mem = benchmark(dataset, batch_size=4096, n_samples=300_000, call_gc=True)
print('Max GPU memory usage:', max_mem // 1024**2 , 'MiB')  # --> Gives: Max GPU memory usage: 11305 MiB

Environment:

OS: Rocky Linux 8.7
Python: 3.10.9
merlin-core: 0.10.0
merlin-dataloader: 0.0.4
cudf-cu11: 23.02
rmm-cu11: 23.02
dask-cudf: 23.02

I installed both cudf + merlin via pip:
python -m pip install cudf-cu11==23.02 rmm-cu11==23.02 dask-cudf-cu11==23.02 --extra-index-url https://pypi.nvidia.com/
python -m pip install merlin-dataloader

@rnyak added the bug (Something isn't working) and P0 labels on Jun 28, 2023
@rnyak added this to the Merlin 23.07 milestone on Jun 28, 2023
@rnyak added the P1 label and removed the P0 label on Jun 28, 2023
@felix0097
Author

Hi @rnyak,

are there any updates on this issue? Thank you!

@edknv
Contributor

edknv commented Oct 23, 2023

Could this be related to #76? It sounds like calling loader.stop(), or better yet using the context manager, could help release the memory properly.
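
For reference, a rough sketch of what that suggestion looks like in a loop (this assumes Loader can be used as a context manager, as the linked issue implies; treat it as a sketch rather than confirmed API):

from merlin.dataloader.torch import Loader

# Exiting the context is expected to take care of loader.stop(),
# so the loader's internal buffers are released when the block ends.
with Loader(dataset, batch_size=4096, shuffle=True, drop_last=True) as loader:
    for batch, _ in loader:
        x, y = batch['x'], batch['y']
        # ... training step ...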

@felix0097
Author

How does calling loader.stop() help with memory consumption during training, i.e. while the loader is being consumed? The problem isn't that memory fails to be released at the end of training, but that it accumulates during training.

@rnyak

rnyak commented Oct 24, 2023

@felix0097 thanks for reporting that. We are not looking into this issue right now due to other tasks. Does manually calling garbage collection help you in the meantime?

@felix0097
Author

Yes, that solves the issue for me, @rnyak.
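
For readers hitting the same problem, the workaround boils down to periodic garbage collection inside the training loop; a minimal sketch (the every-5-batches interval is an arbitrary choice):

import gc

GC_INTERVAL = 5  # collect every 5th batch; tune as needed

for i, (batch, _) in enumerate(loader):
    x, y = batch['x'], batch['y']
    # ... forward / backward / optimizer step ...
    if i % GC_INTERVAL == 0:
        gc.collect()  # release device buffers kept alive by reference cycles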
