
Bug: AVReader memory leak #279

Open

rmira-sony opened this issue Sep 27, 2023 · 3 comments

@rmira-sony commented Sep 27, 2023

Hi,

Long-time user here, love the package, kudos to the contributors :)

I was trying out AVReader since I work with audiovisual data loaders. However, I've found that it leaks memory: using it rather than VideoReader results in steadily climbing system memory usage until the program eventually crashes. To be clear, this does not happen with VideoReader running the same code on the same system.

Here's the wandb plot on system memory usage (pink is AVReader, orange is VideoReader):

To be more precise, the code for the pink line is:

# inside the Dataset's __getitem__; self.files and self.cfg are dataset attributes
vr = decord.AVReader(self.files[idx]["video"], sample_rate=self.cfg.sampling_rate, ctx=cpu(0))
wav, video = vr.get_batch(range(len(vr)))  # wav: list of per-frame audio chunks
wav = torch.cat(wav, dim=-1)

and the code for the orange line is:

# same Dataset, but video via VideoReader and audio via torchaudio
vr = decord.VideoReader(self.files[idx]["video"], ctx=cpu(0))
video = vr.get_batch(range(len(vr)))
wav, old_sr = torchaudio.load(self.files[idx]["video"].replace(".mp4", ".wav"))
wav = torchaudio.functional.resample(wav, old_sr, self.cfg.sampling_rate)

It took me a while to diagnose this, so I hope it helps solve the issue. Unfortunately, I'm not familiar enough with the codebase to suggest a fix via pull request, so for now I'll stick with VideoReader. Thanks for reading!

PS: This happens with workers>0 and also with workers=0, so the root cause is probably not the usual multiprocessing conflicts in PyTorch dataloaders.
PPS: I'm using the latest versions of decord, torch, and torchaudio, with a standard torch Dataset class and DataLoader with 8 workers.
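In case it helps anyone reproduce this without a full training setup, here's a minimal, decord-free sketch of the kind of harness I mean: swap the stand-in `load_batch` for the AVReader/get_batch code above and watch whether traced memory keeps growing across iterations. (`tracemalloc` is stdlib; the stand-in here deliberately retains memory so the pattern is visible.)

```python
import gc
import tracemalloc

leaked = []  # simulates the suspected behaviour: data retained across calls

def load_batch():
    # stand-in for the AVReader/get_batch code above; replace the body
    # with the real decord calls when reproducing against a video file
    leaked.append(bytearray(1_000_000))

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

for _ in range(10):
    load_batch()
    gc.collect()  # rule out objects that are merely awaiting collection

current, _ = tracemalloc.get_traced_memory()
growth = current - baseline
# a genuine leak shows up as growth proportional to the iteration count;
# here roughly 10 x 1 MB is retained even after gc.collect()
print(growth > 9_000_000)
```

With the real AVReader code substituted in, a healthy reader should show roughly flat growth after the first iteration; the pink line above suggests it doesn't.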

@v-iashin

Cool, thanks for reporting.

Does the same thing happen if you use .VideoReader and .AudioReader separately?

@mwestbroek

Hi, I am running into memory leaks using VideoReader, specifically using get_batch(). I am trying to multithread, but the same happens single-threaded.

tracemalloc is pointing to decord/_ffi/ndarray.py, specifically asnumpy():

np_arr = np.empty(shape, dtype=dtype)
assert np_arr.flags['C_CONTIGUOUS']
data = np_arr.ctypes.data_as(ctypes.c_void_p) 

The amount of memory used increments by the size of np_arr on each call, so it seems the garbage collector is not releasing these arrays. I'm not sure why. I'd be happy to help but got stuck on this.
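For reference, the pattern I used to get that attribution looks roughly like this (a generic sketch with a stand-in allocator instead of decord, since the technique is the same): take a snapshot, group its statistics by line, and the top entry points at the dominant allocation site, the way asnumpy() showed up for me.

```python
import tracemalloc

def allocator():
    # stand-in for the np.empty(shape, dtype=dtype) call inside asnumpy();
    # any sizeable allocation demonstrates the attribution
    return bytearray(1_000_000)

tracemalloc.start()
held = [allocator() for _ in range(5)]  # references kept alive -> retained memory

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[0]  # largest allocation site first

# the top statistic reports the file:line of the bytearray allocation
# together with its total retained size (~5 MB here)
print(top)
```

Running this against a dataloader loop instead of the toy allocator is what pointed me at decord/_ffi/ndarray.py.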

@se7enXF

se7enXF commented Oct 24, 2024

> Hi, I am running into memory leaks using VideoReader, specifically using get_batch(). I am trying to multithread but the same happens singlethreaded.
>
> tracemalloc is pointing to decord/_ffi/ndarray.py, specifically asnumpy():
>
> np_arr = np.empty(shape, dtype=dtype)
> assert np_arr.flags['C_CONTIGUOUS']
> data = np_arr.ctypes.data_as(ctypes.c_void_p)
>
> The amount of memory used increments by the size of np_arr, so it seems as though the garbage collector is not doing what it needs to? I'm not sure. I'd be happy to help but got stuck on this.

I'm hitting the same issue. See #323.
