Describe the bug

The warning:

2024-10-11 00:04:31,529 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 16.47 GiB -- Worker memory limit: 23.34 GiB

Even though it is just a warning, execution freezes after it appears. I am running the tinystories tutorial with 8 CPU workers, and the freeze happens after the clean_and_unify step. After the freeze, top still shows 8 active processes.
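The Dask page linked in the warning notes that on Linux, glibc may hold on to freed memory instead of returning it to the OS, and it suggests running malloc_trim on each worker as a workaround. A minimal sketch of that approach (Linux/glibc only; the client here is a plain dask.distributed client, assumed to be connected to the same cluster the tutorial uses):

```python
import ctypes

from dask.distributed import Client

def trim_memory() -> int:
    """Ask glibc to return freed arena memory to the OS (Linux/glibc only)."""
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)

# Connect to the running cluster (in practice, reuse the tutorial's client).
client = Client()

# Run the trim on every worker; returns a {worker_address: result} dict.
client.run(trim_memory)
```

This does not fix a genuine leak, but it can reduce the "unmanaged memory" reported by the worker when the allocator is simply holding freed pages.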
Steps/Code to reproduce bug
I am trying the tinystories tutorial on the c4 realnewslike dataset (obtained from https://huggingface.co/datasets/allenai/c4). The dataset is 37 GB and contains 513 files, each with 26,953 entries. I have no issues running this tutorial on the smaller 2 GB version of the dataset, so I suspect the warning is related to handling large datasets.
Expected behavior
Expected it to finish the execution and write the processed data.
Environment overview (please complete the following information)
OS version -- Ubuntu 22.04.5 LTS (GNU/Linux 6.8.0-1015-aws x86_64)
Python version -- 3.10.15
pip version -- 24.2
dask version -- 2024.7.1
dask_cuda version -- 24.08.02
This might be related to another issue we recently investigated where the memory usage went extremely high with 8 workers, but not with 4 workers. Ryan suspected some change on the RAPIDS side may have contributed to it.
Thanks. Given that the OOMs/hangs discussed here involve CPU modules, it seems unlikely that a RAPIDS change affected the results. In either case, @pappagari, if you could try the same with fewer workers, it would be interesting to see whether that works for your use case.
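For reference, a minimal sketch of starting a local CPU cluster with 4 workers and an explicit per-worker memory limit, using plain dask.distributed (the tutorial's own client-setup flags may differ; the values shown are illustrative):

```python
from dask.distributed import Client, LocalCluster

# 4 workers instead of 8; memory_limit is enforced per worker.
cluster = LocalCluster(
    n_workers=4,
    threads_per_worker=1,
    memory_limit="23GiB",
)
client = Client(cluster)

# ... run the tutorial's pipeline against this client ...

client.close()
cluster.close()
```

With fewer workers, each worker gets a larger share of host memory, which can avoid the unmanaged-memory pause even though total throughput is lower.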