[BUG] Memory allocation left resident in GPU(s) after model upload to HuggingFace #736
Comments
It seems that most of the memory is freed once the whole process has finished successfully (this may take a while, though). A small footprint still remains, which we should ideally remove too. We see the same when using models in the built-in chat tool. I think the cleanest solution would be to run this in a subprocess, as we do with model training. This ensures a clean environment even when the subprocess fails at some point. We might also consider merging LoRA back automatically at the end of each experiment.
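To illustrate the subprocess idea, here is a minimal sketch and not H2O LLM Studio's actual implementation. It assumes the upload step can be expressed as a standalone piece of Python; `run_isolated` is a hypothetical helper name. The point is that any GPU memory held by the child is released by the driver when the child process exits, whether it succeeds or fails.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(code: str, timeout: Optional[float] = None) -> int:
    """Run `code` in a fresh Python interpreter and return its exit code.

    Hypothetical helper: the child owns its own CUDA context, so whatever
    it allocated on the GPU is freed when the child exits, even on failure.
    """
    result = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    return result.returncode


# Placeholder for the real upload logic (assumption: it can run standalone).
exit_code = run_isolated("print('upload step would run here')")
if exit_code != 0:
    print(f"upload subprocess failed with exit code {exit_code}")
```

The parent only sees an exit code, so a crash in the upload step cannot leave allocations behind in the long-running parent process.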
🐛 Bug
When uploading a model to HuggingFace with the cpu_shard setting (and, I believe, with any of the available GPU options), allocations are left resident in GPU memory after the upload. This usually means I have to restart H2O LLM Studio before I can train another model, especially if I expect to be tight on memory.

To Reproduce
Upload any model to HuggingFace using the cpu_shard setting. After the upload finishes, check nvidia-smi. See below for the state after I uploaded a 22B-parameter model: