Does anyone else get CUDA out of memory during hyperparameter search? #311
Comments
Hello! Just intuitively, I wouldn't expect e1a5375 to have fixed this issue. I'm aware that others have experienced OOM issues with the hyperparameter search, but I don't think anyone has successfully debugged it so far. In other words, I suspect the issue still persists.
Hi @bogedy, can I ask how exactly you applied the suggestion in huggingface/transformers#13019? I'm running into the same issue.
Want to share which versions of SetFit, Optuna, and PyTorch and which base model you're using, so I can try to reproduce? I had to edit the SetFit source code; the change is in the second code block under "Updates to remedy the issue". Basically it's a hacky workaround. This error is common enough with Optuna that they have documentation on it, plus an argument to run garbage collection automatically: https://optuna.readthedocs.io/en/stable/faq.html#how-do-i-avoid-running-out-of-memory-oom-when-optimizing-studies
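The Optuna FAQ entry linked above essentially recommends forcing garbage collection between trials, either via the gc_after_trial flag or a callback. A minimal sketch of the callback variant (the objective below is a placeholder; substitute your own training and evaluation code):

```python
import gc

import optuna
import torch


def free_gpu_memory(study, trial):
    # Optuna runs callbacks after every trial; force Python garbage collection
    # and release PyTorch's cached CUDA blocks before the next trial starts.
    gc.collect()
    torch.cuda.empty_cache()


def objective(trial):
    # Placeholder objective -- replace with your real training/evaluation loop.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    return lr


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20, callbacks=[free_gpu_memory])
```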
Still an active issue with Optuna. I'm not using Hugging Face, but just running an Optuna hyperparameter optimization with big Keras models is impossible because the GPU memory allocator gives out before a trial even begins, unless the batch sizes are trivially small.
Hi! I had the same problem and got a working version by rewriting the hyperparameter_search function following this issue: huggingface/transformers#13019, just updating it to match the current state of the module.
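The exact code depends on your SetFit version, but the gist of the linked workaround is to build the model inside the Optuna objective and explicitly free it after every trial. A rough sketch, not the poster's actual patch, assuming the pre-1.0 SetFitTrainer API, a tiny toy dataset, and the default accuracy metric:

```python
import gc

import torch
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# Toy data only to keep the sketch self-contained; use your real splits.
train_ds = Dataset.from_dict({"text": ["great", "awful", "fine", "bad"], "label": [1, 0, 1, 0]})
eval_ds = Dataset.from_dict({"text": ["good", "terrible"], "label": [1, 0]})


def objective(trial):
    # Re-create the model each trial so the previous one can actually be freed.
    model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
    trainer = SetFitTrainer(
        model=model,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        batch_size=trial.suggest_categorical("batch_size", [8, 16, 32]),
        num_iterations=trial.suggest_int("num_iterations", 5, 20),
    )
    trainer.train()
    metrics = trainer.evaluate()

    # Drop references and release cached CUDA memory before the next trial.
    del trainer, model
    gc.collect()
    torch.cuda.empty_cache()

    return metrics["accuracy"]
```

This objective can then be passed straight to study.optimize, optionally combined with the gc_after_trial flag mentioned below.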
Even though it is probably better if one just runs Optuna with gc_after_trial=True.
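For completeness, gc_after_trial is a keyword argument of Study.optimize, so the Optuna-side mitigation is a one-liner (objective being whatever objective function you already use):

```python
study.optimize(objective, n_trials=20, gc_after_trial=True)
```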
I had this problem and I see that in the repo's hyperparameter notebook someone else had this problem too! https://github.com/huggingface/setfit/blob/main/notebooks/text-classification_hyperparameter-search.ipynb
I fixed it by following the advice in huggingface/transformers#13019.
I wanted to make a pull request, but when I tried to reproduce the issue later (after pulling new changes) I couldn't; memory use stayed constant over all the trials. Did e1a5375 fix this? I'm curious. I'd be happy to supply a PR if it's helpful, but maybe it's fixed already.