Fix too sensitive "Unsloth currently does not support multi GPU setups" when training with a single GPU in a multi-GPU environment. #1295
Will re-investigate this - apologies on the delay!

Btw just thinking out loud (or thinking as written text)

@Datta0, yeah, I definitely agree. However, I am not very familiar with patching functions this way. Wouldn't the function have to be part of all the patched code, meaning that we would have to rewrite it every time?

I tried deleting the check code in
However, I don’t know which line in

Hi guys, I have been a bit busy. I can submit a version with all the fixes on either Thursday or Friday; I have a hectic schedule until then.

Hi @Peter-Fy, did you try installing unsloth from this PR branch? Do you still get the error?

Yes, I installed unsloth from this PR branch, but I still get an error like:
Traceback (most recent call last):
  File "/home/fdf/qlora_finetune.py", line 133, in <module>
    main()
  File "/home/fdf/qlora_finetune.py", line 125, in main
    trainer.train()
  File "<string>", line 39, in train
RuntimeError: tokenizer_utils.py:971 Unsloth currently does not support multi GPU setups - but we are working on it!
So I deleted the check code in
Traceback (most recent call last):
  File "/home/fdf/qlora_finetune.py", line 133, in <module>
    main()
  File "/home/fdf/qlora_finetune.py", line 125, in main
    trainer.train()
  File "<string>", line 40, in train
RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!
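
A note on the `File "<string>", line 40, in train` frame in the traceback above: Python reports the filename `<string>` for code that was compiled from a source string (for example via `exec`) rather than loaded from a file on disk. That would be consistent with the second check living in dynamically generated code, so deleting it from one file may not remove it. A minimal sketch of the behaviour, where `PATCHED_SOURCE` and `frame_filenames` are hypothetical names for illustration and not Unsloth's actual patching code:

```python
import traceback

# Hypothetical stand-in for source text that a library generates at runtime.
PATCHED_SOURCE = '''
def train():
    raise RuntimeError("multi GPU check fired")
'''

namespace = {}
# exec() on a str compiles it with the filename "<string>".
exec(PATCHED_SOURCE, namespace)


def frame_filenames():
    """Call the exec'd train() and return the filename of each traceback frame."""
    try:
        namespace["train"]()
    except RuntimeError as exc:
        return [frame.filename for frame in traceback.extract_tb(exc.__traceback__)]
    return []


if __name__ == "__main__":
    # The last frame (inside train) is reported as coming from "<string>".
    print(frame_filenames()[-1])
```

If that is what is happening here, the check would have to be changed in the source string that gets compiled, not only in the module file.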

That will be helpful, looking forward to your fixes.

Hi there,
this PR has the changes requested in #974. I unfortunately don't have a system where I can test this myself, but I have been testing it with other people on a cluster that has multiple GPUs.
The only problem is that the fix at llama.py:1694 does not seem to work, as we are still getting the error. To make it run, we removed this check entirely. Any ideas on how to fix that? Is it problematic to remove the check there?
@hife-ai @Datta0 @Sehyo
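
For discussion, here is a rough sketch of the kind of check this PR is aiming for: count only the GPUs the process can actually see, so that a single-GPU run on a multi-GPU machine passes. It is not the code in this PR or in Unsloth. The helpers `visible_gpu_count` and `check_single_gpu` are hypothetical, and the parsing of `CUDA_VISIBLE_DEVICES` is simplified (unset means all devices, an empty string means none, and an out-of-range index hides itself and everything after it).

```python
import os


def visible_gpu_count(physical_count, env=None):
    """Hypothetical helper: estimate how many GPUs this process can see."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return physical_count  # variable unset: every device is visible

    count = 0
    for token in raw.split(","):
        token = token.strip()
        if not token:
            break  # empty entry: nothing further is visible
        try:
            index = int(token)
        except ValueError:
            index = None  # e.g. a GPU UUID; assumed valid in this sketch
        if index is not None and not 0 <= index < physical_count:
            break  # invalid index: this entry and the rest are ignored
        count += 1
    return count


def check_single_gpu(physical_count, env=None):
    """Raise only when more than one GPU is visible to this process."""
    visible = visible_gpu_count(physical_count, env)
    if visible > 1:
        raise RuntimeError(
            f"{visible} GPUs are visible; restrict CUDA_VISIBLE_DEVICES to one device."
        )
    return visible


if __name__ == "__main__":
    # A 4-GPU machine with one device exposed passes the check.
    print(check_single_gpu(4, {"CUDA_VISIBLE_DEVICES": "0"}))
```

In practice `torch.cuda.device_count()` already reports only the visible devices, so a real check could likely rely on it instead of parsing the variable by hand.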