Fix too sensitive "Unsloth currently does not support multi GPU setups" when training with a single GPU in a multi-GPU environment. #1295

Open
wants to merge 3 commits into
base: main

giuliabaldini
Contributor

Hi there,

This PR has the changes requested in #974. Unfortunately, I don't have a system where I can test this myself, but I have been testing it with other people on a cluster that has multiple GPUs.

The only problem is that the fix at llama.py:1694 does not seem to work: we are still getting the error. To make it run, we have removed this check entirely. Any ideas on how to fix that? Is it problematic to remove the check there?

@hife-ai @Datta0 @Sehyo

@danielhanchen
Contributor

Will re-investigate this - apologies on the delay!

@Datta0
Contributor

Datta0 commented Nov 17, 2024

Btw, just thinking out loud (or rather, thinking in writing):
Should we consolidate all these multi-GPU errors into a single function? Right now I see there's check_nvidia and the other part of the code in from_pretrained.
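
A minimal sketch of what such a consolidated check could look like. The names `visible_gpu_count` and `ensure_single_gpu` are hypothetical and not part of Unsloth, and the handling of `CUDA_VISIBLE_DEVICES` is simplified: it only counts comma-separated entries, whereas a real check would ask the CUDA runtime how many devices it sees.

```python
import os


def visible_gpu_count(total_devices, env=None):
    """Estimate how many GPUs this process may use.

    Hypothetical helper: if CUDA_VISIBLE_DEVICES is set, count its
    comma-separated entries; otherwise assume all devices are visible.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return total_devices
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    return min(len(ids), total_devices)


def ensure_single_gpu(total_devices, where, env=None):
    """Single place that raises the multi-GPU error.

    `where` is a caller-supplied label (for example "file.py:123") so the
    message says which call site triggered the check.
    """
    count = visible_gpu_count(total_devices, env)
    if count > 1:
        raise RuntimeError(
            f"{where} Unsloth currently does not support multi GPU setups"
            " - but we are working on it!"
        )
    return count
```

With one function, a machine with several GPUs but only one exposed to the process passes the check, and every call site reports its own location in the error.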

@giuliabaldini
Contributor Author

@Datta0, yeah, I definitely agree. However, I am not very familiar with patching functions this way. Wouldn't the function have to be part of all the patched code, meaning that we would have to rewrite it every time?
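
On the general Python side of this question: source that is run through `exec` can call a helper that is passed in through the globals dictionary, so the helper does not need to be copied into every patched source string. The sketch below illustrates only that mechanism. All names in it (`check_single_gpu`, `PATCHED_TRAIN_SOURCE`, `build_patched`) are hypothetical, and it is not a description of how Unsloth actually applies its patches.

```python
def check_single_gpu(visible_gpus, where):
    """Hypothetical shared check, defined once in ordinary module code."""
    if visible_gpus > 1:
        raise RuntimeError(
            f"{where} Unsloth currently does not support multi GPU setups"
            " - but we are working on it!"
        )


# Stand-in for a patched function that exists only as source text.
PATCHED_TRAIN_SOURCE = '''
def train(visible_gpus):
    check_single_gpu(visible_gpus, "train")
    return "training started"
'''


def build_patched(source, name):
    """Execute `source` with the shared helper available as a global."""
    namespace = {"check_single_gpu": check_single_gpu}
    exec(source, namespace)
    return namespace[name]


train = build_patched(PATCHED_TRAIN_SOURCE, "train")
```

Here `train(1)` returns normally, while `train(2)` raises the error from the one shared helper.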

@Peter-Fy

I tried deleting the check code in tokenizer_utils.py and llama.py, but I’m still getting the following error:

Traceback (most recent call last):
  File "/home/fdf/dpo_finetune.py", line 116, in <module>
    main()
  File "/home/fdf/dpo_finetune.py", line 108, in main
    trainer.train()
  File "<string>", line 40, in train
RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!

However, I don’t know which line in unsloth raised this error, so I can’t tell which check to delete next.
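
The `File "<string>"` frame suggests that the failing `train` was compiled from a source string rather than loaded from a file, which is why the traceback names no file. One way to find the remaining checks is to search the installed package sources for the message text. A minimal sketch, assuming the message appears literally somewhere in the package's `.py` files; `find_message` is a hypothetical helper and the directory to search is supplied by the caller.

```python
from pathlib import Path


def find_message(root, needle):
    """Return (path, line number, stripped line) for each line under
    `root` whose text contains `needle`, searching *.py files recursively."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For example, calling `find_message` on the directory that contains the installed `unsloth` package with the needle `"does not support multi GPU"` would list every place that still carries the message.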

@Sehyo

Sehyo commented Nov 19, 2024

Hi guys, I have been a bit busy. I can submit a version with all the fixes on either Thursday or Friday; I have a hectic schedule until then.

@giuliabaldini
Contributor Author

Hi @Peter-Fy, did you try to install unsloth from this PR branch? Do you still get the error?

@Peter-Fy

> Hi @Peter-Fy, did you try to install unsloth from this PR branch? Do you still get the error?

Yes, I installed unsloth from this PR branch, but I still get an error like:

Traceback (most recent call last):
  File "/home/fdf/qlora_finetune.py", line 133, in <module>
    main()
  File "/home/fdf/qlora_finetune.py", line 125, in main
    trainer.train()
  File "<string>", line 39, in train
RuntimeError: tokenizer_utils.py:971 Unsloth currently does not support multi GPU setups - but we are working on it!

So I deleted the check at tokenizer_utils.py:971, but then I got another error:

Traceback (most recent call last):
  File "/home/fdf/qlora_finetune.py", line 133, in <module>
    main()
  File "/home/fdf/qlora_finetune.py", line 125, in main
    trainer.train()
  File "<string>", line 40, in train
RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!

@Peter-Fy

> Hi guys, I have been a bit busy. I can submit version with all the fixes on either thursday or friday, have a hectic schedule until then.

That would be helpful. Looking forward to your fixes.

5 participants