
Finetuning in 2:4 sparsity w4a16 example fails with multiple GPUs #911

Open
arunpatala opened this issue Nov 13, 2024 · 4 comments

arunpatala commented Nov 13, 2024

Thanks for this nice repo.

Describe the bug
Finetuning in 2:4 sparsity w4a16 example fails with multiple GPUs

Expected behavior
The finetuning step is expected to run successfully on multiple GPUs.

Environment
Four NVIDIA A10 GPUs on an AWS notebook instance.

To Reproduce
cd examples/quantization_2of4_sparse_w4a16
python llama7b_sparse_w4a16.py

Errors
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!

Additional context

I have seen the example config at examples/finetuning/example_fsdp_config.yaml, but I am not sure how to use it for finetuning in the above example.
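
For what it's worth, here is a minimal sketch of how such an FSDP config is typically passed to a finetuning script via Hugging Face Accelerate; whether llama7b_sparse_w4a16.py supports being launched this way is an assumption, not something confirmed in this thread:

```bash
# Sketch only: assumes the example script can be driven by `accelerate launch`
# and that the repo is checked out locally; adjust paths as needed.
accelerate launch \
  --config_file examples/finetuning/example_fsdp_config.yaml \
  examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py
```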

Also, I would like to know whether there is any paper that discusses this approach.
Is the finetuning done on the whole model, and in what precision?
Will further finetuning of quantized models with LoRA adapters be supported?

Thanks
Arun

arunpatala added the bug label Nov 13, 2024
dsikka commented Nov 13, 2024

Hi @arunpatala, this is a known bug that we're tracking.

arunpatala commented:
thanks

dsikka self-assigned this Nov 25, 2024
dsikka commented Nov 26, 2024

Hi @arunpatala, do you mind trying transformers off of main to check if the error persists?
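
For reference, a minimal sketch of one way to install transformers from the main branch (assuming a pip-based environment; the exact revision to test is not specified in this thread):

```bash
# Install transformers directly from the main branch on GitHub
pip install --upgrade git+https://github.com/huggingface/transformers.git
```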

arunpatala commented:
I don't have access to a multi-GPU machine right now; I will test it the next time I have access. Thanks.
