hqq JIT Quantization #147
@tgaddair could you build a Docker image or test it?
@flozi00
It's pretty slow. An alternative for 2-bit quantization would be Quip#, but it's not data-free.
@tgaddair what do you think?
I'm fine with closing this for now if the latency is prohibitive. I'm not familiar with Quip, but I'm open to adding it if it's useful. I'm curious whether 3-bit GPT-Q would be worth exploring if we're looking for something below 4-bit.
@flozi00 Maybe you want to avoid quantizing the LM head. I'm not sure whether that's the performance bottleneck. Also, the quantization is probably going to fail.
Did some more testing.
@flozi00 I noticed that some of hqq's dependencies, like functorch, try to install torch<2.1, which overrides our current version of torch. Are you seeing this in your environment as well?
That comes from the hqq setup.py. I'll take a closer look at it; in my environment I haven't seen this.
Okay, I checked the Docker build logs:
https://github.com/predibase/lorax/actions/runs/7559682310/job/20584021297#step:10:1021 It seems to be coming from the peft 0.4.0 requirement.
Interesting. Let me dig into the Docker image and see if there are any package differences. If not, it should be safe to merge.
This should be fixed now by pinning torch to the latest version, 2.1.2.
LGTM! Thanks for addressing the PyTorch version issue.
What does this PR do?
As always, not tested yet.
Blind coding at the moment.