Alternative approach to support torch.compile #1006
Conversation
Force-pushed from d9a29b1 to 73dc0cf
This needs tests!
Force-pushed from b67534b to 4fd473a
Force-pushed from 4fd473a to d765704
```python
self.max_clamp = max_int(module.is_signed, module.is_narrow_range, self.bit_width)

def quantize(self, x):
    return torch.clamp(
```
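As context for the clamp bounds in the diff above, here is a minimal illustrative sketch; `max_int` and `min_int` are the Brevitas helpers referenced there, but the exact calling convention shown here is an assumption, not taken from this PR:

```python
import torch
from brevitas.function.ops import max_int, min_int  # helpers referenced in the diff above

# Assumed calling convention: (signed, narrow_range, bit_width); illustrative only.
bit_width = torch.tensor(8.)
max_clamp = max_int(True, False, bit_width)  # 127 for signed, non-narrow 8-bit
min_clamp = min_int(True, False, bit_width)  # -128

# quantize() then clamps the integer representation into this range.
x_int = torch.round(torch.randn(4) * 300)
x_clamped = torch.clamp(x_int, min_clamp, max_clamp)
```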
It looks like these won't work with Groupwise quantization, correct? So inference_mode + MX won't work?
You're right, I forgot to add the export handler for MX INT and MX Float
Postponed to another update
LGTM!
This works by assuming that most of the quantization process has already taken place, so QuantTensors no longer need to be propagated.
Typical usage:
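A minimal usage sketch follows. The context-manager name and import path are assumptions for illustration (check the release that ships this PR for the actual API); the point from the discussion is that the model is run once under an inference-mode context so quantization is folded into plain tensor ops, after which torch.compile can be applied.

```python
import torch
import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantReLU

# Assumed import path/name for the inference-mode context manager discussed in this PR.
from brevitas.graph.calibrate import quant_inference_mode

model = nn.Sequential(QuantConv2d(3, 16, kernel_size=3), QuantReLU()).eval()
inp = torch.randn(1, 3, 32, 32)

with torch.no_grad(), quant_inference_mode(model):
    # The first call inside the context caches the quantization parameters,
    # so subsequent calls return plain tensors instead of QuantTensors...
    model(inp)
    # ...which makes the forward pass traceable by torch.compile.
    compiled_model = torch.compile(model)
    out = compiled_model(inp)
```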