Dynamic Quantization #15
Conversation
LGTM pending unit tests
if "." not in submodule_name and submodule_name.endswith("_observer"): | ||
# delete any observers that belong directly to this module | ||
if getattr(submodule, "DYNAMIC", False): |
nit: is there a reason we can't use submodule.DYNAMIC here instead of hardcoding?
Actually, one more thought here: we should probably make sure "dynamic" or "static" makes its way into the quantization config on export so it's clear what the format is when reloading into vLLM. Maybe we need to add an additional flag to the config for this?
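To make the suggestion concrete, here is a hedged sketch of what an export-time flag could look like. The field names and config layout below are illustrative assumptions, not the project's confirmed serialization format:

```python
# Hypothetical exported quantization config. The "dynamic" flag is the
# addition under discussion; the surrounding structure is only a sketch
# of a compressed-tensors-style config group.
quantization_config = {
    "config_groups": {
        "group_0": {
            # weights are quantized statically with calibrated params
            "weights": {"num_bits": 8, "symmetric": True, "dynamic": False},
            # activation params are computed on the fly at inference
            "input_activations": {"num_bits": 8, "symmetric": True, "dynamic": True},
        }
    },
}
```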
LGTM, just made a note to update the docstring for QuantizationArgs. Also, let's get some unit tests in before we merge this.
This PR adds support for dynamic quantization. In dynamic quantization, scales and zero points are calculated on the fly at inference time for each tensor. The extra compute is a tradeoff for better results, since the quantization parameters fit the tensor directly rather than needing to be calibrated beforehand.
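As a rough illustration of the idea (not this PR's actual implementation; the function and variable names here are made up for the example):

```python
import torch

def dynamic_quantize_int8(x: torch.Tensor):
    """Illustrative dynamic asymmetric uint8 quantization.

    The scale and zero point are computed from the live tensor at call
    time. In static quantization, these would instead be fixed values
    produced by a calibration pass before inference.
    """
    # derive quantization params from the tensor itself, on the fly
    min_val, max_val = x.min(), x.max()
    scale = ((max_val - min_val) / 255.0).clamp(min=1e-8)
    zero_point = torch.round(-min_val / scale).clamp(0, 255).to(torch.uint8)
    # quantize with the freshly computed parameters
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, 255).to(torch.uint8)
    return q, scale, zero_point
```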
test_plan:
unit test included
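For reference, the kind of property such a unit test might check, reusing the hypothetical `dynamic_quantize_int8` sketch above (the test name and assertion are assumptions, not the PR's actual test):

```python
import torch

def test_dynamic_qparams_track_input():
    # tensors with different ranges should yield different scales
    # when parameters are computed dynamically per call
    a = torch.randn(16) * 1.0
    b = torch.randn(16) * 10.0
    _, scale_a, _ = dynamic_quantize_int8(a)
    _, scale_b, _ = dynamic_quantize_int8(b)
    assert scale_a != scale_b
```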