
Dyanmic Quantization #15

Merged: 6 commits merged into main from dynamic-quant on Apr 25, 2024

Conversation

@bfineran (Contributor) commented Apr 17, 2024

This PR adds support for dynamic quantization. In dynamic quantization, scales and zero points are calculated on the fly at inference time for a particular tensor. The extra compute is traded for better results, since the quantization params can fit the tensor directly rather than needing to be calibrated beforehand.
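As context for readers, here is a minimal sketch (an illustration, not this PR's implementation) of per-tensor dynamic quantization: the scale and zero point are derived from the tensor itself at inference time rather than from prior calibration. The helper name and the asymmetric scheme below are assumptions chosen for illustration.

import torch

def dynamic_quantize(x: torch.Tensor, num_bits: int = 8):
    # hypothetical helper: derive scale and zero point from the tensor on the fly
    qmin, qmax = 0, 2**num_bits - 1
    min_val, max_val = x.min(), x.max()
    scale = torch.clamp((max_val - min_val) / (qmax - qmin), min=1e-8)
    zero_point = torch.clamp(torch.round(qmin - min_val / scale), qmin, qmax)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point

# usage: quantize on the fly, then dequantize to approximate the original tensor
x = torch.randn(4, 16)
q, scale, zp = dynamic_quantize(x)
x_hat = (q - zp) * scale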

test_plan:
unit test included

@bfineran bfineran requested review from Satrat and horheynm April 17, 2024 13:16
@bfineran bfineran self-assigned this Apr 17, 2024

@Satrat Satrat left a comment

LGTM pending unit tests

if "." not in submodule_name and submodule_name.endswith("_observer"):
# delete any observers that belong directly to this module
if getattr(submodule, "DYNAMIC", False):

nit: is there a reason we can't use submodule.DYNAMIC here instead of hardcoding?
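(One likely reason, not stated in the thread: getattr with a default tolerates modules that never set the attribute, while direct attribute access raises.)

# submodule.DYNAMIC raises AttributeError when the attribute was never set;
# getattr falls back to the supplied default instead
is_dynamic = getattr(submodule, "DYNAMIC", False)  # False when unset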

@Satrat commented Apr 22, 2024

Actually, one more thought here: we should probably make sure "dynamic" or "static" makes its way into the quantization config on export, so it's clear what the format is when reloading into vLLM. Maybe we need to add an additional flag to the config for this?
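As a rough illustration (not this PR's actual schema; the field name and defaults are assumptions), such a flag on the quantization args could look like:

from pydantic import BaseModel

class QuantizationArgs(BaseModel):
    num_bits: int = 8
    symmetric: bool = True
    # hypothetical flag: when True, scales and zero points are computed at
    # inference time instead of being loaded from calibration
    dynamic: bool = False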

@bfineran bfineran marked this pull request as ready for review April 25, 2024 14:24
@bfineran bfineran changed the title from "[WIP] Dyanmic Quantization" to "Dyanmic Quantization" on Apr 25, 2024

@Satrat Satrat left a comment

LGTM, just made a note to update the docstring for QuantizationArgs. Also, let's get some unit tests in before we merge this.

@bfineran bfineran merged commit d707c5b into main Apr 25, 2024
2 checks passed
@bfineran bfineran deleted the dynamic-quant branch April 25, 2024 18:24