
@bfineran (Contributor) commented Apr 17, 2024

This PR adds support for dynamic quantization. In dynamic quantization, the scales and zero points for a particular tensor are calculated on the fly at inference time. This costs some extra compute but gives better results, since the quantization parameters fit the tensor directly rather than needing to be calibrated beforehand.
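To make the static/dynamic trade-off concrete, here is a minimal pure-Python sketch of per-tensor asymmetric min/max quantization, assuming 8-bit unsigned integer codes. The function names (`compute_qparams`, `fake_quantize`, `dynamic_fake_quantize`) are hypothetical and not part of this PR. The point is only that the dynamic path derives the scale and zero point from the tensor being quantized at call time, while the static path reuses parameters calibrated earlier.

```python
from typing import List, Tuple


def compute_qparams(values: List[float], num_bits: int = 8) -> Tuple[float, int]:
    """Asymmetric min/max scale and zero point derived from `values` (sketch)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # include 0.0 in the range so that zero is exactly representable
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # fall back to 1.0 for an all-zero tensor to avoid dividing by zero
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def fake_quantize(values: List[float], scale: float, zero_point: int,
                  num_bits: int = 8) -> List[float]:
    """Quantize to integer codes, clamp to the valid range, then dequantize."""
    qmin, qmax = 0, 2 ** num_bits - 1
    out = []
    for v in values:
        q = max(qmin, min(qmax, round(v / scale) + zero_point))
        out.append((q - zero_point) * scale)
    return out


def dynamic_fake_quantize(values: List[float], num_bits: int = 8) -> List[float]:
    # dynamic: qparams are recomputed from this exact tensor on every call
    scale, zero_point = compute_qparams(values, num_bits)
    return fake_quantize(values, scale, zero_point, num_bits)


# static: qparams calibrated ahead of time on data in [-1, 1]
calibration = [-1.0, 0.5, 1.0]
static_scale, static_zp = compute_qparams(calibration)

# an inference-time tensor with a wider range than the calibration data
x = [-4.0, 0.0, 3.0, 4.0]
static_out = fake_quantize(x, static_scale, static_zp)
dynamic_out = dynamic_fake_quantize(x)
```

Here the statically calibrated parameters clip the values at ±4 to roughly ±1, while the dynamic path covers the tensor's full range, at the cost of a min/max pass on every call.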

test_plan:
unit test included

@bfineran requested review from @Satrat and @horheynm April 17, 2024 13:16
@bfineran self-assigned this Apr 17, 2024

@Satrat left a comment


LGTM pending unit tests

if "." not in submodule_name and submodule_name.endswith("_observer"):
    # delete any observers that belong directly to this module
    if getattr(submodule, "DYNAMIC", False):

nit: is there a reason we can't use submodule.DYNAMIC here instead of hardcoding?
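For context on the nit: if every observer inherits a class-level `DYNAMIC` default from a shared base class, direct attribute access is always safe and the `getattr` fallback becomes unnecessary. Below is a minimal sketch under that assumption. It also assumes (not confirmed by the snippet above, whose body is cut off) that dynamic observers are kept during cleanup because they must recompute parameters at inference time. All class and function names are hypothetical, and the `children` dict is a stand-in for `torch.nn.Module` submodules, not the compressed-tensors API.

```python
class Observer:
    """Hypothetical observer base class; the class-level default means every
    subclass has a DYNAMIC attribute, so direct access never raises."""

    DYNAMIC = False


class MinMaxObserver(Observer):
    """Static observer: assumed to be needed only during calibration."""


class DynamicMinMaxObserver(Observer):
    """Dynamic observer: assumed to recompute qparams on every forward pass."""

    DYNAMIC = True


class Module:
    """Tiny stand-in for torch.nn.Module holding named direct children."""

    def __init__(self, **children):
        self.children = dict(children)


def delete_static_observers(module: Module) -> None:
    # iterate over a copy of the names because entries are deleted in the loop
    for name in list(module.children):
        child = module.children[name]
        if "." in name or not name.endswith("_observer"):
            continue  # only observers attached directly to this module
        if isinstance(child, Observer) and child.DYNAMIC:
            continue  # keep dynamic observers for inference (assumption)
        del module.children[name]
```

With the default defined once on the base class, `child.DYNAMIC` reads the same as `getattr(child, "DYNAMIC", False)` for any observer subclass.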

@Satrat commented Apr 22, 2024

Actually, one more thought here: we should probably make sure "dynamic" or "static" makes its way into the quantization config on export, so it's clear what the format is when reloading into vLLM. Maybe we need to add an additional flag to the config for this?
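One way to carry this through export is a boolean flag on the per-tensor quantization args, serialized with the rest of the config so a loader can tell dynamic from static without guessing. This is a minimal sketch: `QuantArgsSketch`, `export_config`, `load_config`, and the `weights`/`input_activations` keys are hypothetical illustrations, not the actual compressed-tensors or vLLM config schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class QuantArgsSketch:
    """Hypothetical stand-in for a quantization-args object.

    num_bits: bit width of the quantized values.
    symmetric: whether the zero point is fixed at 0.
    dynamic: if True, scale and zero point are computed at inference time and
        no calibrated values are stored; if False, they are calibrated ahead
        of time and saved with the checkpoint.
    """

    num_bits: int = 8
    symmetric: bool = True
    dynamic: bool = False


def export_config(weights: QuantArgsSketch, input_activations: QuantArgsSketch) -> str:
    # the dynamic flag is serialized alongside the other args
    return json.dumps(
        {"weights": asdict(weights), "input_activations": asdict(input_activations)},
        sort_keys=True,
    )


def load_config(text: str) -> Dict[str, QuantArgsSketch]:
    # rebuild the args so the loader knows which tensors need runtime qparams
    return {key: QuantArgsSketch(**args) for key, args in json.loads(text).items()}
```

A typical use is static weights with dynamic activations: `export_config(QuantArgsSketch(), QuantArgsSketch(dynamic=True))`.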

@bfineran marked this pull request as ready for review April 25, 2024 14:24
@bfineran changed the title from "[WIP] Dyanmic Quantization" to "Dyanmic Quantization" Apr 25, 2024

@Satrat left a comment


LGTM. I just made a note to update the docstring for QuantizationArgs. Also, let's get some unit tests in before we merge this.

@bfineran merged commit d707c5b into main Apr 25, 2024
@bfineran deleted the dynamic-quant branch April 25, 2024 18:24
Etelis added a commit to Etelis/compressed-tensors that referenced this pull request Sep 11, 2025
* [WIP] Dyanmic Quantization

* update imports post rename

* update dynamic bool

* move dynamic control to Quant Args

* Apply suggestions from code review

* docstring and test