Dynamic Quantization #15
Conversation
LGTM pending unit tests
if "." not in submodule_name and submodule_name.endswith("_observer"): | ||
# delete any observers that belong directly to this module | ||
if getattr(submodule, "DYNAMIC", False): |
nit: is there a reason we can't use submodule.DYNAMIC here instead of hardcoding?
Actually, one more thought here: we should probably make sure "dynamic" or "static" makes its way into the quantization config on export so it's clear what the format is when reloading into vLLM. Maybe we need to add an additional flag to the config for this?
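To make the suggestion concrete, here is a hedged sketch of what an export-time flag could look like. The field names and config layout below are illustrative assumptions, not the project's confirmed serialization format:

```python
# Hypothetical exported quantization config. The "dynamic" flag is the
# addition under discussion; the surrounding structure is only a sketch
# of a compressed-tensors-style config group.
quantization_config = {
    "config_groups": {
        "group_0": {
            # weights are quantized statically with calibrated params
            "weights": {"num_bits": 8, "symmetric": True, "dynamic": False},
            # activation params are computed on the fly at inference
            "input_activations": {"num_bits": 8, "symmetric": True, "dynamic": True},
        }
    },
}
```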
LGTM, just made a note to update the docstring for QuantizationArgs. Also, let's get some unit tests in before we merge this.
This PR adds support for dynamic quantization. In dynamic quantization, scales and zero points are calculated on the fly at inference time for each tensor. The extra compute is a tradeoff for better results, since the quantization parameters fit the tensor directly rather than needing to be calibrated beforehand.
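As a rough illustration of the idea (not this PR's actual implementation; the function and variable names here are made up for the example):

```python
import torch

def dynamic_quantize_int8(x: torch.Tensor):
    """Illustrative dynamic asymmetric uint8 quantization.

    The scale and zero point are computed from the live tensor at call
    time. In static quantization, these would instead be fixed values
    produced by a calibration pass before inference.
    """
    # derive quantization params from the tensor itself, on the fly
    min_val, max_val = x.min(), x.max()
    scale = ((max_val - min_val) / 255.0).clamp(min=1e-8)
    zero_point = torch.round(-min_val / scale).clamp(0, 255).to(torch.uint8)
    # quantize with the freshly computed parameters
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, 255).to(torch.uint8)
    return q, scale, zero_point
```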
test_plan:
unit test included
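For reference, the kind of property such a unit test might check, reusing the hypothetical `dynamic_quantize_int8` sketch above (the test name and assertion are assumptions, not the PR's actual test):

```python
import torch

def test_dynamic_qparams_track_input():
    # tensors with different ranges should yield different scales
    # when parameters are computed dynamically per call
    a = torch.randn(16) * 1.0
    b = torch.randn(16) * 10.0
    _, scale_a, _ = dynamic_quantize_int8(a)
    _, scale_b, _ = dynamic_quantize_int8(b)
    assert scale_a != scale_b
```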