[Bugfix] Update expected shape for per token strategy #210
Conversation
Where are you seeing the failures?
The only relevant case is dynamic per-token, which shouldn't be initializing anything since the scales are determined on the fly, so I don't think this change is required.
@dsikka There is a per-token non-dynamic test case in the tests. I discovered this bug while implementing #193 (which uses …). In general, the initialized shape should be the same as the shape computed by …
Yeah, I agree that the shape is correct.
@dsikka From my reading, it looks like it's just (1,):
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py#L121-L124
This is my understanding of the different expected shapes:
I haven't explored block and token quantization much, but the (1,) shape for block quantization seems suspicious to me. @rahul-tuli, maybe you're familiar?
This is helpful; maybe we should include it in the docs somewhere. It's hard to navigate to here.
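For anyone landing here later, below is a small illustrative sketch of the shapes being discussed. The strategy-to-shape mapping is my reading of this thread (and of the per-token fix in this PR), not confirmed library behavior, and the group size of 128 is just a placeholder:

```python
import torch

# Assumed expected scale shapes per strategy, for a weight of shape
# (out_features, in_features). Illustrative only -- based on my reading
# of the discussion above, not on the library's actual initialization code.
out_features, in_features = 256, 512
group_size = 128  # placeholder group size

expected_scale_shapes = {
    "tensor": (1,),                                       # one scale for the whole tensor
    "channel": (out_features, 1),                         # one scale per output channel
    "group": (out_features, in_features // group_size),   # one scale per column group
    "token": (1, 1),                                      # placeholder; real per-token scales are computed at runtime
}

for strategy, shape in expected_scale_shapes.items():
    scale = torch.empty(shape)
    print(f"{strategy:>8}: scale shape {tuple(scale.shape)}")
```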
    ),
    ],
)
def test_initialize_quantization_parameters(weights, input_activations):
General question: for output activations, don't we need this as well?
We could test output activations, but I decided not to for test simplicity.
I would also make sure we're mocking correctly: the test case you pointed out applies the config to initialize scales/zp (which would be impacted by your change), but the mock doesn't seem to check the shape.
https://github.com/neuralmagic/compressed-tensors/blob/main/tests/conftest.py
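To make the shape concern concrete, here is a rough sketch of the kind of assertion the test (or the mocked layer from conftest.py) could make after initialization. The parameter names `input_scale`/`input_zero_point` and the (1, 1) per-token shape are assumptions on my part, not the fixture's actual API:

```python
import torch

def assert_initialized_shapes(layer: torch.nn.Module, expected_shape: tuple) -> None:
    """Hypothetical helper: verify that initialization created scale/zero-point
    parameters with the shape the chosen strategy calls for."""
    assert tuple(layer.input_scale.shape) == expected_shape
    assert tuple(layer.input_zero_point.shape) == expected_shape

# Stand-in for a layer after quantization-config initialization; a real test
# would use the mocked layer from tests/conftest.py instead.
layer = torch.nn.Linear(512, 256)
layer.input_scale = torch.nn.Parameter(torch.empty(1, 1))        # assumed per-token init shape
layer.input_zero_point = torch.nn.Parameter(torch.empty(1, 1))
assert_initialized_shapes(layer, (1, 1))
```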
Signed-off-by: Kyle Sayers <[email protected]>
Background
Changes
tests/test_quantization/test_configs/test_strategies.py
Testing
tests/test_quantization/lifecycle/test_initialize.py