[Bugfix] Update expected shape for per token strategy #210
base: main
Conversation
Where are you seeing the failures?
The only relevant case is dynamic per-token, which shouldn't be initializing anything since the scales are determined on the fly, so I don't think this change is required.
@dsikka There is a per-token non-dynamic test case in the tests. I discovered this bug while implementing #193, which uses […]. In general, the initialized shape should be the same as the shape computed by […].
Yeah, I agree that the shape is correct.
@dsikka From my reading, it looks like it's just (1, ): https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py#L121-L124

This is my understanding of the different expected shapes: […]

I haven't explored block and token quantization much, but the (1, ) shape for block quantization seems suspicious to me. Maybe @rahul-tuli would be familiar?
This is helpful; maybe it should be included in the docs somewhere, since it's hard to navigate to here.
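To make the shape discussion above concrete, here is a minimal sketch (not part of this PR) that initializes a single Linear layer under a couple of strategies and prints the resulting scale shapes. The import path and the `weight_scale`/`input_scale` parameter names are assumptions about the compressed-tensors API and may differ between versions:

```python
# Minimal sketch: inspect the scale shapes registered at initialization time.
# Import path and parameter names are assumptions about compressed-tensors.
import torch
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    initialize_module_for_quantization,
)


def show_scale_shapes(strategy: str) -> None:
    layer = torch.nn.Linear(64, 32)
    scheme = QuantizationScheme(
        targets=["Linear"],
        weights=QuantizationArgs(num_bits=8, strategy="channel"),
        # Static (non-dynamic) activations so scales are actually registered;
        # dynamic per-token computes scales on the fly and skips initialization.
        input_activations=QuantizationArgs(num_bits=8, strategy=strategy, dynamic=False),
    )
    initialize_module_for_quantization(layer, scheme)
    print(
        strategy,
        "weight_scale:", tuple(layer.weight_scale.shape),
        "input_scale:", tuple(layer.input_scale.shape),
    )


for strategy in ("tensor", "token"):
    show_scale_shapes(strategy)
```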
        ),
    ],
)
def test_initialize_quantization_parameters(weights, input_activations):
General question: don't we need one for the output activation?
We could test the output activation, but I decided not to for test simplicity.
I would also make sure we're mocking correctly, since the test case you pointed out applies the config to initialize scales/zero points (which would be impacted by your change), but the mock doesn't seem to care about the shape:
https://github.com/neuralmagic/compressed-tensors/blob/main/tests/conftest.py
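One way to address this (hypothetical, not part of this PR) is to assert the initialized scale shape directly in the test, so a shape-agnostic mock cannot hide a regression. The (1,) and (1, 1) expectations below follow the per-tensor and per-token shapes discussed above and should be treated as assumptions about this version of the library:

```python
# Hypothetical test sketch: check initialized input_scale shapes explicitly.
import pytest
import torch
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    initialize_module_for_quantization,
)


@pytest.mark.parametrize(
    "strategy,expected_shape",
    [
        ("tensor", (1,)),   # assumed per-tensor scale shape
        ("token", (1, 1)),  # per-token shape discussed in this PR (assumption)
    ],
)
def test_input_scale_shape(strategy, expected_shape):
    layer = torch.nn.Linear(64, 32)
    scheme = QuantizationScheme(
        targets=["Linear"],
        weights=QuantizationArgs(num_bits=8, strategy="channel"),
        input_activations=QuantizationArgs(num_bits=8, strategy=strategy, dynamic=False),
    )
    initialize_module_for_quantization(layer, scheme)
    assert tuple(layer.input_scale.shape) == expected_shape
```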
Background
Changes
- tests/test_quantization/test_configs/test_strategies.py
Testing
- tests/test_quantization/lifecycle/test_initialize.py
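To exercise just the affected test module locally, something like the following should work (assuming a development install of compressed-tensors with pytest available):

```python
# Run only the initialization lifecycle tests touched by this change.
import pytest

raise SystemExit(pytest.main(["-q", "tests/test_quantization/lifecycle/test_initialize.py"]))
```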