[Bugfix] Update expected shape for per token strategy #210

Merged
merged 6 commits into main from kylesayrs/fix-per-token-initialization on Dec 19, 2024

Conversation

@kylesayrs (Contributor) commented Nov 19, 2024

Background

  • While implementing Accelerate Utilities #193, a bug was discovered where per-token scales were being initialized with an incorrect shape

Changes

  • Change the shape of initialized quantization parameters in the per-token case (a minimal sketch follows this description)
  • Unrelated: compare tuples rather than lists in tests/test_quantization/test_configs/test_strategies.py

Testing

  • Added new tests in tests/test_quantization/lifecycle/test_initialize.py
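
For concreteness, here is a minimal sketch of the shape change, assuming a hypothetical init_scale helper; this is an illustration only, not the library's actual initialization code:

```python
# Minimal sketch of the per-token shape fix (illustrative helper, not library code).
import torch


def init_scale(strategy: str) -> torch.nn.Parameter:
    """Return a placeholder scale parameter for the given strategy (hypothetical)."""
    if strategy == "token":
        # Per-token scales are later computed with shape (num_tokens, 1), so the
        # placeholder should have the same rank: (1, 1) rather than (1,).
        return torch.nn.Parameter(torch.empty(1, 1))
    # e.g. per-tensor: a single scale with shape (1,)
    return torch.nn.Parameter(torch.empty(1))


assert init_scale("token").shape == (1, 1)   # after this PR
assert init_scale("tensor").shape == (1,)    # unchanged
```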

@kylesayrs mentioned this pull request Nov 19, 2024
@kylesayrs self-assigned this Nov 19, 2024
@dsikka (Collaborator) left a comment

Where are you seeing the failures?

The only relevant case is dynamic per-token, which shouldn't be initializing anything since the scales are determined on the fly, so I don't think this change is required.

@kylesayrs (Contributor Author)

@dsikka There is a per-token non-dynamic test case in the tests.

I discovered this bug while implementing #193, which uses copy_ rather than out-of-place assignment when updating quantization parameters. Because copy_ requires the original shape (previously (1, )) to match the shape of the newly computed value (which is (1, 1)), an error was thrown.

In general, the initialized shape should be the same as the shape computed by q_params(). While I agree this bug only appears in special cases and that this change isn't necessary outside of the context of #193, I believe that it is correct.
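
For illustration, a minimal reproduction of the mismatch described above; the parameter name and update flow are hypothetical stand-ins, not the actual #193 code:

```python
import torch

scale = torch.nn.Parameter(torch.empty(1))  # old initialized shape: (1,)
new_scale = torch.empty(1, 1)               # newly computed shape: (1, 1)

# Out-of-place assignment replaces the tensor entirely, so the mismatch goes unnoticed:
#   module.scale = torch.nn.Parameter(new_scale)

# copy_ requires the source to be broadcastable to the destination's shape, and a
# (1, 1) tensor cannot be broadcast down to (1,), so this raises:
try:
    scale.data.copy_(new_scale)
except RuntimeError as err:
    print(err)  # e.g. "output with shape [1] doesn't match the broadcast shape [1, 1]"
```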

@dsikka (Collaborator) commented Nov 21, 2024

Yeah, I agree that the shape is correct.
What I'm wondering now is when the shape is ever just (1, ). I guess in the static per-tensor case, but we actually update this in vLLM to be (1, 1) anyway (I have to confirm whether this is still the case), so we could potentially make that the default.

rahul-tuli previously approved these changes Nov 21, 2024
@kylesayrs (Contributor Author)

@dsikka From my reading, it looks like it's just (1, ): https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py#L121-L124

This is my understanding of the different expected shapes:

| Strategy | Scale Shape | ZP Shape |
|----------|-------------|----------|
| Tensor   | (1, ) | (1, ) |
| Channel  | (out_dims, 1) | (out_dims, 1) |
| Group    | (in_dims, out_dims // group_size) | (in_dims, out_dims // group_size) |
| Block    | (1, ) | (1, ) |
| Token    | (1, 1) | (1, 1) |

I haven't explored block and token quantization much, but the (1, ) shape for block quantization seems suspicious to me. @rahul-tuli, maybe you're familiar with it?
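
For context, a rough sketch of where the per-token shape comes from at runtime; this is my own illustration (with an int8 range of 127 assumed), not library code:

```python
import torch

x = torch.randn(16, 64)  # (num_tokens, hidden_size) activation, for example

# Per-tensor: one scale for the whole tensor -> shape (1,)
per_tensor_scale = x.abs().amax().reshape(1) / 127

# Per-token: one scale per token row -> shape (num_tokens, 1); the persistently
# initialized parameter is only a rank-2 placeholder of shape (1, 1).
per_token_scale = x.abs().amax(dim=-1, keepdim=True) / 127

print(per_tensor_scale.shape)  # torch.Size([1])
print(per_token_scale.shape)   # torch.Size([16, 1])
```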

@horheynm (Contributor)

This is helpful; maybe we should include it in the docs somewhere, since it's hard to navigate to here.

),
],
)
def test_initialize_quantization_parameters(weights, input_activations):
Contributor

General question: don't we need this for output activations as well?

Contributor Author

We could test output activations, but I decided not to for the sake of test simplicity.

horheynm previously approved these changes Nov 21, 2024
@rahul-tuli mentioned this pull request Nov 27, 2024
@dsikka (Collaborator) left a comment

I would also make sure we're mocking correctly, since the test case you pointed out applies the config to initialize scales/zp (which would be impacted by your change) but the mock doesn't seem to care about the shape:
https://github.com/neuralmagic/compressed-tensors/blob/main/tests/conftest.py
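
For what it's worth, the kind of shape assertion the test (or a stricter mock) could make might look like the sketch below; the attribute names are assumptions, not necessarily the actual compressed-tensors parameter names:

```python
import torch


def assert_token_param_shapes(module: torch.nn.Module) -> None:
    # After initializing a non-dynamic per-token scheme, the registered scale and
    # zero point should have shape (1, 1) rather than (1,). Attribute names assumed.
    assert module.input_scale.shape == (1, 1)
    assert module.input_zero_point.shape == (1, 1)
```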

@kylesayrs dismissed stale reviews from horheynm and rahul-tuli via 5b43fd4 on December 19, 2024 at 16:32
@dsikka merged commit 975cb22 into main on Dec 19, 2024
1 check passed
@dsikka deleted the kylesayrs/fix-per-token-initialization branch on December 19, 2024 at 16:49