[Bugfix][Kernel] Fix incorrect output tokens when running ChatGLM-9b inferencing on MI250 #312

Closed

Conversation

@jundali2 commented Dec 9, 2024

[Issue Description]
We encountered random, incorrect output tokens when running ChatGLM-9b inference with the latest rocm/vllm release on MI250.

[Root Cause]
The vectorized rms_norm_kernel stores intermediate results in the same data type as the input tensors (e.g. FP16), which causes precision loss and leads to incorrect output tokens.
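To make the failure mode concrete, here is a minimal sketch of the problematic pattern (illustrative only, not vLLM's exact vectorized kernel; `sum_of_squares_lossy` is a hypothetical name). With `scalar_t = __half`, the running sum of squares is rounded back to FP16 on every addition, and the error compounds across the whole hidden dimension:

```cuda
#include <cuda_fp16.h>

// Problematic pattern: the per-thread accumulator has the same type as the
// inputs, so with scalar_t = __half each += rounds the running sum to FP16
// (~10-bit mantissa), losing precision as the reduction proceeds.
template <typename scalar_t>
__device__ scalar_t sum_of_squares_lossy(const scalar_t* row,
                                         const int hidden_size) {
  scalar_t acc = scalar_t(0.0f);  // FP16 accumulator: the source of the bug
  for (int i = threadIdx.x; i < hidden_size; i += blockDim.x) {
    acc += row[i] * row[i];       // rounds back to FP16 on every addition
  }
  return acc;  // per-thread partial sum, still to be block-reduced
}
```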

[Solution]
Use float to store intermediate results, which yields correct output tokens.
We compared the outputs of the different kernels against PyTorch's standard op and verified the correctness of the final output tokens; the test results are below:

| | Mismatched elements | Greatest absolute difference | Greatest relative difference | Output tokens | Speed (µs) |
|---|---|---|---|---|---|
| Non-Vectorized Kernel | 1040 / 16777216 (0.0%) | 0.000488281 | 0.001631737 | Correct | 28.9972 |
| Original Vectorized Kernel | 16283524 / 16777216 (97.1%) | 0.001953125 | 0.004055023 | Incorrect | 28.8724 |
| Vectorized Kernel w/ Fix | 3840603 / 16777216 (22.9%) | 0.000488281 | 0.000976563 | Correct | 28.7410 |

After the fix, the number of mismatched elements drops significantly with no impact on performance, the absolute and relative differences are as good as or better than the non-vectorized kernel's, and the final output tokens are correct.
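For reference, here is a minimal sketch of the fixed approach, assuming a standard one-block-per-row RMSNorm launch with blockDim.x a multiple of 32 (again illustrative, not the exact vLLM vectorized kernel; `blockReduceSum` and `rms_norm_kernel_fixed` are simplified stand-ins for vLLM's helpers):

```cuda
#include <cuda_fp16.h>

// Simplified block-wide sum reduction (stand-in for vLLM's reduction helper).
// Assumes blockDim.x is a multiple of 32; the result is valid in thread 0.
__inline__ __device__ float blockReduceSum(float val) {
  static __shared__ float shared[32];
  const int lane = threadIdx.x & 31;
  const int wid = threadIdx.x >> 5;
  for (int offset = 16; offset > 0; offset >>= 1)
    val += __shfl_down_sync(0xffffffff, val, offset);
  if (lane == 0) shared[wid] = val;
  __syncthreads();
  val = (threadIdx.x < (blockDim.x + 31) / 32) ? shared[lane] : 0.0f;
  if (wid == 0)
    for (int offset = 16; offset > 0; offset >>= 1)
      val += __shfl_down_sync(0xffffffff, val, offset);
  return val;
}

// One block per token row: accumulate the sum of squares in float, compute
// the scale in float, and cast back to scalar_t only on the final store.
template <typename scalar_t>
__global__ void rms_norm_kernel_fixed(scalar_t* __restrict__ out,
                                      const scalar_t* __restrict__ input,
                                      const scalar_t* __restrict__ weight,
                                      const float epsilon,
                                      const int hidden_size) {
  __shared__ float s_scale;
  float variance = 0.0f;  // float accumulator: the fix

  for (int i = threadIdx.x; i < hidden_size; i += blockDim.x) {
    const float x = static_cast<float>(input[blockIdx.x * hidden_size + i]);
    variance += x * x;
  }
  variance = blockReduceSum(variance);
  if (threadIdx.x == 0)
    s_scale = rsqrtf(variance / hidden_size + epsilon);
  __syncthreads();

  for (int i = threadIdx.x; i < hidden_size; i += blockDim.x) {
    const float x = static_cast<float>(input[blockIdx.x * hidden_size + i]);
    const float w = static_cast<float>(weight[i]);
    out[blockIdx.x * hidden_size + i] = static_cast<scalar_t>(x * s_scale * w);
  }
}
```

Keeping all intermediate math in float and converting to the input dtype only on the final store is what shrinks the mismatch count without adding measurable overhead, since the extra conversions are cheap relative to the memory traffic.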

@jundali2 closed this by deleting the head repository on Dec 16, 2024