-
-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 #9838
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 #9838
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
update, and small fix in causal_conv1d kernel Signed-off-by: mzusman <[email protected]>
f6947db
to
162361f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!
Verified that pytest tests/kernels/test_causal_conv1d.py
was hitting an illegal memory access, and that's now fixed on an H100
if ((offset + kWidth - 2) >= kNElts){ | ||
// do not load to index 1 if we're not gonna read from there | ||
reinterpret_cast<vec_t *>(x_vals_load)[1] = smem_exchange[last_thread + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, added more explanation
@mzusman, is
|
between 2 chunks Signed-off-by: mzusman <[email protected]>
Signed-off-by: mzusman <[email protected]>
I've dug into those 2 failures, and it seems to fail on a specific edge case where the final state data is split between the two iterations in the kernel (in the most of the cases the data ends up only on the last I guess it was overlooked previously because it fails on specific on sequence lengths that depend on the chunk size, while our tests generate sequence lengths randomly (for the varlen part, more specifically). |
Signed-off-by: mzusman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are green for me now, thanks for the additional bug fix!
…_h100 Signed-off-by: mzusman <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]> Signed-off-by: Linkun Chen <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]> Signed-off-by: Loc Huynh <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]> Signed-off-by: Sumit Dubey <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]> Signed-off-by: Maxime Fournioux <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]>
…llm-project#9838) Signed-off-by: mzusman <[email protected]>
test_selective_state_update_with_batch_indices
and match them to the constraints intest_selective_state_update_with_heads_with_batch_indices
- Without this change, it also fails in H100 as apparently the diff is a little higher there.CC @tlrmchlsmth