[BUG] Illegal memory access when fuse_reduction=False #10
Comments
Thank you very much for your feedback, @tlrmchlsmth. I was unable to reproduce this bug using the latest commits (nm-vllm: e556f59, flux: c866c43). The command I ran is:
Could it be an environment-related issue? |
I changed the sequence length to 512 and still could not reproduce the bug. |
@zheng-ningxin Let's wait for @tlrmchlsmth to provide the Docker image to reproduce this, as mentioned in the other thread. |
I made a Docker image to repro the issue, but all tests pass there. I'll keep you posted.
|
I am no longer able to reproduce the issue at all on Flux's main. |
Describe the bug
I'm hitting an illegal memory access in vllm-project/vllm#5917 when setting fuse_reduction=False in the fused GEMM+ReduceScatter kernel.
To Reproduce
Clone vllm-project/vllm#5917 and then apply this patch:
Then run:
Unfortunately, I haven't been able to reproduce this with a minimal example. I also haven't been able to reproduce the problem when running with compute-sanitizer. Some problem sizes work and some don't (--input-len 1024 seems to work OK, but --input-len 512 does not, for instance).
Stack trace/logs