[Feature]: Chunked prefill + lora #4995
Comments
@rkooo567 can you share an example to reproduce this issue?
I think you can simply create a test case by adding chunked prefill to any lora correctness test!
@rkooo567 actually, when I run
@rkooo567 also are you seeing garbage output or an error?
you mean
Hi, I just saw this. I think lora + chunked prefill is basically broken right now because lora assumes an index mapping that only works with the default scheduling policy. I think the side effect could be wrong output or a crash.
Yes! that's right
Will chunked prefill ever work with LORA?
see #9057 @Nero10578
@sfc-gh-zhwang is there any plan when this feature can be merged?
🚀 The feature, motivation and pitch
Currently lora doesn't work with chunked prefill because some of the lora index logic doesn't cover the case where sampling is not required. This also means lora doesn't work with sampling_params do_sample=True.
We need to add test cases for these. WIP #4994
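To illustrate the case described above, here is a minimal standalone sketch. It is not vLLM's actual code: `ScheduledSeq` and `build_lora_mappings` are hypothetical names made up for this example. The assumption is that a batch needs two lora index mappings, one entry per processed token and one entry per sequence that is actually sampled. With chunked prefill, a chunk that does not finish its prompt contributes tokens but no sample, so the two mappings stop lining up one sequence to one sample.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScheduledSeq:
    """One sequence's share of a single scheduler step (hypothetical type)."""
    lora_id: int     # adapter ID used by this sequence
    num_tokens: int  # tokens processed for this sequence in this step
    do_sample: bool  # False for a prefill chunk that does not finish the prompt


def build_lora_mappings(batch: List[ScheduledSeq]) -> Tuple[List[int], List[int]]:
    """Build per-token and per-sample lora index mappings for one step.

    Assumption for this sketch: every processed token needs an adapter
    index, but only sequences that produce a token this step need one
    in the sampler mapping.
    """
    token_mapping: List[int] = []
    sampler_mapping: List[int] = []
    for seq in batch:
        token_mapping.extend([seq.lora_id] * seq.num_tokens)
        if seq.do_sample:
            sampler_mapping.append(seq.lora_id)
    return token_mapping, sampler_mapping


batch = [
    ScheduledSeq(lora_id=1, num_tokens=4, do_sample=False),  # partial prefill chunk
    ScheduledSeq(lora_id=2, num_tokens=3, do_sample=True),   # final prefill chunk
    ScheduledSeq(lora_id=1, num_tokens=1, do_sample=True),   # decode step
]
token_mapping, sampler_mapping = build_lora_mappings(batch)
print(token_mapping)    # one entry per processed token
print(sampler_mapping)  # one entry per sampled sequence only
```

Logic that assumes one sampler entry per scheduled sequence would build three entries here instead of two, which is the kind of mismatch that could show up as wrong output or a crash.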
Alternatives
No response
Additional context
No response