
[Feature]: Chunked prefill + lora #4995

Open
rkooo567 opened this issue May 23, 2024 · 11 comments
@rkooo567 (Collaborator)

🚀 The feature, motivation and pitch

Currently, LoRA doesn't work with chunked prefill because some of the LoRA index logic doesn't cover the case where sampling is not required. This also means LoRA is not working with sampling_params do_sample=True.

We need to add test cases for this. WIP: #4994

Alternatives

No response

Additional context

No response

@rohithkrn (Contributor)

@rkooo567 can you share an example to reproduce this issue?

@rkooo567 (Collaborator, Author)

I think you can simply create a test case by adding chunked prefill to any lora correctness test!
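For reference, a minimal repro sketch along those lines might look like the following. This is only a sketch: the base model name and adapter path are placeholder assumptions, and greedy sampling is used so the two runs are directly comparable.

```python
# Hypothetical repro sketch: compare greedy LoRA output with and without
# chunked prefill. Model name and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

PROMPT = "Give three tips for staying healthy."
LORA_PATH = "/path/to/lora_adapter"  # placeholder adapter checkpoint
params = SamplingParams(temperature=0, max_tokens=64)  # greedy, deterministic

def run(enable_chunked_prefill: bool) -> str:
    llm = LLM(
        model="meta-llama/Llama-2-7b-hf",  # placeholder base model
        enable_lora=True,
        enable_chunked_prefill=enable_chunked_prefill,
    )
    outputs = llm.generate(
        [PROMPT],
        params,
        lora_request=LoRARequest("test-adapter", 1, LORA_PATH),
    )
    return outputs[0].outputs[0].text

# In practice the two runs may need separate processes to free GPU memory.
baseline = run(enable_chunked_prefill=False)
chunked = run(enable_chunked_prefill=True)
assert baseline == chunked, "LoRA output diverges under chunked prefill"
```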

@rohithkrn (Contributor)

@rkooo567 actually, when I run tests/lora/test_llama.py it passes. However, when I run examples/multilora_inference.py with chunked prefill, the results do not match the results without chunked prefill. So I want to make sure we are talking about the same issue; I am trying to look into this on my side as well.

@rohithkrn (Contributor)

@rkooo567 also, are you seeing garbage output or an error?

@sfc-gh-zhwang (Contributor) commented Sep 18, 2024

You mean "This also means lora is not working with sampling_params do_sample=False."? @rkooo567

@rkooo567 (Collaborator, Author)

> @rkooo567 actually, when I run tests/lora/test_llama.py it passes. However, when I run examples/multilora_inference.py with chunked prefill, the results do not match the results without chunked prefill. So I want to make sure we are talking about the same issue; I am trying to look into this on my side as well.

Hi, I just saw this. I think LoRA + chunked prefill is basically broken right now because LoRA assumes an index mapping that only works with the default scheduling policy. The side effect could be wrong output or a crash.
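To make that concrete, here is a toy illustration (not vLLM's actual code; the batch layout is invented) of how an index mapping that assumes every sequence samples a token each step misaligns once chunked prefill mixes in prefill chunks that don't sample:

```python
# Toy illustration of the index-mapping hazard (not vLLM's actual code).
# Under the default policy every sequence in a step samples exactly one
# token, so "sampler row i belongs to sequence i" holds. With chunked
# prefill, a sequence mid-prefill contributes tokens but samples nothing,
# so the naive per-sequence mapping picks adapters for rows that don't exist.

batch = [
    # (lora_id, samples_this_step)
    (1, True),   # decode step for adapter 1: samples one token
    (2, False),  # prefill chunk for adapter 2: no token sampled yet
    (3, True),   # decode step for adapter 3: samples one token
]

# Naive mapping: one sampler entry per sequence, in batch order.
naive = [lora_id for lora_id, _ in batch]                     # [1, 2, 3]

# Correct mapping: only sequences that actually sample this step.
correct = [lora_id for lora_id, samples in batch if samples]  # [1, 3]

# The sampler only produces two rows here, so the naive mapping would
# apply adapter 2 to the row that actually belongs to adapter 3.
print(naive, correct)
```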

@rkooo567 (Collaborator, Author)

> You mean "This also means lora is not working with sampling_params do_sample=False."? @rkooo567

Yes! That's right.

@Nero10578

Will chunked prefill ever work with LoRA?

@sfc-gh-zhwang (Contributor)

see #9057 @Nero10578

@mces89 commented Dec 3, 2024

@sfc-gh-zhwang is there a plan for when this feature can be merged?
