
[Feature]: Chunked prefill + lora #4995

Open
rkooo567 opened this issue May 23, 2024 · 11 comments
@rkooo567 (Collaborator)

🚀 The feature, motivation and pitch

Currently, LoRA doesn't work with chunked prefill because some of the LoRA index logic doesn't cover the case where sampling is not required. This also means LoRA is not working with sampling_params do_sample=True.

We need to add test cases for this. WIP: #4994

Alternatives

No response

Additional context

No response

@rohithkrn (Contributor)

@rkooo567 can you share an example to reproduce this issue?

@rkooo567 (Collaborator, Author)

I think you can simply create a test case by adding chunked prefill to any lora correctness test!
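For reference, a minimal repro sketch along those lines might look like the following. This is only a sketch: the base model name and adapter path are placeholder assumptions, and greedy sampling is used so the two runs are directly comparable.

```python
# Hypothetical repro sketch: compare greedy LoRA output with and without
# chunked prefill. Model name and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

PROMPT = "Give three tips for staying healthy."
LORA_PATH = "/path/to/lora_adapter"  # placeholder adapter checkpoint
params = SamplingParams(temperature=0, max_tokens=64)  # greedy, deterministic

def run(enable_chunked_prefill: bool) -> str:
    llm = LLM(
        model="meta-llama/Llama-2-7b-hf",  # placeholder base model
        enable_lora=True,
        enable_chunked_prefill=enable_chunked_prefill,
    )
    outputs = llm.generate(
        [PROMPT],
        params,
        lora_request=LoRARequest("test-adapter", 1, LORA_PATH),
    )
    return outputs[0].outputs[0].text

# In practice the two runs may need separate processes to free GPU memory.
baseline = run(enable_chunked_prefill=False)
chunked = run(enable_chunked_prefill=True)
assert baseline == chunked, "LoRA output diverges under chunked prefill"
```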

@rohithkrn (Contributor)

@rkooo567 actually, when I run tests/lora/test_llama.py it passes. However, when I run examples/multilora_inference.py with chunked prefill, the results do not match the results without chunked prefill. So I want to make sure we are talking about the same issue; I am trying to look into this on my side as well.

@rohithkrn (Contributor)

@rkooo567 also, are you seeing garbage output or an error?

@sfc-gh-zhwang (Contributor) commented Sep 18, 2024

You mean "This also means lora is not working with sampling_params do_sample=False."? @rkooo567

@rkooo567 (Collaborator, Author)

> @rkooo567 actually, when I run tests/lora/test_llama.py it passes. However, when I run examples/multilora_inference.py with chunked prefill, the results do not match the results without chunked prefill. So I want to make sure we are talking about the same issue; I am trying to look into this on my side as well.

Hi, I just saw this. I think LoRA + chunked prefill is basically broken right now because LoRA assumes an index mapping that only works with the default scheduling policy. The side effect could be wrong output or a crash.
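To make that concrete, here is a toy illustration (not vLLM's actual code; the batch layout is invented) of how an index mapping that assumes every sequence samples a token each step misaligns once chunked prefill mixes in prefill chunks that don't sample:

```python
# Toy illustration of the index-mapping hazard (not vLLM's actual code).
# Under the default policy every sequence in a step samples exactly one
# token, so "sampler row i belongs to sequence i" holds. With chunked
# prefill, a sequence mid-prefill contributes tokens but samples nothing,
# so the naive per-sequence mapping picks adapters for rows that don't exist.

batch = [
    # (lora_id, samples_this_step)
    (1, True),   # decode step for adapter 1: samples one token
    (2, False),  # prefill chunk for adapter 2: no token sampled yet
    (3, True),   # decode step for adapter 3: samples one token
]

# Naive mapping: one sampler entry per sequence, in batch order.
naive = [lora_id for lora_id, _ in batch]                     # [1, 2, 3]

# Correct mapping: only sequences that actually sample this step.
correct = [lora_id for lora_id, samples in batch if samples]  # [1, 3]

# The sampler only produces two rows here, so the naive mapping would
# apply adapter 2 to the row that actually belongs to adapter 3.
print(naive, correct)
```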

@rkooo567 (Collaborator, Author)

> You mean "This also means lora is not working with sampling_params do_sample=False."? @rkooo567

Yes! That's right.

@Nero10578

Will chunked prefill ever work with LoRA?

@sfc-gh-zhwang (Contributor)

see #9057 @Nero10578

@mces89 commented Dec 3, 2024

@sfc-gh-zhwang is there a plan for when this feature can be merged?
