
TRT LLM Integration with LORA #3305

Merged (20 commits) Sep 17, 2024

Conversation

@agunapal (Collaborator) commented Sep 6, 2024

Description

This PR integrates TRT-LLM with TorchServe.

  • Move the handler from examples to torch_handler
  • Update the llama example to Llama 3.1
  • Update to the latest trt-llm
  • Add an example for using a LoRA model with trt-llm
  • Add support for trt-llm in llm_launcher
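The points above center on an async, streaming handler. As a minimal sketch of that pattern (all names here are hypothetical stand-ins; the real handler wraps TensorRT-LLM's API and lives in the PR's diff, not in this summary), a streaming handler consumes an async generator and forwards tokens as they arrive rather than waiting for the full sequence:

```python
import asyncio

# Hypothetical stand-in for a TRT-LLM engine's streaming generate call.
async def fake_generate(prompt: str, n_tokens: int = 3):
    for i in range(n_tokens):
        await asyncio.sleep(0)  # cede control between tokens
        yield f"{prompt}-tok{i}"

async def handle(prompt: str) -> list[str]:
    # Stream tokens back as they are produced; in TorchServe this is
    # where an intermediate response would be sent per token.
    out = []
    async for tok in fake_generate(prompt):
        out.append(tok)
    return out

result = asyncio.run(handle("hello"))
print(result)
```

Because the loop yields between tokens, multiple in-flight requests can make progress concurrently, which is what the review below probes.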

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@agunapal agunapal marked this pull request as ready for review September 6, 2024 18:58
@agunapal agunapal requested a review from mreso September 6, 2024 18:58
@mreso (Collaborator) left a comment

Good starting point. Main concerns are that we're not leveraging async and that we use git clone in llm_launcher.

Review threads (resolved): examples/large_models/trt_llm/lora/README.md, ts/llm_launcher.py, ts/utils/hf_utils.py
@agunapal agunapal requested a review from mreso September 14, 2024 00:48
@mreso (Collaborator) left a comment

The async part does not look like it's working. Was this tested with at least two simultaneous requests, and do we get two interleaved streams?
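The interleaving check the reviewer asks for can be sketched in isolation (a toy model, not TorchServe's actual test harness: `stream` is a hypothetical token source standing in for one streaming request). If the handler is truly async, tokens from two concurrent requests alternate instead of one request's full output arriving before the other starts:

```python
import asyncio

# Hypothetical per-request token stream; appends each token to a shared
# log so we can inspect the global arrival order afterwards.
async def stream(name: str, order: list[str], n: int = 3):
    for i in range(n):
        await asyncio.sleep(0)  # cede control so the other request can run
        order.append(f"{name}{i}")

async def main() -> list[str]:
    order: list[str] = []
    # Two simultaneous "requests".
    await asyncio.gather(stream("a", order), stream("b", order))
    return order

order = asyncio.run(main())
print(order)
# A blocking handler would produce a0 a1 a2 b0 b1 b2; an async one
# interleaves, so some "b" token arrives before the last "a" token.
```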

Review threads (resolved): examples/large_models/trt_llm/lora/README.md, ts/torch_handler/trt_llm_handler.py
@mreso (Collaborator) left a comment

LGTM now

@mreso mreso added this pull request to the merge queue Sep 17, 2024
Merged via the queue into master with commit d5e10de Sep 17, 2024
14 checks passed