TensorRT-LLM backend bump to latest version + misc fixes #2791

mfuntowicz · 2024-12-01T23:16:12Z

This PR bumps several TensorRT-LLM related dependencies and rebases the Docker container on ubuntu24.04 instead of ubuntu22.04.

To support this, we need to use the latest TensorRT-LLM main, because a missing import in one of the files on Nvidia's side causes compilation to fail with gcc > 12 (more info here).

It also reimplements the backbone of the backend for simplicity and to allow easier testing. The testing work is actively WIP for now and will land in its own PR, as it involves some additional changes on the GA side.

HuggingFaceDocBuilderDev · 2024-12-11T07:45:59Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hugoch

Let's gooo 🚀 !!!!!!

Hugoch · 2024-12-02T08:49:54Z

backends/trtllm/csrc/backend.cpp

+        // Define some configuration variables
+        executor_config.setKvCacheConfig(tle::KvCacheConfig(true));
+        executor_config.setEnableChunkedContext(compute_capabilities.is_at_least_ampere());
+        executor_config.setSchedulerConfig(tle::SchedulerConfig(tle::CapacitySchedulerPolicy::kMAX_UTILIZATION));


Probably something we would need to have as a variable somewhere (but we need to bench the impacts first).

What about exposing it through the launcher's CLI, maybe? That way it can be overridden through both the CLI and the environment.

Yeah, but let's make sure it's an advanced setting (I don't want users to be confused by all the CLI options).
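For illustration, here is a minimal sketch of what an overridable scheduler policy could look like on the C++ side. It is not what this PR implements: the `scheduler_policy_t` enum is a local stand-in that mirrors two of the `tle::CapacitySchedulerPolicy` values so the snippet compiles without TensorRT-LLM, and the `TRTLLM_SCHEDULER_POLICY` variable name and both helper functions are hypothetical. Unknown or missing values fall back to the `kMAX_UTILIZATION` behaviour currently hardcoded above.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string_view>

// Local stand-in for tle::CapacitySchedulerPolicy (assumption: a real
// implementation would use the TensorRT-LLM enum directly).
enum class scheduler_policy_t { max_utilization, guaranteed_no_evict };

// Hypothetical environment variable name, chosen for this sketch only.
constexpr const char *SCHEDULER_POLICY_ENV = "TRTLLM_SCHEDULER_POLICY";

// Map a user-provided string to a policy; std::nullopt for unknown values.
constexpr std::optional<scheduler_policy_t>
parse_scheduler_policy(std::string_view value) noexcept {
    if (value == "max_utilization") return scheduler_policy_t::max_utilization;
    if (value == "guaranteed_no_evict") return scheduler_policy_t::guaranteed_no_evict;
    return std::nullopt;
}

// Resolve the policy from a raw (possibly null) string, defaulting to
// max_utilization when the value is absent or not recognised.
inline scheduler_policy_t scheduler_policy_from_env(const char *raw) noexcept {
    if (raw == nullptr) return scheduler_policy_t::max_utilization;
    return parse_scheduler_policy(raw).value_or(scheduler_policy_t::max_utilization);
}

// Convenience wrapper reading the hypothetical variable from the process environment.
inline scheduler_policy_t resolve_scheduler_policy() noexcept {
    return scheduler_policy_from_env(std::getenv(SCHEDULER_POLICY_ENV));
}
```

If the launcher exposed a matching advanced CLI flag that also reads from the environment, the backend would only need to call `resolve_scheduler_policy()` once when building the executor config and translate the result to the TensorRT-LLM enum.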

backends/trtllm/csrc/backend.hpp

mfuntowicz requested a review from Hugoch December 1, 2024 23:18

mfuntowicz added 16 commits December 3, 2024 09:43

misc(cmake) update dependencies

0f17415

feat(hardware) enable new hardware.hpp and unittests

7a81040

test(ctest) enable address sanitizer

1830fe8

feat(backend): initial rewrite of the backend for simplicity

3a2698f

feat(backend): remove all the logs from hardware.hpp

6d35657

feat(backend): added some logging

9bb6309

feat(backend): enable compiler warning if support for RVO not applying

87272ff

feat(backend): missing return statement

702dc9c

feat(backend): introduce backend_workspace_t to store precomputed information from the engine folder

25c6bbe

feat(backend): delete previous backend impl

df99164

feat(backend): more impl

fd7e2b5

feat(backend): use latest trtllm main version to have g++ >= 13 compatibility

71e700a

feat(backend): allow overriding which Python to use

879e1a4

feat(backend): fix backend_exception_t -> backend_error_t naming

a7bad25

feat(backend): impl missing generation_step_t as return value of pull_tokens

2f8634e

feat(backend): make backend_workspace_t::engines_folder constexpr

874bc28

mfuntowicz force-pushed the trtllm/cancellation branch from 5476947 to 874bc28 December 3, 2024 09:01

mfuntowicz added 12 commits December 3, 2024 12:11

feat(backend): fix main.rs retrieving the tokenizer

16ba2f5

feat(backend): add guard to multiple header definitions

c94b9de

test(backend): add more unittest

ad3ed0d

feat(backend): remove constexpr from par

881527a

feat(backend): remove constexpig

6253064

test(backend): more test coverage

cc6bc33

chore(trtllm): update dependency towards 0.15.0

b6dbf60

effectively cancel the request on the executor

460f290

feat(backend) fix moving backend when pulling

300f6c6

feat(backend): make sure we can easily cancel request on the executor

b3cd5ea

feat(backend): fix missing "0" field access

049f4ac

misc(backend): fix reborrowing Pin<&mut T> as described in the doc https://doc.rust-lang.org/stable/std/pin/struct.Pin.html#method.as_mut

f0cd474

chore: Add doc and CI for TRTLLM (#2799)

ab6591e

* chore: Add doc and CI for TRTLLM
* chore: Add doc and CI for TRTLLM
* chore: Add doc and CI for TRTLLM
* chore: Add doc and CI for TRTLLM
* doc: Formatting

mfuntowicz marked this pull request as ready for review December 11, 2024 09:28

mfuntowicz mentioned this pull request Dec 11, 2024

Feat/trtllm cancellation dev container #2795

Merged

Hugoch previously approved these changes Dec 13, 2024

misc(backend): indent

1640da7

mfuntowicz dismissed Hugoch’s stale review via 1640da7 December 13, 2024 14:38

mfuntowicz merged commit ea7f408 into main Dec 13, 2024
9 of 13 checks passed

mfuntowicz deleted the trtllm/cancellation branch December 13, 2024 14:51
