⚗️ Implement distributed adapter cache #201
Closed
Description
vLLM currently does not support a distributed adapter cache: every replica of a deployment must receive an explicit `/v1/load_lora_adapter` call to load an adapter. This PR implements the existing `ADAPTER_CACHE` logic on top of vLLM's HTTP server by injecting middleware that detects whether the `model` field of a request references a set of files from the cache and, if so, pre-loads the adapter before continuing with the call. It also wraps the `/v1/models` endpoint to pre-load all adapters from the cache, so that the response is consistent across all replicas of a deployment.

Looking for some feedback here: I would like to solve this upstream, but this gives us a quick way to roll out distributed LoRA adapters to users that matches existing TGIS behavior.
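To make the idea concrete, here is a minimal sketch of the middleware described above. This is not the PR's actual code: the cache layout, the `loader` callback, and the simplified request interface are all illustrative assumptions; the real implementation would hook into vLLM's FastAPI app and its LoRA loading path.

```python
import os

class AdapterCacheMiddleware:
    """Sketch: if the request's "model" field names a directory in the
    adapter cache, pre-load that adapter before forwarding the request.

    `app` is the downstream handler (simplified here to a plain callable
    taking a dict), and `loader` stands in for whatever call actually
    registers a LoRA adapter with the server (hypothetical interface).
    """

    def __init__(self, app, cache_dir, loader):
        self.app = app
        self.cache_dir = cache_dir
        self.loader = loader
        self.loaded = set()  # adapters already registered on this replica

    def list_cached_adapters(self):
        # Used when wrapping /v1/models: pre-loading everything in the
        # cache keeps the model list consistent across replicas.
        return sorted(
            name for name in os.listdir(self.cache_dir)
            if os.path.isdir(os.path.join(self.cache_dir, name))
        )

    def __call__(self, request):
        model = request.get("model")
        if model and model not in self.loaded:
            path = os.path.join(self.cache_dir, model)
            if os.path.isdir(path):
                # Pre-load the adapter, then continue with the call.
                self.loader(model, path)
                self.loaded.add(model)
        return self.app(request)
```

A replica that never saw an explicit `/v1/load_lora_adapter` call would still serve the adapter, because the middleware loads it lazily on first reference.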
How Has This Been Tested?
Merge criteria: