
⚗️ Implement distributed adapter cache #201

Closed
wants to merge 1 commit from the adapter-experiment branch

Conversation

@joerunde (Contributor) commented on Jan 8, 2025

Description

vLLM currently does not support a distributed adapter cache: all replicas of a deployment must receive an explicit /v1/load_lora_adapter call to load an adapter.
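For context, that means each replica has to be told about an adapter individually, roughly like this (hosts, adapter name, and path are illustrative):

```python
# Illustrative only: today the load call must be sent to every replica.
import requests

for replica in ["http://replica-0:8000", "http://replica-1:8000"]:
    resp = requests.post(
        f"{replica}/v1/load_lora_adapter",
        json={"lora_name": "my-adapter", "lora_path": "/adapters/my-adapter"},
    )
    resp.raise_for_status()
```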

This PR implements the existing ADAPTER_CACHE logic on top of vLLM's HTTP server by injecting middleware that detects whether the model field of a request references a set of files in the cache and, if so, pre-loads the adapter before continuing with the call.
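A minimal sketch of what that middleware could look like, assuming a Starlette/FastAPI app; the cache path and the load_adapter hook are hypothetical stand-ins, not the names used in this PR:

```python
# Sketch only: ADAPTER_CACHE and load_adapter are hypothetical stand-ins
# for the PR's actual cache location and engine hook.
import json
from pathlib import Path

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

ADAPTER_CACHE = Path("/adapters")  # hypothetical shared cache mount


class AdapterCacheMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, load_adapter):
        super().__init__(app)
        # load_adapter(name, path) stands in for whatever call registers
        # a LoRA adapter with the running engine.
        self.load_adapter = load_adapter

    async def dispatch(self, request: Request, call_next):
        if request.method == "POST":
            # Note: recent Starlette versions replay a body read here to
            # the downstream app; older versions would stall on this.
            try:
                payload = json.loads(await request.body())
            except ValueError:
                payload = None
            model = payload.get("model") if isinstance(payload, dict) else None
            # If the requested model names a directory in the shared cache,
            # pre-load it before handing the request to the real endpoint.
            if model and (ADAPTER_CACHE / model).is_dir():
                await self.load_adapter(model, ADAPTER_CACHE / model)
        return await call_next(request)
```

It would be registered with something like `app.add_middleware(AdapterCacheMiddleware, load_adapter=...)`.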

This also wraps the /v1/models endpoint to pre-load all adapters from the cache, so that the response is consistent across all replicas of a deployment.
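A sketch of the /v1/models wrapper under the same assumptions (load_adapter and original_list_models are hypothetical placeholders for the engine hook and the original handler):

```python
# Sketch: make /v1/models consistent across replicas by loading every
# adapter found in the shared cache before delegating to the original
# handler. load_adapter and original_list_models are hypothetical.
async def list_models_with_cache(request: Request):
    for entry in ADAPTER_CACHE.iterdir():
        if entry.is_dir():
            # Assumed to be a cheap no-op for adapters already loaded.
            await load_adapter(entry.name, entry)
    return await original_list_models(request)
```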

Looking for some feedback here: I would like to solve this upstream, but this gives us a quick way to roll out distributed LoRA adapters to users that matches existing TGIS behavior.

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

@dtrifiro (Contributor) left a comment

Looks good. Planning on adding tests?

@dtrifiro force-pushed the adapter-experiment branch from bb557ae to 8ea6bfa on Jan 8, 2025 at 15:31
@joerunde (Contributor, Author) commented on Jan 8, 2025

@dtrifiro yeah, first I'm planning to talk it over with our team and see whether this is a direction we want to go for short-term deliverables vs. opening an RFC upstream in vLLM to implement it there first.

If we go with this approach, I'll add tests.

@joerunde (Contributor, Author) commented on Jan 8, 2025

Okay, team consensus is that this approach is reasonable, but we do not need to move fast to implement it in the adapter. I'll close this for now and use it as a reference for a later RFC for vLLM.

@joerunde closed this on Jan 8, 2025
@dtrifiro deleted the adapter-experiment branch on Jan 8, 2025 at 22:15
@dtrifiro restored the adapter-experiment branch on Jan 8, 2025 at 22:16