
WIP: LoRA Adapters #304

Open · wants to merge 16 commits into main

Conversation

nstogner (Contributor) commented Nov 4, 2024

Addresses #132

alpe (Contributor) commented Nov 5, 2024

Nice drawing. This is very helpful! 🌻
I am not super familiar with LoRA adapters, but from what I have seen they can be fairly large, so caching seems like a good idea. For the non-cache scenario, I would suggest an explicit no-cache or container-managed profile, so that skipping the cache does not look like the default.
With on-demand LoRA loading, disk usage may become a problem at some point. This is out of scope here, but a purge job or a retention time are things that may eventually need to be configured in the profile.
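For illustration, a profile along these lines would make the choice explicit (every field name below is hypothetical; nothing like this exists in the spec yet):

    # Hypothetical sketch of the suggestion above; none of these fields exist today.
    spec:
      adapterCache:
        profile: none        # explicit opt-out, so skipping the cache is a deliberate choice
        retention: 72h       # purge adapters that have not been used within this window
        maxSize: 50Gi        # cap the total disk used by cached adapters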

samos123 (Contributor) commented Nov 5, 2024

Can you show an example that has the url field? I'm assuming the url field must be used to specify the base model?

nstogner (Contributor, Author) commented Nov 5, 2024

@samos123 I currently have all of the examples in the diagrams.

samos123 (Contributor) commented Nov 5, 2024

That's where I looked, but none of them have the base model URL set?

nstogner (Contributor, Author) commented Nov 8, 2024

Model .spec.url would be the same as normal.
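For example, a spec combining the base model with an adapter could look roughly like this (the adapters fields mirror the snippet reviewed further down; the surrounding spec layout is assumed):

    # Sketch only: adapter fields follow this PR, the rest of the spec shape is assumed.
    spec:
      url: hf://meta-llama/Llama-2-7b              # base model, unchanged from today
      adapters:
      - id: test                                   # name used to address the adapter
        url: hf://jashing/tinyllama-colorist-lora  # source of the adapter weights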

nstogner (Contributor, Author) commented Nov 8, 2024

Note: it looks like vLLM supports loading adapters from Hugging Face: vllm-project/vllm#6234

nstogner (Contributor, Author) commented Nov 8, 2024

Note: vLLM has an endpoint to support dynamic loading/unloading of adapters: vllm-project/vllm#6566
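As a rough sketch of how that could be wired up: run vLLM with --enable-lora and VLLM_ALLOW_RUNTIME_LORA_UPDATING=True, then POST a JSON body like the one below to /v1/load_lora_adapter (and a matching lora_name to /v1/unload_lora_adapter to drop it again). Endpoint paths, body fields, flag, and env var are as I understand them from the vLLM PR/docs; the concrete values are placeholders.

    # Body for POST /v1/load_lora_adapter (shown as YAML; the server takes JSON).
    lora_name: test                                       # name referenced by later completion requests
    lora_path: /models/adapters/tinyllama-colorist-lora   # placeholder local path to the adapter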

Review thread on this part of the diff:

    #url: hf://meta-llama/Llama-2-7b
    adapters:
    - id: test
      url: hf://jashing/tinyllama-colorist-lora
Contributor:
Does vLLM support directly loading this adapter from HF, or is it a hard requirement to download the LoRA adapter first?

Contributor (Author):
vLLM can load it from HF, but not from S3.

