
WIP: LoRA Adapters #304

Open · wants to merge 16 commits into main

Conversation

nstogner (Contributor) commented Nov 4, 2024

Addresses #132

alpe (Contributor) commented Nov 5, 2024

Nice drawing. This is very helpful! 🌻
I am not super familiar with LoRA adapters, but from what I have seen they can be fairly large, so caching seems like a good idea. For the non-cache scenario, I would suggest an explicit no-cache or container-managed profile, so that skipping the cache does not look like the default.
With on-demand LoRA loading, disk usage may become a problem at some point. This is out of scope here, but a purge job or a retention time are things that may eventually need to be configured in the profile.
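For illustration, a profile along these lines would make the choice explicit (every field name below is hypothetical; nothing like this exists in the spec yet):

    # Hypothetical sketch of the suggestion above; none of these fields exist today.
    spec:
      adapterCache:
        profile: none        # explicit opt-out, so skipping the cache is a deliberate choice
        retention: 72h       # purge adapters that have not been used within this window
        maxSize: 50Gi        # cap the total disk used by cached adapters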

samos123 (Contributor) commented Nov 5, 2024

Can you show an example that has the url field? I'm assuming the url field must be used to specify the base model?

nstogner (Contributor, Author) commented Nov 5, 2024

@samos123 I currently have all of the examples in the diagrams.

samos123 (Contributor) commented Nov 5, 2024

That's where I looked, but none of them have the base model URL set?

nstogner (Contributor, Author) commented Nov 8, 2024

Model .spec.url would be the same as normal.
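For example, a spec combining the base model with an adapter could look roughly like this (the adapters fields mirror the snippet reviewed further down; the surrounding spec layout is assumed):

    # Sketch only: adapter fields follow this PR, the rest of the spec shape is assumed.
    spec:
      url: hf://meta-llama/Llama-2-7b              # base model, unchanged from today
      adapters:
      - id: test                                   # name used to address the adapter
        url: hf://jashing/tinyllama-colorist-lora  # source of the adapter weights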

nstogner (Contributor, Author) commented Nov 8, 2024

Note: it looks like vLLM supports loading adapters from Hugging Face: vllm-project/vllm#6234

nstogner (Contributor, Author) commented Nov 8, 2024

Note: vLLM has an endpoint to support dynamic loading/unloading of adapters: vllm-project/vllm#6566
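As a rough sketch of how that could be wired up: run vLLM with --enable-lora and VLLM_ALLOW_RUNTIME_LORA_UPDATING=True, then POST a JSON body like the one below to /v1/load_lora_adapter (and a matching lora_name to /v1/unload_lora_adapter to drop it again). Endpoint paths, body fields, flag, and env var are as I understand them from the vLLM PR/docs; the concrete values are placeholders.

    # Body for POST /v1/load_lora_adapter (shown as YAML; the server takes JSON).
    lora_name: test                                       # name referenced by later completion requests
    lora_path: /models/adapters/tinyllama-colorist-lora   # placeholder local path to the adapter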

Review thread on this part of the diff:

    #url: hf://meta-llama/Llama-2-7b
    adapters:
    - id: test
      url: hf://jashing/tinyllama-colorist-lora
Contributor:
Does vLLM support directly loading this adapter from HF, or is it a hard requirement to download the LoRA adapter first?

Contributor (Author):
vLLM can load it from HF, but not from S3.

