Use vLLM to load LLMs #230
Conversation
Hey Nico! I did a first round of review and have some questions/comments:
@ad-astra-video thanks for the swift review ⚡:
Hi @kyriediculous, would it be feasible to run the LLM pipeline in its own Docker container? We're gradually migrating other pipelines into separate containers now that the orchestrator automatically downloads them (see PR #308 and PR #200). This setup would give you full control over dependencies and streamline reviews, as Brad wouldn't need to test unrelated pipelines, especially since E2E tests similar to those in realtime aren't implemented yet. Perhaps your new pip-compile setup already addresses this issue (I haven't tested it yet). Additionally, we can merge PR #293, which enables pipeline container overrides. If you have any concerns about using a dedicated container, please let me know so I can better understand the constraints. Looking forward to hearing your thoughts!
@ad-astra-video Do you also have the logs from the crash with 4 GPUs?
@rickstaa The only requirement would be creating a separate Dockerfile.llm in the […]
EDIT: figured it out... the Docker container needs […]. Original response below for reference:
This looks good with the couple of small changes suggested above. The OpenAPI spec action is failing; can you run the API spec gen again? Will merge after that is passing.
vLLM and PyTorch use shared memory to efficiently share tensors between the dataloader workers and the main process. Docker's shared-memory segment defaults to 64 MB, but I found vLLM requires more. I forgot to mention it; maybe I should attach a small README.
I see this as mostly a documentation thing, so a note in the docs should cover it. Managed containers use default values for […]
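For external runners, the shared-memory limit can be raised when launching the container. Below is a minimal docker-py sketch of what that could look like; the image tag, shared-memory size, and GPU request are illustrative assumptions, not the project's actual launch code.

```python
# Minimal sketch: launch a runner container with a larger /dev/shm via docker-py.
# The image tag, size, and GPU request below are illustrative assumptions.
import docker

client = docker.from_env()
container = client.containers.run(
    "livepeer/ai-runner:latest",  # assumed image tag
    detach=True,
    shm_size="16g",               # raise Docker's 64 MB /dev/shm default for vLLM
    # ipc_mode="host",            # alternative: share the host IPC namespace instead
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
)
```

Equivalently, `--shm-size` or `--ipc=host` on `docker run` achieves the same effect from the CLI.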
@ad-astra-video, to help move this forward, could you provide a concise list of the remaining changes you’re waiting on before this can be merged? Based on the discussion, here’s what I’ve noted so far:
Let me know if I missed anything or if there are additional updates to track!
Here are the suggested changes. My apologies, I think I didn't get these posted.
@rickstaa @kyriediculous Looks like I messed up posting the suggested changes. ShmSize and IpcHost only come into play when running pipeline parallel or with tensor parallel size > 2, I believe. If Nico is comfortable with this, we can move forward with it as a documentation note, since these variables can be adjusted using external runners. Suggested changes:
I regenerated the bindings but there's no change to commit. |
@kyriediculous I confirmed the fixes and tested the runner locally. I am approving and merging, and will re-run the OpenAPI gen separately to fix it.
This PR upgrades the LLM pipeline to use vLLM to load models and run inference, taking advantage of optimised batching and the other features that come with vLLM.

- Dependencies have been upgraded to be compatible with vLLM 0.6.3; these new dependency versions are untested with other pipelines (though they could benefit as well).
- Both fp16 and 8-bit quantization are still supported, but this could be further optimized by detecting the GPUs on the machine and adjusting the quantization method accordingly.
- The Dockerfile has been updated to use newer `pip` and `torch`.
- The Dockerfile has been updated to respect `CUDA_PCI_BUS_ORDER`, ensuring the same development experience as go-livepeer when specifying GPU IDs found in `nvidia-smi`.
- Adds `Top_P` and `Top_K` parameters to the LLM route.
- Changes the API to take `messages` in the common LLM format instead of `prompt` and `history` fields (see the request sketch after this list).
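As a rough illustration of the new request shape, a call against the updated route might look like the sketch below. The host, port, and route path are assumptions for illustration; only the `messages`, `top_p`, and `top_k` fields are confirmed by this PR.

```python
# Hedged sketch of a request against the updated LLM route.
# Assumptions: the runner listens on localhost:8000 and the route is /llm.
import requests

resp = requests.post(
    "http://localhost:8000/llm",  # assumed runner address and route path
    json={
        # Common chat-style message list, replacing the old prompt/history fields.
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize what vLLM does."},
        ],
        "top_p": 0.9,  # nucleus sampling cutoff (new Top_P parameter)
        "top_k": 40,   # top-k sampling cutoff (new Top_K parameter)
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```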