tests_gaudi: Added L2 vllm workload #329
base: main
Conversation
Signed-off-by: vbedida79 <[email protected]>
@@ -74,4 +74,83 @@ Welcome to HCCL demo
[BENCHMARK] NW Bandwidth : 258.209121 GB/s
[BENCHMARK] Algo Bandwidth : 147.548069 GB/s
####################################################################################################
```

## VLLM
Suggested change: "VLLM" → "vLLM" (the project's official capitalization).
```

## VLLM
VLLM is a serving engine for LLMs. The following workload deploys a VLLM server with an LLM using Intel Gaudi. Refer to the [Intel Gaudi VLLM fork](https://github.com/HabanaAI/vllm-fork.git) for more details.
Same suggestion here: "VLLM" → "vLLM".
Build the workload container image:
```
$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/vllm_buildconfig.yaml
```
Could we add an instruction so the user can check whether the build succeeded? :-)
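One way this could be documented (a sketch only; the build name `vllm-workload-1` is an assumption, so match it to whatever vllm_buildconfig.yaml actually defines):

```
# List the builds created by the BuildConfig and check that the
# STATUS column reaches "Complete"
$ oc get builds

# Follow the build logs while the image is being built
# (build name below is an assumption)
$ oc logs -f build/vllm-workload-1
```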
```
Deploy the workload:
* Update the Hugging Face token and the PVC according to your cluster setup
```
Could we add some detail about setting the Hugging Face token, and also a brief introduction to the model being used? :-)
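For example, the docs could show something like this (a sketch; the secret name `hf-token` and key `HF_TOKEN` are assumptions rather than names taken from the deployment yaml):

```
# Store the Hugging Face access token in a secret so the
# deployment can expose it to the vLLM container
# (secret/key names are assumptions; align them with the yaml)
$ oc create secret generic hf-token \
    --from-literal=HF_TOKEN=<your-hugging-face-token>
```

The deployment would then typically surface the token as an environment variable such as `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN` so the model can be downloaded from the Hugging Face Hub.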
runPolicy: "Serial"
source:
  git:
    uri: https://github.com/opendatahub-io/vllm.git
After comparing:
1. https://github.com/opendatahub-io/vllm.git - ODH fork of vLLM
2. https://github.com/vllm-project/vllm - vLLM upstream
3. https://github.com/HabanaAI/vllm-fork - Habana fork of vLLM

I think we should currently start from 3, with the change from 1 (adding the UBI-based Dockerfile for RH OpenShift). Since Intel is upstreaming from 3 to 2, in the long run we will use 2.

So I think we need to: 1) submit a PR adding the UBI-based Dockerfile for RH, and also add RHEL 9.4 support to the documents; 2) use repo 3; 3) the owner of 3 will presumably also help upstream the UBI-based Dockerfile and docs to 2; 4) after that we can switch to 2, the upstream vLLM.

@vbedida79 any comments? :-)
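For illustration, steps 1) and 2) might end up looking like this in the BuildConfig (a sketch only; `Dockerfile.ubi` is a hypothetical path, assuming the UBI-based Dockerfile has first been contributed to the Habana fork):

```
source:
  git:
    # Habana fork of vLLM (repo 3 above)
    uri: https://github.com/HabanaAI/vllm-fork.git
strategy:
  dockerStrategy:
    # Hypothetical path; assumes the UBI-based Dockerfile from
    # the ODH fork (repo 1) has been upstreamed into repo 3
    dockerfilePath: Dockerfile.ubi
```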
PR includes the Gaudi L2 vLLM workload.
Signed-off-by: vbedida79 <[email protected]>