tests_gaudi: Added L2 vllm workload #329
base: main
Conversation
Signed-off-by: vbedida79 <[email protected]>
@@ -74,4 +74,83 @@ Welcome to HCCL demo
[BENCHMARK] NW Bandwidth : 258.209121 GB/s
[BENCHMARK] Algo Bandwidth : 147.548069 GB/s
####################################################################################################
```

## VLLM
Suggested change: "VLLM" → "vLLM" (the project's official capitalization).
```

## VLLM
VLLM is a serving engine for LLMs. The following workload deploys a VLLM server with an LLM using Intel Gaudi. Refer to the [Intel Gaudi VLLM fork](https://github.com/HabanaAI/vllm-fork.git) for more details.
Same suggestion here: "VLLM" → "vLLM".
Build the workload container image:
```
$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/vllm_buildconfig.yaml
```
Could we add an instruction so the user can check whether the build succeeded? :-)
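One way this could be documented (a sketch only; the build name `vllm-workload-1` is an assumption, so match it to whatever vllm_buildconfig.yaml actually defines):

```
# List the builds created by the BuildConfig and check that the
# STATUS column reaches "Complete"
$ oc get builds

# Follow the build logs while the image is being built
# (build name below is an assumption)
$ oc logs -f build/vllm-workload-1
```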
```
Deploy the workload:
* Update the Hugging Face token and the PVC according to your cluster setup
```
Could we add some detail about setting the Hugging Face token, and also a brief introduction to the model being used? :-)
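For example, the docs could show something like this (a sketch; the secret name `hf-token` and key `HF_TOKEN` are assumptions rather than names taken from the deployment yaml):

```
# Store the Hugging Face access token in a secret so the
# deployment can expose it to the vLLM container
# (secret/key names are assumptions; align them with the yaml)
$ oc create secret generic hf-token \
    --from-literal=HF_TOKEN=<your-hugging-face-token>
```

The deployment would then typically surface the token as an environment variable such as `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN` so the model can be downloaded from the Hugging Face Hub.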
runPolicy: "Serial"
source:
  git:
    uri: https://github.com/opendatahub-io/vllm.git
After comparing:
1. https://github.com/opendatahub-io/vllm.git - ODH fork of vLLM
2. https://github.com/vllm-project/vllm - vLLM upstream
3. https://github.com/HabanaAI/vllm-fork - Habana fork of vLLM

I think we should currently start from 3, with the change from 1 (adding the UBI-based Dockerfile for RH OpenShift). Since Intel is upstreaming from 3 to 2, in the long run we will use 2.

So I think we need to: 1) submit a PR adding the UBI-based Dockerfile for RH, and also add RHEL 9.4 support to the documents; 2) use repo 3; 3) the owner of 3 will presumably also help upstream the UBI-based Dockerfile and docs to 2; 4) after that we can switch to 2, the upstream vLLM.

@vbedida79 any comments? :-)
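For illustration, steps 1) and 2) might end up looking like this in the BuildConfig (a sketch only; `Dockerfile.ubi` is a hypothetical path, assuming the UBI-based Dockerfile has first been contributed to the Habana fork):

```
source:
  git:
    # Habana fork of vLLM (repo 3 above)
    uri: https://github.com/HabanaAI/vllm-fork.git
strategy:
  dockerStrategy:
    # Hypothetical path; assumes the UBI-based Dockerfile from
    # the ODH fork (repo 1) has been upstreamed into repo 3
    dockerfilePath: Dockerfile.ubi
```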
PR includes the Gaudi L2 vLLM workload.
Signed-off-by: vbedida79 <[email protected]>