Commit e2ff9ff

Fixed GPU based vLLM path references

bbalakriz committed Jul 8, 2024 · 1 parent 1c51c26

Showing 1 changed file with 5 additions and 6 deletions.

documentation/training_and_learning/genai/README.md: 5 additions & 6 deletions
@@ -47,24 +47,23 @@ You can skip to testing the vLLM after this step
2. Create PVC, deployment, & service in __vllm__ namespace.

#### PVC:
- `oc apply -f https://raw.githubusercontent.com/rh-aiservices-bu/llm-on-openshift/main/llm-servers/vllm/gitops/pvc.yaml`
+ `oc apply -f https://raw.githubusercontent.com/rh-aiservices-bu/llm-on-openshift/main/llm-servers/vllm/gpu/gitops/pvc.yaml`
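
The linked `pvc.yaml` is authoritative; as a rough orientation, a PVC for caching model weights in the __vllm__ namespace typically looks like the sketch below (the claim name and storage size here are assumptions, not the repository's actual values):

```yaml
# Hypothetical sketch only -- see the linked pvc.yaml for the real manifest.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-models-cache   # assumed name
  namespace: vllm
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi         # assumed size; a 7B model needs tens of GB
```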

#### Service:
- `oc apply -f https://raw.githubusercontent.com/rh-aiservices-bu/llm-on-openshift/main/llm-servers/vllm/gitops/service.yaml`
+ `oc apply -f https://raw.githubusercontent.com/rh-aiservices-bu/llm-on-openshift/main/llm-servers/vllm/gpu/gitops/service.yaml`
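
For orientation, a Service fronting the vLLM pods would resemble the sketch below; the name and selector label are assumptions, though port 8000 is vLLM's default for its OpenAI-compatible server:

```yaml
# Hypothetical sketch only -- see the linked service.yaml for the real manifest.
apiVersion: v1
kind: Service
metadata:
  name: vllm        # assumed name
  namespace: vllm
spec:
  selector:
    app: vllm       # assumed pod label
  ports:
    - name: http
      port: 8000
      targetPort: 8000
```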

#### Route:
- `oc apply -f https://raw.githubusercontent.com/rh-aiservices-bu/llm-on-openshift/main/llm-servers/vllm/gitops/route.yaml`
+ `oc apply -f https://raw.githubusercontent.com/rh-aiservices-bu/llm-on-openshift/main/llm-servers/vllm/gpu/gitops/route.yaml`
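
An OpenShift Route exposing that Service externally would look roughly like this sketch (the service name and TLS termination mode are assumptions; the cluster generates the hostname):

```yaml
# Hypothetical sketch only -- see the linked route.yaml for the real manifest.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: vllm
  namespace: vllm
spec:
  to:
    kind: Service
    name: vllm          # assumed service name
  port:
    targetPort: http
  tls:
    termination: edge   # assumed TLS mode
```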

#### Deployment:

For the deployment, we'll need to set the environment variable __HUGGING_FACE_HUB_TOKEN__ to your Hugging Face token.

We need this token because the model we're using, _Mistral-7B-Instruct-v0.2_, is gated: you must be authenticated to access it. Navigate to https://huggingface.co/settings/tokens, create a new token, and copy it for the environment variable.

![mistralhf](./images/mistralhf_login.png "mistralhf")


- Download the __deployment.yaml__: https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/llm-servers/vllm/gitops/deployment.yaml#L52 and add your token.
+ Download the __deployment.yaml__: https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/llm-servers/vllm/gpu/gitops/deployment.yaml#L52 and add your token.
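
The fragment you're editing (around line 52 of the linked deployment.yaml) is the container's env list; the surrounding field names in this sketch are assumptions, but the variable name is the one the doc calls out:

```yaml
# Hypothetical fragment -- edit the linked deployment.yaml, not this sketch.
spec:
  containers:
    - name: server                      # assumed container name
      env:
        - name: HUGGING_FACE_HUB_TOKEN
          value: "hf_xxxxxxxxxxxxxxxx"  # paste your own token here
```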

![env hf token](./images/hf_hub_token.png "env hf token")

@@ -213,4 +212,4 @@ It then goes through asking a question which then invokes the LLM with the infor

![rag results](./images/rag_results.png "rag results")

As you can see above, the answer that the LLM gives is more specific to creating a Data Connection in RHOAI because it uses the RHOAI documents that are in the vector DB.
