Steps to use with vllm and triton server
sachinsshetty committed Mar 28, 2024
1 parent 46f808f commit a6bc12e
Showing 2 changed files with 56 additions and 0 deletions.
37 changes: 37 additions & 0 deletions docs/triton-tensorRT-llm.md
@@ -0,0 +1,37 @@
Triton Server Setup

Build Triton
- git clone https://github.com/triton-inference-server/server
- cd server
- python build.py

Build Mixtral with TensorRT-LLM

- git clone https://github.com/NVIDIA/TensorRT-LLM/
- cd TensorRT-LLM
- cd examples/mixtral
- pip install -r requirements.txt
- git lfs install
- git clone https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
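
If `git lfs` was not active when the model repository was cloned, the weight files are small text pointers rather than real weights. A minimal sketch of a check for that, assuming the standard Git LFS pointer header; the function name `find_lfs_pointers` is hypothetical and not part of TensorRT-LLM or Hugging Face tooling:

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return files under repo_dir that are still Git LFS pointer stubs."""
    root = Path(repo_dir)
    stubs = []
    for path in sorted(root.rglob("*")):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(path)
    return stubs
```

Calling `find_lfs_pointers("Mixtral-8x7B-v0.1")` from `examples/mixtral` should return an empty list once the download is complete; any paths it returns can be fetched with `git lfs pull` inside the cloned repository.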


- Build Triton
  - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/build.html
- Security
  - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/deploy.html
  - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/build.html#building-with-docker

- Build Mixtral
  - https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/mixtral/README.md
  - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/getting_started/quickstart.html
- Build Mistral - https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#mistral-v01


Extra
- sudo useradd -m llm
- sudo passwd llm
- sudo usermod -aG sudo llm
- sudo apt install python3.10-venv
- sudo apt install python3-pip
- sudo apt-get install build-essential linux-generic libmpich-dev libopenmpi-dev
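- sudo apt install openmpi-devel (this is the Fedora/RHEL package name; on Debian/Ubuntu the equivalent is libopenmpi-dev, already installed by the previous command)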
19 changes: 19 additions & 0 deletions docs/vllm.md
@@ -0,0 +1,19 @@
Setup with vLLM

- Create an account on Hugging Face, then go to Profile > Access Tokens and create a new user access token. Export it as HF_TOKEN before running the container.


docker run --gpus all \
-e HF_TOKEN=$HF_TOKEN -p 8000:8000 \
ghcr.io/mistralai/mistral-src/vllm:latest \
--host 0.0.0.0 \
--model mistralai/Mistral-7B-Instruct-v0.2
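
The container has to download and load the model before it answers requests, so the first call can fail if sent too early. A minimal readiness-check sketch, assuming the server exposes the OpenAI-compatible `/v1/models` route on the published port; `http_probe`, `wait_until_ready`, and the timeout defaults are hypothetical names and values chosen for illustration:

```python
import http.client
import time
import urllib.request


def http_probe(url):
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, non-2xx status, or malformed reply.
        return False


def wait_until_ready(base_url, probe=http_probe, timeout_s=600.0,
                     interval_s=5.0, sleep=time.sleep):
    """Poll base_url + /v1/models until it responds or timeout_s elapses."""
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

For the command above, `wait_until_ready("http://localhost:8000")` returns `True` once the server is serving and `False` if the timeout elapses first. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.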

curl --location 'http://IP:Port/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "mistralai/Mistral-7B-Instruct-v0.2",
"messages": [
{"role": "user", "content": "What minimum materials are necessary to build a seed-harvesting robot? Show me how to arrange the parts."}
]
}'
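
The same request can be sent from Python with only the standard library. This is a sketch under the assumption that the server returns an OpenAI-style chat completion body (`choices[0].message.content`); `build_chat_request` and `extract_reply` are hypothetical helper names, and the base URL stands in for the `IP:Port` placeholder above:

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style completion body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]
```

To send it, pass the request to `urllib.request.urlopen(req)` and feed the bytes from `.read()` to `extract_reply`. Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.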
