Steps to use with vllm and triton server

slabstech · Mar 28, 2024 · a6bc12e
1 parent 46f808f
Showing 2 changed files with 56 additions and 0 deletions.
diff --git a/docs/triton-tensorRT-llm.md b/docs/triton-tensorRT-llm.md
@@ -0,0 +1,37 @@
+Triton Server Setup
+
+- Build Triton
+    - git clone https://github.com/triton-inference-server/server
+    - cd server
+    - python build.py
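Re-running the clone step above fails once the `server` directory already exists. The following is a minimal bash sketch, assuming you want the Triton steps scripted. `clone_if_missing` and `build_triton` are hypothetical helper names, not part of Triton. `build.py` is called with no flags, as in the note; the Triton build guide listed in these notes documents options such as `-v --enable-all`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "Build Triton" steps; nothing runs until called.

# clone_if_missing URL DIR: clone URL into DIR unless DIR already exists,
# so the setup can be re-run without "destination path already exists" errors.
clone_if_missing() {
  local url=$1 dir=$2
  if [ -d "$dir" ]; then
    echo "skip: $dir already exists"
  else
    git clone "$url" "$dir"
  fi
}

# build_triton: clone the server repo (once) and run build.py as in the notes.
# python3 is assumed here; the notes call plain python.
# The subshell keeps the caller's working directory unchanged.
build_triton() {
  clone_if_missing https://github.com/triton-inference-server/server server || return 1
  (cd server && python3 build.py)
}
```

Calling `build_triton` a second time skips the clone and goes straight to the build.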
+
+- Build Mixtral with TensorRT-LLM
+    - git clone https://github.com/NVIDIA/TensorRT-LLM/
+    - cd TensorRT-LLM/examples/mixtral
+    - pip install -r requirements.txt
+    - git lfs install
+    - git clone https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
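If `git clone` of the model runs without a working Git LFS setup, it leaves small pointer stubs in place of the weight files. The following bash sketch checks for that by comparing each file's first bytes with the LFS pointer header. `is_lfs_pointer` and `report_lfs_pointers` are hypothetical helpers, and the default `*.safetensors` pattern assumes the checkpoint ships safetensors files. Pass a different pattern if it does not.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to spot Git LFS pointer stubs left by a clone that ran
# without working Git LFS.

# Every LFS pointer file begins with this line (Git LFS pointer format).
LFS_SPEC='version https://git-lfs.github.com/spec/v1'

# is_lfs_pointer FILE: succeed if FILE starts with the pointer header.
# Only the first ${#LFS_SPEC} bytes are read, so large weight files are cheap
# to check; tr drops NUL bytes that bash cannot store in a string.
is_lfs_pointer() {
  [ "$(head -c "${#LFS_SPEC}" "$1" | tr -d '\000')" = "$LFS_SPEC" ]
}

# report_lfs_pointers DIR [GLOB]: print every file in DIR matching GLOB
# (default *.safetensors) that is still a pointer; return 1 if any were found.
report_lfs_pointers() {
  local dir=$1 glob=${2:-*.safetensors} found=0 f
  for f in "$dir"/$glob; do
    [ -f "$f" ] || continue   # an unmatched glob stays literal; skip it
    if is_lfs_pointer "$f"; then
      echo "pointer only: $f"
      found=1
    fi
  done
  return "$found"
}
```

If `report_lfs_pointers Mixtral-8x7B-v0.1` lists files, running `git lfs pull` inside that directory should fetch the missing objects.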
+
+
+- Build Triton
+    - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/build.html
+- Security
+    - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/deploy.html
+    - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/build.html#building-with-docker
+
+- Build Mixtral
+    - https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/mixtral/README.md
+- Triton quickstart
+    - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/getting_started/quickstart.html
+- Build Mistral
+    - https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#mistral-v01
+
+
+Extra host setup (Ubuntu)
+- sudo useradd -m llm
+- sudo passwd llm
+- sudo usermod -aG sudo llm
+- sudo apt install python3.10-venv
+- sudo apt install python3-pip
+- sudo apt-get install build-essential linux-generic libmpich-dev libopenmpi-dev
+- On RHEL/Fedora the Open MPI development package is named openmpi-devel instead: sudo dnf install openmpi-devel
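Before building, it can help to confirm that the packages above actually put their tools on `PATH`. The following is a bash sketch. `missing_tools` is a hypothetical helper, and the tool names in the usage line are an assumed selection: `python3`, `pip3` from python3-pip, and `gcc` and `make` from build-essential.

```shell
#!/usr/bin/env bash
# missing_tools TOOL...: print each TOOL that is not on PATH, one per line.
# Returns 1 if anything is missing, 0 if every tool was found.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools python3 pip3 gcc make || echo "install the packages above first"` prints any missing names, followed by the fallback message.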
diff --git a/docs/vllm.md b/docs/vllm.md
@@ -0,0 +1,19 @@
+Setup with vLLM
+
+- Create a Hugging Face account, then go to Profile > Access Tokens and create a new user access token. Export it as HF_TOKEN in the shell that starts the container.
+
+
+    docker run --gpus all \
+        -e HF_TOKEN="$HF_TOKEN" -p 8000:8000 \
+        ghcr.io/mistralai/mistral-src/vllm:latest \
+        --host 0.0.0.0 \
+        --model mistralai/Mistral-7B-Instruct-v0.2
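The `docker run` command passes `$HF_TOKEN` straight through. If the variable is unset, the container receives an empty token, and an authenticated model download fails only later. The following bash sketch refuses to start without the token. `require_env` and `start_vllm` are hypothetical names, and the `docker run` arguments are copied from the command above.

```shell
#!/usr/bin/env bash
# require_env NAME: fail with a message on stderr if variable NAME is unset or empty.
# ${!1} is bash indirect expansion: the value of the variable whose name is $1.
require_env() {
  if [ -z "${!1:-}" ]; then
    echo "error: $1 is not set (export it first)" >&2
    return 1
  fi
}

# start_vllm: the docker command from the notes, guarded by the token check.
start_vllm() {
  require_env HF_TOKEN || return 1
  docker run --gpus all \
    -e HF_TOKEN="$HF_TOKEN" -p 8000:8000 \
    ghcr.io/mistralai/mistral-src/vllm:latest \
    --host 0.0.0.0 \
    --model mistralai/Mistral-7B-Instruct-v0.2
}
```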
+
+    curl --location 'http://<server-ip>:8000/v1/chat/completions' \
+        --header 'Content-Type: application/json' \
+        --data '{
+            "model": "mistralai/Mistral-7B-Instruct-v0.2",
+            "messages": [
+                {"role": "user", "content": "What minimum materials are necessary to build a seed-harvesting robot? Show me how to arrange the parts."}
+            ]
+        }'
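The request above is an OpenAI-style chat completions call to the container on port 8000. The following bash sketch wraps it so the prompt can be passed as an argument. `json_escape`, `chat_payload`, and `chat` are hypothetical helpers. `json_escape` only escapes backslashes and double quotes, so prompts containing newlines or other control characters would need a real JSON tool such as `jq`.

```shell
#!/usr/bin/env bash
# json_escape TEXT: escape backslashes, then double quotes, so TEXT can sit
# inside a JSON string. Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload MODEL PROMPT: print a single-turn chat completion request body.
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# chat HOST PROMPT [PORT]: POST PROMPT to the server at HOST:PORT (default 8000).
chat() {
  local host=$1 prompt=$2 port=${3:-8000}
  curl --silent --location "http://${host}:${port}/v1/chat/completions" \
    --header 'Content-Type: application/json' \
    --data "$(chat_payload mistralai/Mistral-7B-Instruct-v0.2 "$prompt")"
}
```

For example, `chat 192.0.2.10 'List the parts of a seed-harvesting robot'` sends one user message to a server at that (example) address. The reply text is in `choices[0].message.content` of the returned JSON.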