docs: Add DeepSeek tutorial #128
Conversation
The following steps should result in a `results.txt` that has the following content:
```bash
nit: This is just text and not bash
adjusted
```bash
$ curl -X POST localhost:8000/v2/models/vllm_model/generate -d '{"text_input": "What is Triton Inference Server?", "parameters": {"stream": false, "temperature": 0, "exclude_input_in_output": true, "max_tokens": 45}}' | jq
{
  "model_name": "vllm_model",
  "model_version": "1",
  "text_output": " It's a high-performance, scalable, and efficient inference server for AI models. It's designed to handle large numbers of requests quickly and efficiently, making it suitable for real-time applications like autonomous vehicles, smart homes, and more"
}
```
nit: Could we separate the bash command and the output log? I think this would make it easier for users to quickly copy-paste the command:
```bash
curl -X POST localhost:8000/v2/models/vllm_model/generate -d '{"text_input": "What is Triton Inference Server?", "parameters": {"stream": false, "temperature": 0, "exclude_input_in_output": true, "max_tokens": 45}}' | jq
```
Then you should see the output:
```json
{
  "model_name": "vllm_model",
  "model_version": "1",
  "text_output": " It's a high-performance, scalable, and efficient inference server for AI models. It's designed to handle large numbers of requests quickly and efficiently, making it suitable for real-time applications like autonomous vehicles, smart homes, and more"
}
```
adjusted
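For anyone scripting this instead of using curl, here is a minimal Python sketch of the same request against the generate endpoint shown above (it assumes the `requests` package is installed and that the server is listening on localhost:8000, as in the curl example):
```python
# Minimal sketch of the same generate request using Python instead of curl.
# Assumes the `requests` package is installed and Triton is serving
# vllm_model on localhost:8000, matching the curl example above.
import requests

payload = {
    "text_input": "What is Triton Inference Server?",
    "parameters": {
        "stream": False,
        "temperature": 0,
        "exclude_input_in_output": True,
        "max_tokens": 45,
    },
}

response = requests.post(
    "http://localhost:8000/v2/models/vllm_model/generate",
    json=payload,
)
response.raise_for_status()  # fail loudly on a non-2xx status
print(response.json()["text_output"])
```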
Review 1

Co-authored-by: pranavm-nvidia <[email protected]>
Co-authored-by: Kris Hung <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Please add a heading so that Sphinx can render this well when it is included with the Triton user guides.
Thanks for the quick guide @oandreeva-nv