
docs: Add DeepSeek tutorial #128

Merged: 4 commits from oandreeva_deepseek into main, Feb 3, 2025
Conversation

oandreeva-nv (Contributor)

No description provided.

Quoted from the tutorial:

````
```
The following steps should result in a `results.txt` that has the following content
```bash
````
@pranavm-nvidia (Contributor) commented Jan 31, 2025:

nit: This is just text and not bash
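
(For illustration, a sketch of the change this nit suggests, assuming the fence above is the one in question: drop the `bash` info string so the expected `results.txt` contents render as plain text.)

````diff
-```bash
+```
````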

@oandreeva-nv (Contributor, Author) replied:

adjusted

Comment on lines 77 to 84
```bash
$ curl -X POST localhost:8000/v2/models/vllm_model/generate -d '{"text_input": "What is Triton Inference Server?", "parameters": {"stream": false, "temperature": 0, "exclude_input_in_output": true, "max_tokens": 45}}' | jq
{
"model_name": "vllm_model",
"model_version": "1",
"text_output": " It's a high-performance, scalable, and efficient inference server for AI models. It's designed to handle large numbers of requests quickly and efficiently, making it suitable for real-time applications like autonomous vehicles, smart homes, and more"
}
```

A reviewer (Contributor) commented:

nit: Could we separate the bash command and the output log? I think this would make it easier for users to quickly copy-paste the command:

```bash
curl -X POST localhost:8000/v2/models/vllm_model/generate -d '{"text_input": "What is Triton Inference Server?", "parameters": {"stream": false, "temperature": 0, "exclude_input_in_output": true, "max_tokens": 45}}' | jq
```

Then you should see the output:

```
{
  "model_name": "vllm_model",
  "model_version": "1",
  "text_output": " It's a high-performance, scalable, and efficient inference server for AI models. It's designed to handle large numbers of requests quickly and efficiently, making it suitable for real-time applications like autonomous vehicles, smart homes, and more"
}
```
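
(As an aside, here is a minimal Python sketch equivalent to the curl command above, assuming Triton is serving `vllm_model` locally on port 8000 and the `requests` package is available.)

```python
import json

import requests

# Same payload as the curl example above.
payload = {
    "text_input": "What is Triton Inference Server?",
    "parameters": {
        "stream": False,
        "temperature": 0,
        "exclude_input_in_output": True,
        "max_tokens": 45,
    },
}

# POST to Triton's generate endpoint for vllm_model.
resp = requests.post(
    "http://localhost:8000/v2/models/vllm_model/generate",
    json=payload,
)
resp.raise_for_status()

# Pretty-print the JSON response, like piping through `jq`.
print(json.dumps(resp.json(), indent=2))
```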

@oandreeva-nv (Contributor, Author) replied:

adjusted

@richardhuo-nv richardhuo-nv self-assigned this Jan 31, 2025
@rmccorm4 rmccorm4 changed the title DeepSeek tutorial docs: Add DeepSeek tutorial Jan 31, 2025
oandreeva-nv and others added 2 commits January 31, 2025 14:19
Review 1

Co-authored-by: pranavm-nvidia <[email protected]>
Co-authored-by: Kris Hung <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>

@statiraju (Contributor) left a comment:

Please add a heading so that Sphinx can render it well when included with the Triton user guides.
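
(For illustration only, such a heading might look like the line below at the top of the tutorial's markdown file; the title text here is hypothetical.)

```
# Deploying DeepSeek Models with Triton Inference Server
```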

@statiraju (Contributor):
Thanks for the quick guide @oandreeva-nv

@oandreeva-nv oandreeva-nv merged commit 92574e6 into main Feb 3, 2025
3 checks passed
@oandreeva-nv oandreeva-nv deleted the oandreeva_deepseek branch February 3, 2025 22:22