Docs Updates #61
base: main
Great job!
- The `--target` flag specifies the server hosting the model. In this case, it is a local vLLM server.
- The `--model` flag specifies the model to evaluate. The model name should match the name of the model deployed on the server.
- By default, GuideLLM will run a `sweep` of performance evaluations across different request rates, each lasting 120 seconds. The results will be saved to a local directory.
- I would rename `flag` to `parameter`, since our CLI supports both parameters and flags. A flag is specified with no value next to it; a parameter requires a value.
- In some cases we may get an error if the tokenizer is not specified. I would add another item here. Text is below:
  - The `--tokenizer` parameter specifies the tokenizer used to count the number of tokens in the dataset. If you face any issues, try using `--tokenizer neuralmagic/Meta-Llama-3.1-8B-quantized.w8a8`.
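For reference, the three parameters discussed in this thread could be combined as in the minimal sketch below. It only assembles and prints the command line; it does not run GuideLLM. The helper `build_guidellm_command`, the target URL `http://localhost:8000/v1`, the model name `my-model`, and the assumption that the executable is called `guidellm` are all illustrative and not taken from this pull request. Because each option is a parameter rather than a flag, each one is followed by a value.

```python
import shlex


def build_guidellm_command(target, model, tokenizer=None):
    """Assemble an argument list from the parameters discussed in this thread.

    Hypothetical helper: the executable name "guidellm" is an assumption.
    """
    # Each parameter is followed by its value.
    argv = ["guidellm", "--target", target, "--model", model]
    # --tokenizer is only added when given, e.g. when token counting fails without it.
    if tokenizer is not None:
        argv += ["--tokenizer", tokenizer]
    return argv


if __name__ == "__main__":
    cmd = build_guidellm_command(
        target="http://localhost:8000/v1",  # assumed address of a local vLLM server
        model="my-model",  # placeholder; must match the model deployed on the server
        tokenizer="neuralmagic/Meta-Llama-3.1-8B-quantized.w8a8",
    )
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(cmd))
```

Keeping the arguments as a list until the final `shlex.join` avoids quoting mistakes if a value ever contains spaces.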
Summary:
This pull request introduces the GuideLLM CLI guide, README enhancements, image uploads, and the supported backends documentation to highlight all the backends that can be used with GuideLLM.
Test Cases:
The GuideLLM CLI has been tested with various LLMs and backends.
Unit tests ensure core functionalities work as expected.
Documentation:
Created documentation detailing the GuideLLM CLI usage and output metrics.
Created documentation detailing the OpenAI-compatible API/HTTP pathway for TGI, llama.cpp, and DeepSparse in `supported_backends.md`.
Additional Information:
The pull request includes changes to the docs/guides directory for the CLI documentation.
Binary files containing performance summary visualizations are added to the docs/assets directory.
Please review and provide feedback.