Loading the qwen2_5_instruct_7b model: request responses are slower when deployed with sglang than with vllm. While troubleshooting, I added print("mutates_args", mutates_args, "schema_str", schema_str). Is the resulting output reasonable? #695

Open
qingzhong1 opened this issue Dec 24, 2024 · 1 comment

Comments

@qingzhong1
image
@yzh119
Collaborator

yzh119 commented Dec 24, 2024

Sorry, I don't quite understand the question.

> because the deployment request response using sglang is slower than that of vllm deployment.

Do you mean warmup time, or evaluation metrics such as ITL (inter-token latency) or TTFT (time to first token)?
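To make the distinction concrete, here is a minimal sketch of how the two metrics could be computed on the client side, assuming you record a timestamp when the request is sent and one for each streamed token. The function name `latency_metrics` and its inputs are hypothetical and are not part of sglang or vllm.

```python
from statistics import mean


def latency_metrics(request_sent: float, token_times: list[float]) -> dict[str, float]:
    """Compute TTFT and mean ITL from client-side timestamps (in seconds).

    request_sent: time the request was issued (hypothetical input).
    token_times:  arrival time of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("no tokens received")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent
    # ITL: average gap between consecutive tokens after the first one.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    return {"ttft": ttft, "itl": itl}


# Example with made-up timestamps: first token after 0.5 s, then one every 0.25 s.
metrics = latency_metrics(0.0, [0.5, 0.75, 1.0])
```

Warmup time is separate from both: it is a one-off cost paid before the first request is served, so it would not show up in per-request numbers like these.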

> When troubleshooting, print("mutates_args",mutates_args,"schema_str",schema_str) is this result reasonable?

What is the purpose of printing these arguments?
