Loading the qwen2_5_instruct_7b model: request responses are slower when deployed with sglang than when deployed with vllm. While troubleshooting, I added print("mutates_args", mutates_args, "schema_str", schema_str). Is this result reasonable? #695

qingzhong1 · 2024-12-24T06:35:42Z

yzh119 · 2024-12-24T06:43:45Z

Sorry, I don't quite understand the question.

> because the deployment request response using sglang is slower than that of vllm deployment.

Do you mean warmup time, or evaluation metrics such as ITL (inter-token latency) or TTFT (time to first token)?
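
For reference, here is a minimal sketch of how the two metrics could be computed from client-side timestamps, so the sglang and vllm numbers can be compared like for like. The function name `latency_metrics` and its arguments are hypothetical and are not part of sglang or vllm. The sketch assumes you record the time the request was sent and the arrival time of each streamed token yourself.

```python
from statistics import mean


def latency_metrics(request_sent: float, token_times: list[float]) -> dict[str, float]:
    """Compute TTFT and mean ITL from timestamps given in seconds.

    request_sent: time the request was issued (hypothetical input).
    token_times:  arrival time of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("no tokens were received")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent
    # ITL: gaps between consecutive tokens; averaged here for a single number.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = mean(gaps) if gaps else 0.0
    return {
        "ttft": ttft,
        "mean_itl": mean_itl,
        "total": token_times[-1] - request_sent,
    }


# Example with made-up timestamps: first token after 0.5 s, then one every 0.25 s.
metrics = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
print(metrics)
```

Warmup time would show up separately, as a slow first request after server start, so it helps to discard the first few requests before measuring.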

> When troubleshooting, print("mutates_args",mutates_args,"schema_str",schema_str) is this result reasonable?

What's the purpose of printing these arguments?
