-
We are always working on speed improvements, but it may also be worth discussing your general setup. What models and schemas do you tend to work with? There may be places where we can suggest performance improvements.
-
Model: qwen2-7b. Is there an optimized solution for this case? And what is the fastest generation time, in seconds, that you can achieve with it?
-
I currently use outlines + vLLM, and it takes 10 seconds to generate a single JSON output. That latency is not workable for my current business use case. Is there any way to speed this up, such as a paper, a planned change, or a PR?