Your current environment
PyTorch version: 2.2.2
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.5 (x86_64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: Could not collect
Libc version: N/A
Python version: 3.11.6 (v3.11.6:8b6ee5ba3b, Oct 2 2023, 11:18:21) [Clang 13.0.0 (clang-1300.0.29.30)] (64-bit runtime)
Python platform: macOS-14.5-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] sentence-transformers==2.2.2
[pip3] torch==2.2.2
[pip3] torchvision==0.17.2
[pip3] transformers==4.42.3
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect
How would you like to use vllm
Similar to #7030, I would like to use JSON mode for Mistral 7B while doing offline inference with the generate method, but asynchronously. We would like to stream the response to our app and thought we could use the AsyncLLMEngine for that. The LLM class wraps the synchronous LLMEngine and inserts the JSON schema thanks to a recent PR, but there is no equivalent wrapper for the async engine, so we cannot supply a schema when using the plain AsyncLLMEngine class directly. Is there a plan to add such a wrapper (an AsyncLLM class) for the async engine? We appreciate any suggestions, such as the sketch below. Thank you
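For context, here is a rough sketch of what we are hoping to do. It assumes a newer vLLM release where SamplingParams accepts guided_decoding=GuidedDecodingParams(json=...) (this was not available in the version listed above); the schema, model name, and request id are placeholders, not something AsyncLLMEngine is confirmed to support for this use case:

```python
import asyncio

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import GuidedDecodingParams, SamplingParams

# Placeholder JSON schema for the structured answer we want from Mistral 7B.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer"],
}


async def main() -> None:
    # Build the async engine directly, since there is no AsyncLLM wrapper class.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="mistralai/Mistral-7B-Instruct-v0.2")
    )

    # Constrain generation to the schema via sampling params instead of relying
    # on the LLM wrapper to inject it (assumes guided_decoding is supported here).
    sampling_params = SamplingParams(
        max_tokens=256,
        guided_decoding=GuidedDecodingParams(json=ANSWER_SCHEMA),
    )

    prompt = "Answer in JSON: what is the capital of France?"

    # AsyncLLMEngine.generate is an async generator, so partial outputs can be
    # streamed to the app as they arrive.
    async for request_output in engine.generate(
        prompt, sampling_params, request_id="req-0"
    ):
        print(request_output.outputs[0].text)


asyncio.run(main())
```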
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
fatihyildiz-cs changed the title from "[Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B using offline inference" to "[Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B offline inference" on Aug 27, 2024.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!