
[Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B offline inference #7908

Open · 1 task done
fatihyildiz-cs opened this issue Aug 27, 2024 · 3 comments
Labels: stale, usage (How to use vllm)


@fatihyildiz-cs

Your current environment

PyTorch version: 2.2.2
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.5 (x86_64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.6 (v3.11.6:8b6ee5ba3b, Oct  2 2023, 11:18:21) [Clang 13.0.0 (clang-1300.0.29.30)] (64-bit runtime)
Python platform: macOS-14.5-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] sentence-transformers==2.2.2
[pip3] torch==2.2.2
[pip3] torchvision==0.17.2
[pip3] transformers==4.42.3
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

How would you like to use vllm

Similar to #7030, I would like to use JSON mode for Mistral 7B during offline inference with the generate method, but asynchronously. We want to stream the response to our app and thought we could use the AsyncLLMEngine for that. The LLM class wraps the synchronous LLMEngine and injects the JSON schema thanks to a recent PR, but there is no equivalent wrapper for the async engine, so we cannot supply a schema through the plain AsyncLLMEngine class. Is there a plan to add such a wrapper (an AsyncLLM class) for the async engine? We would appreciate any suggestions. Thank you.
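A minimal sketch of one possible workaround: build the guided-decoding logits processor yourself and pass it to AsyncLLMEngine through SamplingParams, which is roughly what the LLM class does internally for guided JSON. The import path and call signature of get_guided_decoding_logits_processor are assumptions here (these helpers have moved between vLLM releases), and the model name and schema are placeholders, so treat this as an illustration of the approach rather than a drop-in snippet.

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
# Assumed import path/signature; the guided-decoding helpers have moved between vLLM releases.
from vllm.model_executor.guided_decoding import get_guided_decoding_logits_processor

# Example schema: constrain the model to emit {"answer": "..."}.
JSON_SCHEMA = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}


async def main() -> None:
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="mistralai/Mistral-7B-Instruct-v0.2")  # example model
    )
    tokenizer = await engine.get_tokenizer()

    # Build the schema-constraining logits processor ourselves, mirroring what
    # the LLM class does for guided JSON. The backend name and argument order
    # are assumptions for this sketch.
    logits_processor = await get_guided_decoding_logits_processor(
        "outlines", JSON_SCHEMA, tokenizer
    )
    params = SamplingParams(max_tokens=256, logits_processors=[logits_processor])

    # generate() is an async generator, so partial outputs can be streamed to
    # the app as they arrive; each yield carries the text produced so far.
    async for request_output in engine.generate(
        "Reply in JSON: what is vLLM?", params, request_id="json-stream-0"
    ):
        print(request_output.outputs[0].text)


asyncio.run(main())
```

If that helper is not available in your vLLM version, any callable implementing the logits-processor interface (taking the generated token ids and the logits tensor and returning modified logits), for example one built with Outlines directly, can be passed via logits_processors in the same way.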

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@fatihyildiz-cs fatihyildiz-cs added the usage How to use vllm label Aug 27, 2024
@fatihyildiz-cs fatihyildiz-cs changed the title [Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B using offline inference [Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B offline inference Aug 27, 2024
@RoopeHakulinen

I'd need this too. It would be great to know whether there's some way to make this happen with the current version 🙂

@fatihyildiz-cs
Author

Still haven't found a solution for this. I'd appreciate any tips.


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Dec 20, 2024