
[Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B offline inference #7908

Open · 1 task done
fatihyildiz-cs opened this issue Aug 27, 2024 · 3 comments
Labels: stale, usage (How to use vllm)


@fatihyildiz-cs

Your current environment

PyTorch version: 2.2.2
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.5 (x86_64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.6 (v3.11.6:8b6ee5ba3b, Oct  2 2023, 11:18:21) [Clang 13.0.0 (clang-1300.0.29.30)] (64-bit runtime)
Python platform: macOS-14.5-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] sentence-transformers==2.2.2
[pip3] torch==2.2.2
[pip3] torchvision==0.17.2
[pip3] transformers==4.42.3
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

How would you like to use vllm

Similar to #7030, I would like to use JSON mode for Mistral 7B during offline inference with the generate method, but asynchronously. We want to stream the response to our app and thought we could use the AsyncLLMEngine for that. The LLM class wraps the synchronous LLMEngine and injects the JSON schema thanks to a recent PR, but there is no equivalent wrapper for the async engine, so we cannot supply a schema through the plain AsyncLLMEngine class. Is there a plan to add such a wrapper (an AsyncLLM class) for the async engine? We would appreciate any suggestions. Thank you.
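A minimal sketch of one possible workaround: build the guided-decoding logits processor yourself and pass it to AsyncLLMEngine through SamplingParams, which is roughly what the LLM class does internally for guided JSON. The import path and call signature of get_guided_decoding_logits_processor are assumptions here (these helpers have moved between vLLM releases), and the model name and schema are placeholders, so treat this as an illustration of the approach rather than a drop-in snippet.

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
# Assumed import path/signature; the guided-decoding helpers have moved between vLLM releases.
from vllm.model_executor.guided_decoding import get_guided_decoding_logits_processor

# Example schema: constrain the model to emit {"answer": "..."}.
JSON_SCHEMA = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}


async def main() -> None:
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="mistralai/Mistral-7B-Instruct-v0.2")  # example model
    )
    tokenizer = await engine.get_tokenizer()

    # Build the schema-constraining logits processor ourselves, mirroring what
    # the LLM class does for guided JSON. The backend name and argument order
    # are assumptions for this sketch.
    logits_processor = await get_guided_decoding_logits_processor(
        "outlines", JSON_SCHEMA, tokenizer
    )
    params = SamplingParams(max_tokens=256, logits_processors=[logits_processor])

    # generate() is an async generator, so partial outputs can be streamed to
    # the app as they arrive; each yield carries the text produced so far.
    async for request_output in engine.generate(
        "Reply in JSON: what is vLLM?", params, request_id="json-stream-0"
    ):
        print(request_output.outputs[0].text)


asyncio.run(main())
```

If that helper is not available in your vLLM version, any callable implementing the logits-processor interface (taking the generated token ids and the logits tensor and returning modified logits), for example one built with Outlines directly, can be passed via logits_processors in the same way.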

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@fatihyildiz-cs fatihyildiz-cs added the usage How to use vllm label Aug 27, 2024
@fatihyildiz-cs fatihyildiz-cs changed the title [Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B using offline inference [Usage]: how do I pass in the JSON content-type for ASYNC Mistral 7B offline inference Aug 27, 2024
@RoopeHakulinen

I'd need this too. It would be great to know whether there's some way to make this happen with the current version 🙂

@fatihyildiz-cs
Author

Still haven't found a solution for this. I'd appreciate any tips.


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Dec 20, 2024