[python] Use vllm chat object #2659
Conversation
from PIL.Image import Image
from typing import Optional
from pydantic import Field
from vllm.entrypoints.openai.protocol import ChatCompletionRequest
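For context, a minimal sketch of how the imported ChatCompletionRequest model could be used to parse an OpenAI-style chat payload (the payload values and surrounding handler code are illustrative assumptions, not part of this PR):

```python
from vllm.entrypoints.openai.protocol import ChatCompletionRequest

# Hypothetical incoming payload following the OpenAI chat completions schema.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is DJL Serving?"},
    ],
    "temperature": 0.7,
}

# Pydantic validation gives typed access to the messages and sampling parameters.
request = ChatCompletionRequest(**payload)
print(request.messages, request.temperature)
```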
This should be OK in the LMI and Neuron containers (we should validate on the Neuron side that this actually works), but I don't know if it will work in the trtllm container since we don't install vllm there.
We should either install vllm in the trtllm container, or retain the old messages format for trtllm.
After some testing, I think we need to retain the old format for the trtllm container.
Compare: 8aacf25 to 10741f4
if type(kwargs.get("rolling_batch")).__name__ in [
        "LmiDistRollingBatch", "VLLMRollingBatch"
]:
Is it possible to base this choice on the config option.rolling_batch=x?
option.rolling_batch may be auto, which resolves to lmi-dist or trtllm depending on which container it is, so it's hard to tell from the config which rolling batch is actually in use.
Maybe we could set a config within the RB class, like use_vllm_chat_completions? I think I would prefer that, since I'm not sure whether using VLLMRollingBatch with Neuron (a valid use case) supports some of the utilities we are using from vllm, because in that case we're pulling them from Neuron's vllm repo.
Sounds good. Added use_vllm_chat_completions().
@xyang16 I added 2 small changes to this PR.
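To make the agreed-upon approach concrete, here is a minimal sketch of the flag-based dispatch (the parse_chat_input helper and the class bodies are illustrative assumptions, not the PR's actual code; only the use_vllm_chat_completions() name comes from the discussion above):

```python
class RollingBatch:
    # Base class: engines that do not bundle vllm keep the legacy messages format.
    def use_vllm_chat_completions(self) -> bool:
        return False


class VLLMRollingBatch(RollingBatch):
    def use_vllm_chat_completions(self) -> bool:
        return True


class LmiDistRollingBatch(RollingBatch):
    def use_vllm_chat_completions(self) -> bool:
        return True


class TRTLLMRollingBatch(RollingBatch):
    # Inherits False: vllm is not installed in the trtllm container,
    # so the old messages format is retained there.
    pass


def parse_chat_input(payload: dict, rolling_batch: RollingBatch):
    # Dispatch on the engine's own flag instead of inspecting type(...).__name__.
    if rolling_batch.use_vllm_chat_completions():
        from vllm.entrypoints.openai.protocol import ChatCompletionRequest
        return ChatCompletionRequest(**payload)
    return payload["messages"]  # legacy messages format
```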
Description
Brief description of what this PR is about
Type of change
Please delete options that are not relevant.
Checklist:
pytest tests.py -k "TestCorrectnessLmiDist" -m "lmi_dist"
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B