
[python] Use vllm chat object #2659

Merged: 7 commits into deepjavalibrary:master on Jan 17, 2025

Conversation

xyang16 (Contributor) commented on Jan 10, 2025

Description

Brief description of what this PR is about

  • If this change is a backward incompatible change, why must this change be made?
  • Interesting edge cases to note here

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • Please add the link to the Integration Tests Executor run with related tests.
  • Have you manually built the docker image and verified the change?
  • Have you run related tests? Check how to set up the test environment here; one example would be pytest tests.py -k "TestCorrectnessLmiDist" -m "lmi_dist"
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Feature/Issue validation/testing

Please describe the unit or integration tests that you ran to verify your changes, along with a summary of the relevant results. Provide instructions so they can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

xyang16 requested review from zachgk and a team as code owners on January 10, 2025 at 02:33
from PIL.Image import Image
from typing import Optional
from pydantic import Field
from vllm.entrypoints.openai.protocol import ChatCompletionRequest
Contributor (reviewer) commented:
This should be OK in the LMI and Neuron containers (we should validate on the Neuron side that this does work), but I don't know whether it will work in the trtllm container, since we don't install vllm there.

We should either install vllm in the trtllm container, or retain the old messages format for trtllm.
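
For illustration only, a minimal sketch of the guarded-import option mentioned above; this is an assumption about how it could be done, not code from this PR:

# Hypothetical guard: import vllm's protocol type only where vllm is
# installed (e.g. the LMI and Neuron containers), and fall back to the
# old messages format elsewhere (e.g. the trtllm container).
try:
    from vllm.entrypoints.openai.protocol import ChatCompletionRequest
    VLLM_CHAT_AVAILABLE = True
except ImportError:
    ChatCompletionRequest = None
    VLLM_CHAT_AVAILABLE = False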

xyang16 (Author) replied on Jan 15, 2025:

After some testing, I think we need to retain the old format for the trtllm container.

xyang16 force-pushed the chat branch 9 times, most recently from 8aacf25 to 10741f4, on January 16, 2025 at 19:52
Comment on lines 144 to 146
if type(kwargs.get("rolling_batch")).__name__ in [
"LmiDistRollingBatch", "VLLMRollingBatch"
]:
Contributor (reviewer) commented:
Is it possible to base this choice on the config option.rolling_batch=x?

xyang16 (Author) replied on Jan 16, 2025:
option.rolling_batch may be set to auto, which resolves to lmi-dist or trtllm depending on the container, so it's hard to tell from the config alone which rolling batch is actually in use.

Contributor (reviewer) replied:
Maybe we could set a config within the rolling batch class, like use_vllm_chat_completions? I think I would prefer that, since I'm not sure whether using VLLMRollingBatch with Neuron (a valid use case) supports some of the utilities we are using from vllm, given that we pull those from Neuron's vllm repo.

xyang16 (Author) replied:
Sounds good. Added use_vllm_chat_completions().
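
For reference, a minimal sketch of the agreed approach; use_vllm_chat_completions() is the flag discussed above, but the surrounding class and function names here are assumptions for illustration, not the actual djl-serving code:

class RollingBatch:
    # Hypothetical base class: by default, keep the old messages format.
    def use_vllm_chat_completions(self) -> bool:
        return False

class VLLMRollingBatch(RollingBatch):
    # vllm is available in this backend, so its chat parsing can be used.
    def use_vllm_chat_completions(self) -> bool:
        return True

def parse_chat_input(payload: dict, rolling_batch: RollingBatch):
    # Branch on the flag instead of inspecting the class name.
    if rolling_batch.use_vllm_chat_completions():
        from vllm.entrypoints.openai.protocol import ChatCompletionRequest
        # Exact required fields depend on the installed vllm version.
        return ChatCompletionRequest(**payload)
    # Backends without vllm (e.g. trtllm) keep the old messages format.
    return payload

This avoids the string comparison on the class name and lets each rolling batch implementation decide for itself whether vllm's chat utilities are safe to use.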

siddvenk (Contributor) commented:
@xyang16 I added two small changes to this PR:

  1. Fix the max_new_tokens issue in the integ test client.
  2. Pass all the parsed multimodal (mm) data, rather than just images, to the request (see the sketch below).
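
A rough sketch of what change 2 could look like; the function, attribute, and variable names here are assumptions for illustration only:

def attach_multi_modal_data(request, parsed_mm_data: dict):
    # Previously only images were attached, roughly:
    #   request.multi_modal_data = {"image": parsed_mm_data.get("image")}
    # Passing the full parsed dict forwards every modality (image, video,
    # audio, ...) without further handler changes.
    request.multi_modal_data = parsed_mm_data
    return request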

xyang16 merged commit ab53670 into deepjavalibrary:master on Jan 17, 2025
9 checks passed