[python] Use vllm chat object #2659
Conversation
from PIL.Image import Image
from typing import Optional
from pydantic import Field
from vllm.entrypoints.openai.protocol import ChatCompletionRequest
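For context, a minimal sketch of how the imported ChatCompletionRequest model could be used to parse an OpenAI-style chat payload (the payload values and surrounding handler code are illustrative assumptions, not part of this PR):

```python
from vllm.entrypoints.openai.protocol import ChatCompletionRequest

# Hypothetical incoming payload following the OpenAI chat completions schema.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is DJL Serving?"},
    ],
    "temperature": 0.7,
}

# Pydantic validation gives typed access to the messages and sampling parameters.
request = ChatCompletionRequest(**payload)
print(request.messages, request.temperature)
```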
This should be OK in the LMI and Neuron containers (we should validate on the Neuron side that this actually works), but I don't know if it will work in the trtllm container since we don't install vllm there.
We should either install vllm in the trtllm container, or retain the old messages format for trtllm.
After some testing, I think we need to retain the old format for the trtllm container.
Compare: 8aacf25 to 10741f4
if type(kwargs.get("rolling_batch")).__name__ in [
        "LmiDistRollingBatch", "VLLMRollingBatch"
]:
Is it possible to base this choice on the config option.rolling_batch=x?
option.rolling_batch may be auto, which resolves to lmi-dist or trtllm depending on which container it is, so it's hard to tell from the config which rolling batch is actually in use.
Maybe we could set a config within the RB class, like use_vllm_chat_completions? I think I would prefer that, since I'm not sure whether using VLLMRollingBatch with Neuron (a valid use case) supports some of the utilities we are using from vllm, because in that case we're pulling them from Neuron's vllm repo.
Sounds good. Added use_vllm_chat_completions().
@xyang16 I added 2 small changes to this PR.
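To make the agreed-upon approach concrete, here is a minimal sketch of the flag-based dispatch (the parse_chat_input helper and the class bodies are illustrative assumptions, not the PR's actual code; only the use_vllm_chat_completions() name comes from the discussion above):

```python
class RollingBatch:
    # Base class: engines that do not bundle vllm keep the legacy messages format.
    def use_vllm_chat_completions(self) -> bool:
        return False


class VLLMRollingBatch(RollingBatch):
    def use_vllm_chat_completions(self) -> bool:
        return True


class LmiDistRollingBatch(RollingBatch):
    def use_vllm_chat_completions(self) -> bool:
        return True


class TRTLLMRollingBatch(RollingBatch):
    # Inherits False: vllm is not installed in the trtllm container,
    # so the old messages format is retained there.
    pass


def parse_chat_input(payload: dict, rolling_batch: RollingBatch):
    # Dispatch on the engine's own flag instead of inspecting type(...).__name__.
    if rolling_batch.use_vllm_chat_completions():
        from vllm.entrypoints.openai.protocol import ChatCompletionRequest
        return ChatCompletionRequest(**payload)
    return payload["messages"]  # legacy messages format
```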
Description
Brief description of what this PR is about
Type of change
Please delete options that are not relevant.
Checklist:
pytest tests.py -k "TestCorrectnessLmiDist" -m "lmi_dist"
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B