
TP llama with continuous batching #2709

Merged: 21 commits merged into master on Dec 14, 2023

Conversation

mreso
Collaborator

@mreso mreso commented Oct 12, 2023

Description

Please read our CONTRIBUTING.md prior to creating your first pull request.

Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
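As background on the "TP" in the title: tensor parallelism shards a layer's weight matrix across ranks so each device holds and computes only a slice of the output, which is then gathered. A minimal single-process sketch of the idea (all names hypothetical — this is not TorchServe's actual implementation), simulating a shard over output features:

```python
def matvec(rows, x):
    """Dense matrix-vector product; `rows` is a list of weight rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def tp_matvec(rows, x, world_size):
    """Simulate a tensor-parallel matvec on one process.

    Output features (rows of the weight) are sharded across `world_size`
    ranks; each rank computes its slice locally, and concatenating the
    slices stands in for the all-gather a real TP runtime performs.
    """
    shard = (len(rows) + world_size - 1) // world_size
    out = []
    for rank in range(world_size):
        local_rows = rows[rank * shard:(rank + 1) * shard]
        out.extend(matvec(local_rows, x))  # "all-gather" by concatenation
    return out
```

Sharding the attention and MLP weight matrices this way is what lets a llama checkpoint too large for a single GPU be served from several.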

Feature/Issue validation/testing

Please describe the unit or integration tests that you ran to verify your changes and summarize the relevant results. Provide instructions so the tests can be reproduced, and list any relevant details of your test configuration.

  • pytest test/pytest/test_tp_llama.py
=========================================================================================================================== test session starts ============================================================================================================================
platform linux -- Python 3.10.12, pytest-7.3.1, pluggy-1.3.0
rootdir: /home/ubuntu/serve
plugins: mock-3.10.0, cov-4.1.0
collected 8 items

test/pytest/test_tp_llama.py ........                                                                                                                                                                                                                                [100%]

============================================================================================================================= warnings summary =============================================================================================================================
test/pytest/test_tp_llama.py::test_handler
  /home/ubuntu/serve/ts/torch_handler/base_handler.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    from pkg_resources import packaging

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================================= 8 passed, 1 warning in 62.65s (0:01:02) ==================================================================================================================

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?

@codecov

codecov bot commented Oct 12, 2023

Codecov Report

Merging #2709 (97d5a32) into master (7f4419f) will decrease coverage by 0.08%.
The diff coverage is n/a.

❗ Current head 97d5a32 differs from pull request most recent head b7230ac. Consider uploading reports for the commit b7230ac to get more accurate results

@@            Coverage Diff             @@
##           master    #2709      +/-   ##
==========================================
- Coverage   72.44%   72.36%   -0.08%     
==========================================
  Files          85       85              
  Lines        3963     3963              
  Branches       58       58              
==========================================
- Hits         2871     2868       -3     
- Misses       1088     1091       +3     
  Partials        4        4              

see 1 file with indirect coverage changes


@mreso mreso marked this pull request as ready for review October 30, 2023 18:24
@mreso mreso changed the title [WIP]TP llama with continuous batching TP llama with continuous batching Oct 30, 2023
Collaborator

@HamidShojanazeri HamidShojanazeri left a comment


Thanks @mreso, LGTM, and the test ran fine too.

pytest test/pytest/test_tp_llama.py::test_continuous_batching_tp_llama
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.10.13, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/ubuntu/serve
collected 1 item                                                                                                                                                                       

test/pytest/test_tp_llama.py .                                                                                                                                                   [100%]

================================================================================== 1 passed in 26.15s ==================================================================================
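The test above exercises continuous batching, where a finished sequence frees its batch slot immediately and a waiting request joins the in-flight batch, rather than the whole batch draining before new work starts. A minimal simulation of the scheduling idea (hypothetical names — a sketch, not the handler's actual code):

```python
from collections import deque

def continuous_batching(requests, max_batch_size):
    """Simulate continuous batching of decode steps.

    `requests` maps a request id to the number of decode steps it needs.
    Returns a dict mapping each id to the step at which it completed.
    """
    waiting = deque(requests)        # ids not yet admitted
    remaining = dict(requests)       # id -> decode steps left
    batch, finished_at, step = [], {}, 0

    while waiting or batch:
        # Fill free slots from the wait queue (the "continuous" part).
        while waiting and len(batch) < max_batch_size:
            batch.append(waiting.popleft())

        step += 1
        for rid in list(batch):      # one decode step per sequence
            remaining[rid] -= 1
            if remaining[rid] == 0:
                batch.remove(rid)    # slot frees up immediately
                finished_at[rid] = step

    return finished_at
```

With requests needing 1, 3, and 2 steps and a batch size of 2, the third request joins as soon as the first finishes and completes at step 3; under static batching it could not start until step 4 and would finish at step 5.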

@HamidShojanazeri HamidShojanazeri added this pull request to the merge queue Dec 14, 2023
Merged via the queue into master with commit df94a56 Dec 14, 2023
13 checks passed
@chauhang chauhang added this to the v0.10.0 milestone Feb 27, 2024
@lxning lxning mentioned this pull request Feb 28, 2024
10 tasks
3 participants