Feat: Optimize() validations across TRT, VLLM, Neuron container optimizations #4927

gwang111 · 2024-11-13T21:30:40Z

Issue #, if available:

Description of changes:

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

I have read the CONTRIBUTING doc
I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
I used the commit message format described in CONTRIBUTING
I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
I have checked that my tests are not configured for a specific region or account (if appropriate)
I have used unique_name_from_base to create resource names in integ tests (if appropriate)
If adding any dependency in requirements.txt files, I have spell checked and ensured they exist in PyPI

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

gwang111 · 2024-11-13T23:43:25Z

src/sagemaker/serve/builder/model_builder.py

@@ -1157,6 +1161,15 @@ def optimize(
            Model: A deployable ``Model`` object.
        """

+        # TODO: ideally these dictionaries need to be sagemaker_core shapes


TODO: Ideally, validations should happen at the start of the flow, not after everything is set up in the wrapper. Let's fast-follow this and gradually shift the validations to helper functions in the outer scope here.
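A rough sketch of the fail-fast shape this could take, not the actual `ModelBuilder` code. The names `validate_optimization_configs` and `optimize`, their parameters, and the `ml.inf`/`ml.trn` prefix check for Neuron instances are all assumptions made for illustration; the single rule and its message are borrowed from the review discussion on this PR.

```python
from typing import Optional


def validate_optimization_configs(
    instance_type: str,
    quantization_config: Optional[dict] = None,
    compilation_config: Optional[dict] = None,
) -> None:
    """Hypothetical outer-scope helper: reject bad combinations before any setup."""
    # Assumption: Neuron instance types are recognised by their name prefix.
    is_neuron = instance_type.startswith(("ml.inf", "ml.trn"))
    if is_neuron and quantization_config and compilation_config:
        raise ValueError(
            "Optimizations that use Compilation and Quantization are not "
            "currently supported on Neuron instances."
        )


def optimize(
    instance_type: str,
    quantization_config: Optional[dict] = None,
    compilation_config: Optional[dict] = None,
) -> dict:
    """Hypothetical stand-in for the real entry point."""
    # Validate first, so nothing expensive is built for an invalid request.
    validate_optimization_configs(
        instance_type, quantization_config, compilation_config
    )
    # Placeholder for the wrapper setup that currently runs before validation.
    return {"instance_type": instance_type}
```

With this ordering, an unsupported combination raises before any wrapper state exists, and each rule can move into the helper one at a time.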

Lokiiiiii · 2024-11-14T20:07:15Z

src/sagemaker/serve/validations/optimization.py

+logger = logging.getLogger(__name__)
+
+
+class OptimizationContainer(Enum):


We can use these for internal organization. But we should not burden customers with knowing how our optimizations work, which containers are used under which scenarios, etc.

Consider the following error messages:

Optimizations that use Compilation and Speculative Decoding are not currently supported on GPU & Neuron instances.

Optimizations that use Compilation and Quantization are not currently supported on Neuron instances.

Optimizations that use Quantization:SmoothQuant are only supported with Compilation on GPU instances.

Sharding is mutually exclusive and only supported on GPU instances.

The general theme we want to maintain is:

Optimizations {A,B,C} are {not|only} supported with optimizations {X,Y,Z} on instances {L,M,N}
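One way to keep every message on that theme is to build them from a single helper. This is only a sketch under stated assumptions: `format_support_message` and its parameters are hypothetical names, not part of this PR, and the wording is copied from the example messages above.

```python
from typing import Sequence


def format_support_message(
    optimizations: Sequence[str],
    qualifier: str,
    instances: Sequence[str],
    with_optimizations: Sequence[str] = (),
) -> str:
    """Hypothetical helper that renders the shared message template."""
    if qualifier not in ("not", "only"):
        raise ValueError("qualifier must be 'not' or 'only'")
    subject = " and ".join(optimizations)
    verb = (
        "are not currently supported"
        if qualifier == "not"
        else "are only supported"
    )
    # The "with ..." clause appears only when companion optimizations are given.
    companion = (
        f" with {' and '.join(with_optimizations)}" if with_optimizations else ""
    )
    target = " & ".join(instances)
    return f"Optimizations that use {subject} {verb}{companion} on {target} instances."


print(format_support_message(["Compilation", "Speculative Decoding"], "not", ["GPU", "Neuron"]))
print(format_support_message(["Quantization:SmoothQuant"], "only", ["GPU"], ["Compilation"]))
```

The two calls reproduce the first and third example messages above. The sharding message does not fit the template as written and would still need its own wording.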

Ashish Gupta and others added 11 commits July 8, 2021 10:22

fix default time for compilation jobs

2e82eeb

Merge branch 'master' into master

65eb64e

Merge branch 'aws:master' into master

132fb94

changes for blackbird - model sharding

b19b4a3

remove UTs for now

5833143

Merge branch 'aws:master' into master

7949468

add unit tests

e40fad7

changes for blackbird - model sharding

63cfdf4

Merge branch 'master' into blackbird2

4431dd8

add more tests

65f4cc3

fix sharded model flag

741d0a6

gwang111 requested a review from Lokiiiiii November 13, 2024 21:30

gwang111 requested a review from a team as a code owner November 13, 2024 21:30

gwang111 requested a review from mufaddal-rohawala November 13, 2024 21:30

gwang111 temporarily deployed to auto-approve November 13, 2024 21:30 — with GitHub Actions Inactive

gwang111 removed the request for review from mufaddal-rohawala November 13, 2024 21:30

gwang111 temporarily deployed to auto-approve November 13, 2024 21:37 — with GitHub Actions Inactive

gwang111 force-pushed the garywan-blackbird branch from 49bd078 to bf55587 Compare November 13, 2024 21:56

gwang111 temporarily deployed to auto-approve November 13, 2024 21:57 — with GitHub Actions Inactive

gwang111 commented Nov 13, 2024


Captainia and others added 3 commits November 14, 2024 00:42

Revert "change: add TGI 2.4.0 image uri (aws#4922)" (aws#4926)

a3cb444

changes for blackbird - model sharding

37e26f2

add optimization validations

bb4a718

Lokiiiiii suggested changes Nov 14, 2024


fix formatting and msging

3d04384

gwang111 force-pushed the garywan-blackbird branch from bf55587 to 3d04384 Compare November 15, 2024 17:19

gwang111 temporarily deployed to auto-approve November 15, 2024 17:20 — with GitHub Actions Inactive

fixing validation bugs

22fdc37

gwang111 temporarily deployed to auto-approve November 15, 2024 23:32 — with GitHub Actions Inactive

gwang111 temporarily deployed to auto-approve November 15, 2024 23:37 — with GitHub Actions Inactive

add UTs

57123c9

gwang111 force-pushed the garywan-blackbird branch from f1d3649 to 57123c9 Compare November 16, 2024 00:28

gwang111 temporarily deployed to auto-approve November 16, 2024 00:28 — with GitHub Actions Inactive

simplify logic

d1074eb

gwang111 force-pushed the garywan-blackbird branch from a4dba03 to d1074eb Compare November 16, 2024 00:38

gwang111 temporarily deployed to auto-approve November 16, 2024 00:38 — with GitHub Actions Inactive

gwang111 added 3 commits November 16, 2024 02:15

update messaging

74a0e36

formatting

955479a

fix UTs

76a4102

gwang111 force-pushed the garywan-blackbird branch from ccad6cd to 76a4102 Compare November 16, 2024 05:49

gwang111 temporarily deployed to auto-approve November 16, 2024 05:49 — with GitHub Actions Inactive

add more UTs

b7b8d3c

gwang111 deployed to auto-approve November 16, 2024 05:55 — with GitHub Actions Active
