Feat: Optimize() validations across TRT, VLLM, Neuron container optimizations #4927
base: master
@@ -1157,6 +1161,15 @@ def optimize(
Model: A deployable ``Model`` object.
"""

# TODO: ideally these dictionaries need to be sagemaker_core shapes
TODO: Ideally, validations should happen at the start of the flow, not after everything is set up in the wrapper. Let's fast-follow this and gradually shift the validations to helper functions in the outer scope here.
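A minimal sketch of what "validate at the start of the flow" could look like. Everything here is hypothetical: the helper `_validate_optimize_args`, the stub `_set_up_wrapper`, the simplified `optimize` signature, and the two rules are illustrations only, not the code in this PR.

```python
from typing import Optional

SETUP_CALLS = []  # records setup work so the ordering is observable


def _set_up_wrapper() -> None:
    """Stand-in for the expensive setup done inside the wrapper (hypothetical)."""
    SETUP_CALLS.append("setup")


def _validate_optimize_args(
    quantization_config: Optional[dict],
    compilation_config: Optional[dict],
    sharding_config: Optional[dict],
) -> None:
    """Reject unsupported argument combinations (illustrative rules only)."""
    if not (quantization_config or compilation_config or sharding_config):
        raise ValueError("At least one optimization config must be provided.")
    if sharding_config and (quantization_config or compilation_config):
        raise ValueError("Sharding cannot be combined with other optimizations.")


def optimize(
    quantization_config: Optional[dict] = None,
    compilation_config: Optional[dict] = None,
    sharding_config: Optional[dict] = None,
) -> str:
    # Validate first, so a bad request fails before any setup work starts.
    _validate_optimize_args(quantization_config, compilation_config, sharding_config)
    _set_up_wrapper()
    return "optimized"
```

With this ordering, an invalid combination raises `ValueError` while `SETUP_CALLS` is still empty, which is the property the comment asks for.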
logger = logging.getLogger(__name__)


class OptimizationContainer(Enum):
We can use these for internal organization, but we should not burden customers with knowing how our optimizations work, which containers are used under which scenarios, and so on.
Consider the following error messages:
- Optimizations that use Compilation and Speculative Decoding are not currently supported on GPU & Neuron instances.
- Optimizations that use Compilation and Quantization are not currently supported on Neuron instances.
- Optimizations that use Quantization:SmoothQuant are only supported with Compilation on GPU instances.
- Sharding is mutually exclusive and only supported on GPU instances.
The general theme we want to maintain is:
Optimizations {A,B,C} are {not|only} supported with optimizations {X,Y,Z} on instances {L,M,N}
Issue #, if available:
Description of changes:
Testing done:
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
unique_name_from_base to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.