feat: Runtime output buffer optimization #3276
base: main
Conversation
I think this PR doesn't have to do with fake tensors, as the output shape was inferred from the TRT function.
core/runtime/execute_engine.cpp (outdated)
@@ -263,19 +284,15 @@ std::vector<at::Tensor> execute_engine(std::vector<at::Tensor> inputs, c10::intr
      output_profiler_guard =
          std::make_unique<torch::autograd::profiler::RecordProfile>(compiled_engine->output_profile_path);
    }
    if ((false == compiled_engine->use_pre_allocated_outputs) || shape_changed) {
Nit: `!compiled_engine->use_pre_allocated_outputs` instead?
    return false;
}

std::vector<at::Tensor> create_output_tensors(c10::intrusive_ptr<TRTEngine> compiled_engine) {
Can we functionalize input allocation/creation in execute_engine similar to this? (I posted a similar comment in your wrapper module PR.)
Create a context manager to enable this across subgraphs.
struct RuntimeStates {
  bool need_cudagraphs_record;
  bool can_use_pre_allocated_outputs;
};
If the weight streaming budget is changed in CUDA graph mode, a new capture is required. Weight streaming state will be added.
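The decision logic implied by this comment can be sketched in Python as follows. The `update_states` helper and its parameters are illustrative assumptions based on the review discussion, not the PR's actual implementation: a shape change or a weight-streaming budget change forces a new CUDA graph capture, and pre-allocated outputs remain usable only while the shape is stable.

```python
from dataclasses import dataclass


@dataclass
class RuntimeStates:
    # Illustrative Python mirror of the C++ struct above.
    need_cudagraphs_record: bool
    can_use_pre_allocated_outputs: bool


def update_states(shape_changed: bool, cudagraphs_enabled: bool,
                  weight_budget_changed: bool,
                  prev_use_prealloc: bool) -> RuntimeStates:
    """Decide per-call runtime state (assumption based on the review:
    a weight-streaming budget change in CUDA graph mode needs re-capture)."""
    need_record = cudagraphs_enabled and (shape_changed or weight_budget_changed)
    # Pre-allocated outputs are only reusable when the shape did not change.
    can_prealloc = prev_use_prealloc and not shape_changed
    return RuntimeStates(need_record, can_prealloc)
```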
@@ -17,6 +17,9 @@
     "Torch-TensorRT runtime is not available",
 )
 class TestCudagraphsCPP(TestCase):
     def tearDown(self):
         # Reset to default cuda graph mode after each test
         torch_tensorrt.runtime.set_cudagraphs_mode(False)
If multiple tests are run by pytest, later tests could otherwise run in CUDA graph mode. This ensures CUDA graph mode is turned off after each test.
Description
Hides latency by creating the output tensors for the next inference while the current one is still running.
Fixes #3275
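The optimization described above can be sketched as simple double-buffering: each call's output buffer is allocated one step ahead, so in the real runtime the allocation cost overlaps with the asynchronously enqueued inference. The function and parameter names below are illustrative, not the PR's actual implementation:

```python
def run_with_prealloc(infer, inputs, alloc):
    """Run infer(inp, out) over inputs, allocating each call's output
    buffer one iteration ahead of its use (illustrative sketch)."""
    results = []
    next_out = alloc()  # buffer for the first call
    for i, inp in enumerate(inputs):
        out = next_out
        infer(inp, out)  # in the real runtime this enqueue is asynchronous
        if i + 1 < len(inputs):
            # While the current call executes, allocate the next call's
            # output buffer, hiding the allocation latency.
            next_out = alloc()
        results.append(out)
    return results
```

With synchronous `infer` this is only a reordering; the benefit appears when `infer` enqueues work on a CUDA stream, because the host-side allocation then runs concurrently with device execution.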