-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PA: Fixed scheduler manager #88
Conversation
IzzyPutterman
commented
Sep 18, 2024
•
edited by lkomali
Loading
edited by lkomali
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems sane and I think works.
But I am interested to see how this works if the schedule falls behind.
Once PA gets behind, I dont think it can catch up. I think that behavior needs to also change to enable this schedule functionality to work.
src/command_line_parser.cc
Outdated
@@ -1832,15 +1847,17 @@ CLParser::VerifyOptions() | |||
params_->using_custom_intervals}; | |||
if (std::count(load_modes.begin(), load_modes.end(), true) > 1) { | |||
Usage( | |||
"Cannot specify more then one inference load mode. Please choose only " | |||
"Cannot specify more then one inference load mode. Please choose " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a different formatter installed?
A quick pass with similar settings will eliminate a bunch of noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im pretty sure I got the same one as in genai-perf, but I can double check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, clang-format-16 should be used and it should be following the .clang-format
file in the repo root
you can install it on ubuntu like this:
MY_TMP_DIR=$(mktemp -d) && wget https://apt.llvm.org/llvm.sh -O $MY_TMP_DIR/llvm.sh && bash $MY_TMP_DIR/llvm.sh 16
apt update && apt install -y --no-install-recommends clang-format-16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Friendly ping
With one thread and the schedule I had in the slack thread I definitely saw it fall behind on one of the requests, but I think the subsequent deltas remained consistent. Not sure how to deal with this unless we overprovision the threads or allow for dynamic allocation |
…t as well (#88) (#89) * Replace binding index-based methods with name-based alternatives * Remove unused variables * Remove unused variables * Remove allInput*Specified() * Delete TRTV1Interface * Replace getProfileShapeValues() with getProfileTensorValues() * Remove buffer_bindings_ * Enhancements * Replace isExecutionBinding() * Add INT64 support * Remove hasImplicitBatchDimension() * Update Copyright * Remove unused variables * Undo copyright * Undo Copyright * Undo copyright * Fix the handling in INT64 shape tensors output * Fix data dependent output shapes * Fix pre commit errors * Update copyright * Resolve review comments * Include source for building on TRT 8 (#86) (#87) * Include source for building on TRT 8 * Apply suggestions from code review --------- * Fix envvar access in CMake --------- Co-authored-by: Sai Kiran Polisetty <[email protected]> Co-authored-by: Misha Chornyi <[email protected]>
* Adding support for TensorRT 10 APIs in the backend. Keep TRT 8 support as well (#88) * Replace binding index-based methods with name-based alternatives * Remove unused variables * Remove unused variables * Remove allInput*Specified() * Delete TRTV1Interface * Replace getProfileShapeValues() with getProfileTensorValues() * Remove buffer_bindings_ * Enhancements * Replace isExecutionBinding() * Add INT64 support * Remove hasImplicitBatchDimension() * Update Copyright * Remove unused variables * Undo copyright * Undo Copyright * Undo copyright * Fix the handling in INT64 shape tensors output * Fix data dependent output shapes * Fix pre commit errors * Update copyright * Resolve review comments * Include source for building on TRT 8 (#86) (#87) * Include source for building on TRT 8 * Apply suggestions from code review --------- Co-authored-by: Misha Chornyi <[email protected]> * Fix envvar access in CMake --------- Co-authored-by: Sai Kiran Polisetty <[email protected]> Co-authored-by: Misha Chornyi <[email protected]> * Add support for kBF16 --------- Co-authored-by: Tanmay Verma <[email protected]> Co-authored-by: Misha Chornyi <[email protected]>
a15a379
to
2af8273
Compare
Fix pre-commit errors
2af8273
to
8d3f7e0
Compare
150231e
to
d8628bf
Compare
#include "load_manager.h" | ||
#include "request_rate_manager.h" | ||
|
||
namespace triton { namespace perfanalyzer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace triton { namespace perfanalyzer { | |
namespace triton::perfanalyzer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general trend followed in all other files is
namespace triton { namespace perfanalyzer {
Should I modify it to namespace triton::perfanalyzer {
only in this particular file for now or modify it everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No let's just do it incrementally too. This is new syntax from C++17, and it's cleaner in general. If you spot easy to change locations elsewhere, feel free to do so, but let's work incrementally towards more readable code.
/// manager Returns a new CustomRequestScheduleManager object \param | ||
/// request_parameters Custom request parameters to send to the server \return | ||
/// cb::Error object indicating success or failure | ||
static cb::Error Create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I... am conflicted here, but here's my feedback on what I see.
This sort of API is... not great. It's extremely error prone, due to the sheer number of unnamed parameters that are mixed between input and output. Yet, it matches the current existing API from RequestRateManager
, and you're just following the existing pattern, so that's not really on you, and the rest of what I'm typing here isn't directed at you, just my general statement on the state of the codebase in general.
To explain the problem: what I see here is full of what's now considered bad programming practice, and we should start considering cleaning this code. I'm just not sure if we should start now with this PR, or have a more thorough clean later, as trying to do this now may (1) take a long time for you and (2) look akward if it's not propagated throughout the rest of the codebase. But also, working incrementally towards having a better codebase is a good path forward.
- We need to embrace named argument paradigms. C++20 allows for better syntax, but even in normal C++03, we can do something along the lines of a struct which contains the arguments to pass over. This by far is the most problematic part of this call here. This function call has way too many arguments, and it's excessively error-prone. We should instead have structured arguments to then unpack and use later. If only one of the elements from my feedback is actioned upon, it should be this one.
- Random bool values in the middle of a function call is also error prone. A bit less so if we have structured arguments. If we do not have structured argument for bools, we should generally speaking replace them with enums to be more expressive of what exactly we're doing instead of just having a string of
true
andfalse
in the middle of a function call. - We'd need to use something like std::span instead of passing references to vectors. Unfortunately, this is also something which only exists in C++20. We could also use Abseil here, since we have it in the dependency tree anyway transitively from gRPC.
- The usage of shared_ptr is usually a codesmell. I need to investigate more what's going on here, but I'd really like to fully ban shared_ptr altogether from the codebase if I could. Again, depending on what's exactly being done here, but it generally speaking is a telltale of ownership issues. The fact it's passing const references to shared pointers is even more indicative of an exising problem.
- Passing a pointer in the middle of the list of arguments for the output value is also a red flag. We should refactor the return value system we have to have a "StatusOr" paradigm, which is basically an std::variant between error codes, and return types, allowing us to either return an error, or an actual value. Abseil also provides this, but I'd rather we come up with something which matches our usage case properly, as Abseil's version is way too opiniated for our usage.
- The mix'n'match of signed (
batch_size
) and unsigned (max_threads
andnum_of_sequences
) values for size types is odd. In general we should use unsigned for anything size-related. - Durations, such as this
measurement_window_ms
, should be better typed to incorporate the time unit, ideally using std::chrono.
Now, looking at it a bit more down the code, in perf_analyzer.cc
, we have a structure already with all the parameters, called params_
, which we unpack into all of these arguments... Instead, we could just easily pass that structure around, and let the callees decide what to pick from them. As in, my first point in this feedback is already done, basically, one would just need to haul the structure up and down instead of unpacking / repacking it with so many arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify a bit my stance: I am not against smart pointers in general, just that shared_ptr
are usually unique_ptr
in disguise. They are non-committal about the actual ownership of the object, whereas unique_ptr
has a clearer definition of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shared_ptr
is for client backend factory and I have no idea how much of an effort it is to modify everything to unique_ptr
. I can create a ticket for that effort.
Coming to the constructor call, I agree with all your points. As far as I understand, your suggestion is to pass params object instead of the individual arguments. It makes sense to me. Again, I'm not sure if I'm going to break something by modifying the API. Let me know if I should add a ticket to investigate and modify the all the related APIs because as you said, it might look awkward if we modify for some of them and it's not propagated until some time after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, let's just try and plumb the parameters object in this one instance, and keep in mind we should do this for newer code. If we have time, we should go back and clean older API incrementally. You're right: let's not break existing code right now, but let's avoid introducing new code which follows this older pattern.
For the unique_ptr vs shared_ptr, this was more of a general remark that we should keep in mind when adding new shared_ptr in the codebase. At some point we should review the existing usage and clean it up, but not in this one PR.
src/inference_profiler.cc
Outdated
dynamic_cast<CustomRequestScheduleManager*>(manager_.get()); | ||
auto request_rate_manager = dynamic_cast<RequestRateManager*>(manager_.get()); | ||
|
||
if (custom_manager) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
flips between essentially the same code, just with different object types from a common pointer type using dynamic_cast
. Given that PerformWarmup
and ChangeRequestRate
are being made virtuals already in this PR, this shouldn't be needed at all. The only difference lies in the debug message being emitted, which could also just become a virtual function returning a string, such as GetManagerName
, to embed in a status message such as this one. This means dynamic_cast
here is completely redundant with the rest of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dynamic_cast
is required. RequestRate
inherits from LoadManager
class. These function are specific to RequestRate
and classes that inherits from it. If I remove dynamic cast, the type for manager would be LoadManager which would throw an error when I try to call the PerformWarmup
and ChangeRequestRate
functions.
The if condition is mainly for properly cast to the right type. I think custom error messages are not required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this means we only need a single dynamic_cast
to RequestRateManager
. We might want to ensure it's not nullptr
for safety, in case the manager object we get at this point is indeed one of the two types, but otherwise the same remark should apply.
output_shm_size, serial_sequences, parser, factory, | ||
request_parameters)); | ||
|
||
*manager = std::move(local_manager); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically an anti-pattern. This should (1) use std::make_unique
, and (2) directly store it straight into the output storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gives an error because the constructor is protected and std::make_unique
calls the constructor from outside of class scope, so for this to work, we should make constructor public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fiiiiiiiiine. It's okay this way then.
Fix comments
d8628bf
to
7282e1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now, thank you so much!
* Do I know what I am doing? * compiles apparently * Move to request_rate * Fix small bugs * more small fixes * Maybe fix small issue * Actually initialize empty * Convert to const * Drop old files * Force request-count to equal schedule length * Copy from GenAI branch * Deformat * Reorganize arg order Fix pre-commit errors * Fix errors, modify design * fix pre-commit Fix comments --------- Co-authored-by: lkomali <[email protected]> Co-authored-by: Harshini Komali <[email protected]>
scaled_schedule.reserve(schedule.size()); | ||
if (schedule.size() > 0) { | ||
for (const auto& value : schedule) { | ||
scaled_schedule.push_back(value / static_cast<float>(request_rate)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IzzyPutterman why are the values from schedule (e.g. --schedule=1,2,3,4,4.5,6,9,10,11,13,15,18
being divided by the request rate?
I thought those values are supposed to be absolute timestamps?