
PA: Fixed scheduler manager #88

Merged 16 commits from iputterman/fixed-schedule into main on Nov 7, 2024
Conversation

@IzzyPutterman (Contributor) commented Sep 18, 2024

[image attachment]

src/command_line_parser.cc: review thread resolved (outdated)
src/inference_profiler.cc: review thread resolved (outdated)
@debermudez (Contributor) left a comment:

This seems sane and I think it works.
But I am interested to see how this behaves if the schedule falls behind.
Once PA gets behind, I don't think it can catch up. I think that behavior also needs to change for this schedule functionality to work.

@@ -1832,15 +1847,17 @@ CLParser::VerifyOptions()
       params_->using_custom_intervals};
   if (std::count(load_modes.begin(), load_modes.end(), true) > 1) {
     Usage(
-        "Cannot specify more then one inference load mode. Please choose only "
+        "Cannot specify more then one inference load mode. Please choose "
Contributor:

Is there a different formatter installed?
A quick pass with similar settings will eliminate a bunch of noise.

@IzzyPutterman (Contributor, Author):

I'm pretty sure I got the same one as in genai-perf, but I can double-check.

@matthewkotila (Contributor) commented Oct 3, 2024:

+1, clang-format-16 should be used, and it should follow the .clang-format file in the repo root.

You can install it on Ubuntu like this:

MY_TMP_DIR=$(mktemp -d) && wget https://apt.llvm.org/llvm.sh -O $MY_TMP_DIR/llvm.sh && bash $MY_TMP_DIR/llvm.sh 16

apt update && apt install -y --no-install-recommends clang-format-16

Contributor:

Friendly ping

src/custom_load_manager.cc: review thread resolved (outdated)
@IzzyPutterman (Contributor, Author) replied:

> This seems sane and I think it works. But I am interested to see how this behaves if the schedule falls behind. Once PA gets behind, I don't think it can catch up. I think that behavior also needs to change for this schedule functionality to work.

With one thread and the schedule I had in the Slack thread, I definitely saw it fall behind on one of the requests, but I think the subsequent deltas remained consistent. I'm not sure how to deal with this unless we overprovision the threads or allow for dynamic allocation.

pvijayakrish pushed a commit that referenced this pull request Oct 8, 2024
Adding support for TensorRT 10 APIs in the backend. Keep TRT 8 support as well (#88) (#89)
pvijayakrish pushed a commit that referenced this pull request Oct 8, 2024
Adding support for TensorRT 10 APIs in the backend. Keep TRT 8 support as well (#88)
src/perf_analyzer.cc: review thread resolved (outdated)
Fix pre-commit errors
#include "load_manager.h"
#include "request_rate_manager.h"

namespace triton { namespace perfanalyzer {
Contributor:

Suggested change:
- namespace triton { namespace perfanalyzer {
+ namespace triton::perfanalyzer {

Contributor:

The general trend followed in all other files is
namespace triton { namespace perfanalyzer {
Should I modify it to namespace triton::perfanalyzer { only in this particular file for now, or modify it everywhere?

Contributor:

No, let's just do this incrementally too. This is new syntax from C++17, and it's cleaner in general. If you spot easy-to-change locations elsewhere, feel free to do so, but let's work incrementally towards more readable code.
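
For illustration, a minimal self-contained sketch of the two equivalent forms (the class names below are placeholders, not code from this PR):

// Pre-C++17 style used in most of the codebase: nested namespace blocks,
// each of which needs its own closing brace.
namespace triton { namespace perfanalyzer {
class ExampleOldStyle {};
}}  // namespace triton::perfanalyzer

// C++17 nested namespace definition: the same namespace, one block, one brace.
namespace triton::perfanalyzer {
class ExampleNewStyle {};
}  // namespace triton::perfanalyzer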

/// \param manager Returns a new CustomRequestScheduleManager object
/// \param request_parameters Custom request parameters to send to the server
/// \return cb::Error object indicating success or failure
static cb::Error Create(
Contributor:

Okay, I... am conflicted here, but here's my feedback on what I see.

This sort of API is... not great. It's extremely error-prone, due to the sheer number of unnamed parameters that are mixed between input and output. Yet it matches the current existing API from RequestRateManager, and you're just following the existing pattern, so that's not really on you; the rest of what I'm typing here isn't directed at you, just a general statement on the state of the codebase.

To explain the problem: what I see here is full of what's now considered bad programming practice, and we should start thinking about cleaning up this code. I'm just not sure whether we should start now with this PR, or do a more thorough clean-up later, as trying to do this now may (1) take a long time for you and (2) look awkward if it's not propagated throughout the rest of the codebase. But working incrementally towards a better codebase is a good path forward.

  • We need to embrace named-argument paradigms. C++20 allows for better syntax, but even in plain C++03 we can do something along the lines of a struct which contains the arguments to pass over. This is by far the most problematic part of this call: the function has way too many arguments and is excessively error-prone. We should instead have structured arguments to unpack and use later. If only one element of this feedback is acted upon, it should be this one.
  • Random bool values in the middle of a function call are also error-prone, a bit less so if we have structured arguments. If we do not have structured arguments for bools, we should generally replace them with enums to be more expressive about what exactly we're doing, instead of having a string of true and false in the middle of a function call.
  • We'd need to use something like std::span instead of passing references to vectors. Unfortunately, this also only exists in C++20. We could also use Abseil here, since we already have it in the dependency tree transitively from gRPC.
  • The usage of shared_ptr is usually a code smell. I need to investigate more what's going on here, but I'd really like to ban shared_ptr from the codebase altogether if I could. Again, it depends on what exactly is being done here, but it's generally a telltale sign of ownership issues. The fact that it's passing const references to shared pointers is even more indicative of an existing problem.
  • Passing a pointer in the middle of the argument list for the output value is also a red flag. We should refactor the return-value system to a "StatusOr" paradigm, which is basically an std::variant between error codes and return types, allowing us to either return an error or an actual value. Abseil also provides this, but I'd rather we come up with something which matches our use case properly, as Abseil's version is way too opinionated for our usage.
  • The mix-and-match of signed (batch_size) and unsigned (max_threads and num_of_sequences) values for size types is odd. In general we should use unsigned for anything size-related.
  • Durations, such as this measurement_window_ms, should be better typed to incorporate the time unit, ideally using std::chrono.

Now, looking a bit further down the code in perf_analyzer.cc, we already have a structure with all the parameters, called params_, which we unpack into all of these arguments... Instead, we could just pass that structure around and let the callees decide what to pick from it. As in, my first point in this feedback is basically already done; one would just need to haul the structure up and down instead of unpacking and repacking it into so many arguments.
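
To make the structured-arguments point concrete, here is a rough, self-contained sketch; the struct name, fields, and Create() signature are hypothetical stand-ins, not the actual Perf Analyzer API:

#include <chrono>
#include <cstdint>
#include <memory>

// Hypothetical bundle of load-manager options. In practice the existing
// params_ structure from perf_analyzer.cc could play this role directly.
struct LoadManagerOptions {
  uint32_t batch_size = 1;                              // unsigned size types
  uint32_t max_threads = 4;
  uint32_t num_of_sequences = 0;
  std::chrono::milliseconds measurement_window{5000};   // typed duration
  bool serial_sequences = false;                        // named via the struct
};

// Hypothetical manager whose factory takes the whole options struct instead
// of a long positional argument list, and returns the result directly rather
// than through an output pointer parameter.
class ExampleScheduleManager {
 public:
  static std::unique_ptr<ExampleScheduleManager> Create(
      const LoadManagerOptions& options)
  {
    return std::unique_ptr<ExampleScheduleManager>(
        new ExampleScheduleManager(options));
  }

 private:
  explicit ExampleScheduleManager(const LoadManagerOptions& options)
      : options_(options)
  {
  }

  LoadManagerOptions options_;
};

// Usage: every argument is set by name at the call site, so there is no
// string of anonymous bools and integers to get out of order.
// LoadManagerOptions options;
// options.max_threads = 16;
// options.serial_sequences = true;
// auto manager = ExampleScheduleManager::Create(options);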

Contributor:

To clarify my stance a bit: I am not against smart pointers in general, just that shared_ptrs are usually unique_ptrs in disguise. They are non-committal about the actual ownership of the object, whereas unique_ptr has a clearer definition of it.
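
A tiny self-contained sketch of that distinction (the type is a made-up placeholder, not the actual client backend factory):

#include <memory>
#include <utility>

struct ExampleFactory {};

int main()
{
  // unique_ptr states the ownership model explicitly: exactly one owner at a
  // time, and handing the object over requires an explicit std::move.
  std::unique_ptr<ExampleFactory> owner = std::make_unique<ExampleFactory>();
  std::unique_ptr<ExampleFactory> next_owner = std::move(owner);

  // shared_ptr lets any number of holders keep the object alive, which says
  // nothing about who is actually responsible for it.
  std::shared_ptr<ExampleFactory> shared = std::move(next_owner);
  std::shared_ptr<ExampleFactory> another_holder = shared;
}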

@lkomali (Contributor) commented Nov 5, 2024:

The shared_ptr is for the client backend factory, and I have no idea how much of an effort it is to modify everything to unique_ptr. I can create a ticket for that effort.
Coming to the constructor call, I agree with all your points. As far as I understand, your suggestion is to pass the params object instead of the individual arguments. That makes sense to me. Again, I'm not sure if I'm going to break something by modifying the API. Let me know if I should add a ticket to investigate and modify all the related APIs because, as you said, it might look awkward if we modify only some of them and the change isn't propagated until some time later.

Contributor:

Right, let's just try and plumb the parameters object in this one instance, and keep in mind we should do this for newer code. If we have time, we should go back and clean older API incrementally. You're right: let's not break existing code right now, but let's avoid introducing new code which follows this older pattern.

For the unique_ptr vs shared_ptr, this was more of a general remark that we should keep in mind when adding new shared_ptr in the codebase. At some point we should review the existing usage and clean it up, but not in this one PR.

  auto custom_manager =
      dynamic_cast<CustomRequestScheduleManager*>(manager_.get());
  auto request_rate_manager = dynamic_cast<RequestRateManager*>(manager_.get());

  if (custom_manager) {
Contributor:

This if switches between essentially the same code, just with different object types obtained from a common pointer type via dynamic_cast. Given that PerformWarmup and ChangeRequestRate are already being made virtual in this PR, this shouldn't be needed at all. The only difference lies in the debug message being emitted, which could also become a virtual function returning a string, such as GetManagerName, to embed in a status message like this one. That makes the dynamic_cast here completely redundant with the rest of the code.

@lkomali (Contributor) commented Nov 5, 2024:

I think dynamic_cast is required. RequestRate inherits from the LoadManager class, and these functions are specific to RequestRate and the classes that inherit from it. If I remove the dynamic_cast, the type of manager would be LoadManager, which would give an error when I try to call the PerformWarmup and ChangeRequestRate functions.
The if condition is mainly there to cast to the right type. I think custom error messages are not required here.

Contributor:

Right, this means we only need a single dynamic_cast, to RequestRateManager. We might want to ensure it's not nullptr for safety, to confirm the manager object we get at this point is indeed one of the two types, but otherwise the same remark applies.
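
A self-contained sketch of that shape; the classes below are simplified stand-ins named after the real ones, with made-up method signatures, not the actual Perf Analyzer API:

#include <iostream>
#include <memory>

class LoadManager {
 public:
  virtual ~LoadManager() = default;
};

class RequestRateManager : public LoadManager {
 public:
  virtual void PerformWarmup() { std::cout << "warmup (request rate)\n"; }
  virtual void ChangeRequestRate() { std::cout << "new rate (request rate)\n"; }
};

class CustomRequestScheduleManager : public RequestRateManager {
 public:
  void PerformWarmup() override { std::cout << "warmup (custom schedule)\n"; }
  void ChangeRequestRate() override { std::cout << "new rate (custom schedule)\n"; }
};

// A single dynamic_cast to the common RequestRateManager base is enough:
// virtual dispatch picks the right override, and the nullptr check covers a
// manager that is neither type.
void RunProfile(LoadManager* manager)
{
  auto* request_rate_manager = dynamic_cast<RequestRateManager*>(manager);
  if (request_rate_manager == nullptr) {
    std::cerr << "manager does not support request-rate scheduling\n";
    return;
  }
  request_rate_manager->PerformWarmup();
  request_rate_manager->ChangeRequestRate();
}

int main()
{
  std::unique_ptr<LoadManager> manager =
      std::make_unique<CustomRequestScheduleManager>();
  RunProfile(manager.get());
}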

output_shm_size, serial_sequences, parser, factory,
request_parameters));

*manager = std::move(local_manager);
Contributor:

This is technically an anti-pattern. It should (1) use std::make_unique and (2) store the result directly into the output storage.

Contributor:

This gives an error because the constructor is protected and std::make_unique calls the constructor from outside the class scope, so for this to work we would have to make the constructor public.

Contributor:

Fiiiiiiiiine. It's okay this way then.
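
For context, a minimal sketch of why std::make_unique trips over a protected constructor (hypothetical class, not the PR's code):

#include <memory>

class ExampleManager {
 public:
  static std::unique_ptr<ExampleManager> Create()
  {
    // OK: a static member function can reach the protected constructor.
    std::unique_ptr<ExampleManager> local_manager(new ExampleManager());
    return local_manager;

    // Would not compile: std::make_unique constructs the object from outside
    // the class, where the protected constructor is inaccessible.
    // return std::make_unique<ExampleManager>();
  }

 protected:
  ExampleManager() = default;
};

int main()
{
  auto manager = ExampleManager::Create();
}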

Fix comments
@nicolasnoble (Contributor) left a comment:

Looks good now, thank you so much!

@lkomali merged commit 5dfebe1 into main on Nov 7, 2024
7 checks passed
@lkomali deleted the iputterman/fixed-schedule branch on November 7, 2024 at 19:06
nv-braf pushed a commit that referenced this pull request Nov 14, 2024
* Do I know what I am doing?

* compiles apparently

* Move to request_rate

* Fix small bugs

* more small fixes

* Maybe fix small issue

* Actually initialize empty

* Convert to const

* Drop old files

* Force request-count to equal schedule length

* Copy from GenAI branch

* Deformat

* Reorganize arg order

Fix pre-commit errors

* Fix errors, modify design

* fix pre-commit

Fix comments

---------

Co-authored-by: lkomali <[email protected]>
Co-authored-by: Harshini Komali <[email protected]>
  scaled_schedule.reserve(schedule.size());
  if (schedule.size() > 0) {
    for (const auto& value : schedule) {
      scaled_schedule.push_back(value / static_cast<float>(request_rate));
Contributor:

@IzzyPutterman why are the values from schedule (e.g. --schedule=1,2,3,4,4.5,6,9,10,11,13,15,18) being divided by the request rate?

I thought those values are supposed to be absolute timestamps?

cc @lkomali @debermudez

Labels: none yet
Projects: none yet
5 participants