
PA: Fixed scheduler manager #88

Merged 16 commits from iputterman/fixed-schedule into main on Nov 7, 2024
Conversation

@IzzyPutterman (Contributor) commented Sep 18, 2024

[image attachment]

src/command_line_parser.cc: review thread resolved (outdated)
src/inference_profiler.cc: review thread resolved (outdated)
@debermudez (Contributor) left a comment:

This seems sane and I think it works.
But I am interested to see how this behaves if the schedule falls behind.
Once PA gets behind, I don't think it can catch up. I think that behavior also needs to change for this schedule functionality to work.

@@ -1832,15 +1847,17 @@ CLParser::VerifyOptions()
       params_->using_custom_intervals};
   if (std::count(load_modes.begin(), load_modes.end(), true) > 1) {
     Usage(
-        "Cannot specify more then one inference load mode. Please choose only "
+        "Cannot specify more then one inference load mode. Please choose "
Contributor:

Is there a different formatter installed?
A quick pass with similar settings will eliminate a bunch of noise.

@IzzyPutterman (Contributor, Author):

I'm pretty sure I got the same one as in genai-perf, but I can double-check.

@matthewkotila (Contributor) commented Oct 3, 2024:

+1, clang-format-16 should be used, and it should follow the .clang-format file in the repo root.

You can install it on Ubuntu like this:

MY_TMP_DIR=$(mktemp -d) && wget https://apt.llvm.org/llvm.sh -O $MY_TMP_DIR/llvm.sh && bash $MY_TMP_DIR/llvm.sh 16

apt update && apt install -y --no-install-recommends clang-format-16

Contributor:

Friendly ping

src/custom_load_manager.cc: review thread resolved (outdated)
@IzzyPutterman (Contributor, Author) replied:

> This seems sane and I think it works. But I am interested to see how this behaves if the schedule falls behind. Once PA gets behind, I don't think it can catch up. I think that behavior also needs to change for this schedule functionality to work.

With one thread and the schedule I had in the Slack thread, I definitely saw it fall behind on one of the requests, but I think the subsequent deltas remained consistent. I'm not sure how to deal with this unless we overprovision the threads or allow for dynamic allocation.

pvijayakrish pushed a commit that referenced this pull request Oct 8, 2024
Adding support for TensorRT 10 APIs in the backend. Keep TRT 8 support as well (#88) (#89)
pvijayakrish pushed a commit that referenced this pull request Oct 8, 2024
Adding support for TensorRT 10 APIs in the backend. Keep TRT 8 support as well (#88)
src/perf_analyzer.cc: review thread resolved (outdated)
Fix pre-commit errors
#include "load_manager.h"
#include "request_rate_manager.h"

namespace triton { namespace perfanalyzer {
Contributor:

Suggested change:
- namespace triton { namespace perfanalyzer {
+ namespace triton::perfanalyzer {

Contributor:

The general trend followed in all other files is
namespace triton { namespace perfanalyzer {
Should I modify it to namespace triton::perfanalyzer { only in this particular file for now, or modify it everywhere?

Contributor:

No, let's just do this incrementally too. This is new syntax from C++17, and it's cleaner in general. If you spot easy-to-change locations elsewhere, feel free to do so, but let's work incrementally towards more readable code.
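
For illustration, a minimal self-contained sketch of the two equivalent forms (the class names below are placeholders, not code from this PR):

// Pre-C++17 style used in most of the codebase: nested namespace blocks,
// each of which needs its own closing brace.
namespace triton { namespace perfanalyzer {
class ExampleOldStyle {};
}}  // namespace triton::perfanalyzer

// C++17 nested namespace definition: the same namespace, one block, one brace.
namespace triton::perfanalyzer {
class ExampleNewStyle {};
}  // namespace triton::perfanalyzer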

/// \param manager Returns a new CustomRequestScheduleManager object
/// \param request_parameters Custom request parameters to send to the server
/// \return cb::Error object indicating success or failure
static cb::Error Create(
Contributor:

Okay, I... am conflicted here, but here's my feedback on what I see.

This sort of API is... not great. It's extremely error-prone, due to the sheer number of unnamed parameters that are mixed between input and output. Yet it matches the current existing API from RequestRateManager, and you're just following the existing pattern, so that's not really on you; the rest of what I'm typing here isn't directed at you, just a general statement on the state of the codebase.

To explain the problem: what I see here is full of what's now considered bad programming practice, and we should start thinking about cleaning up this code. I'm just not sure whether we should start now with this PR, or do a more thorough clean-up later, as trying to do this now may (1) take a long time for you and (2) look awkward if it's not propagated throughout the rest of the codebase. But working incrementally towards a better codebase is a good path forward.

  • We need to embrace named-argument paradigms. C++20 allows for better syntax, but even in plain C++03 we can do something along the lines of a struct which contains the arguments to pass over. This is by far the most problematic part of this call: the function has way too many arguments and is excessively error-prone. We should instead have structured arguments to unpack and use later. If only one element of this feedback is acted upon, it should be this one.
  • Random bool values in the middle of a function call are also error-prone, a bit less so if we have structured arguments. If we do not have structured arguments for bools, we should generally replace them with enums to be more expressive about what exactly we're doing, instead of having a string of true and false in the middle of a function call.
  • We'd need to use something like std::span instead of passing references to vectors. Unfortunately, this also only exists in C++20. We could also use Abseil here, since we already have it in the dependency tree transitively from gRPC.
  • The usage of shared_ptr is usually a code smell. I need to investigate more what's going on here, but I'd really like to ban shared_ptr from the codebase altogether if I could. Again, it depends on what exactly is being done here, but it's generally a telltale sign of ownership issues. The fact that it's passing const references to shared pointers is even more indicative of an existing problem.
  • Passing a pointer in the middle of the argument list for the output value is also a red flag. We should refactor the return-value system to a "StatusOr" paradigm, which is basically an std::variant between error codes and return types, allowing us to either return an error or an actual value. Abseil also provides this, but I'd rather we come up with something which matches our use case properly, as Abseil's version is way too opinionated for our usage.
  • The mix-and-match of signed (batch_size) and unsigned (max_threads and num_of_sequences) values for size types is odd. In general we should use unsigned for anything size-related.
  • Durations, such as this measurement_window_ms, should be better typed to incorporate the time unit, ideally using std::chrono.

Now, looking a bit further down the code in perf_analyzer.cc, we already have a structure with all the parameters, called params_, which we unpack into all of these arguments... Instead, we could just pass that structure around and let the callees decide what to pick from it. As in, my first point in this feedback is basically already done; one would just need to haul the structure up and down instead of unpacking and repacking it into so many arguments.
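
To make the structured-arguments point concrete, here is a rough, self-contained sketch; the struct name, fields, and Create() signature are hypothetical stand-ins, not the actual Perf Analyzer API:

#include <chrono>
#include <cstdint>
#include <memory>

// Hypothetical bundle of load-manager options. In practice the existing
// params_ structure from perf_analyzer.cc could play this role directly.
struct LoadManagerOptions {
  uint32_t batch_size = 1;                              // unsigned size types
  uint32_t max_threads = 4;
  uint32_t num_of_sequences = 0;
  std::chrono::milliseconds measurement_window{5000};   // typed duration
  bool serial_sequences = false;                        // named via the struct
};

// Hypothetical manager whose factory takes the whole options struct instead
// of a long positional argument list, and returns the result directly rather
// than through an output pointer parameter.
class ExampleScheduleManager {
 public:
  static std::unique_ptr<ExampleScheduleManager> Create(
      const LoadManagerOptions& options)
  {
    return std::unique_ptr<ExampleScheduleManager>(
        new ExampleScheduleManager(options));
  }

 private:
  explicit ExampleScheduleManager(const LoadManagerOptions& options)
      : options_(options)
  {
  }

  LoadManagerOptions options_;
};

// Usage: every argument is set by name at the call site, so there is no
// string of anonymous bools and integers to get out of order.
// LoadManagerOptions options;
// options.max_threads = 16;
// options.serial_sequences = true;
// auto manager = ExampleScheduleManager::Create(options);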

Contributor:

To clarify my stance a bit: I am not against smart pointers in general, just that shared_ptrs are usually unique_ptrs in disguise. They are non-committal about the actual ownership of the object, whereas unique_ptr has a clearer definition of it.
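
A tiny self-contained sketch of that distinction (the type is a made-up placeholder, not the actual client backend factory):

#include <memory>
#include <utility>

struct ExampleFactory {};

int main()
{
  // unique_ptr states the ownership model explicitly: exactly one owner at a
  // time, and handing the object over requires an explicit std::move.
  std::unique_ptr<ExampleFactory> owner = std::make_unique<ExampleFactory>();
  std::unique_ptr<ExampleFactory> next_owner = std::move(owner);

  // shared_ptr lets any number of holders keep the object alive, which says
  // nothing about who is actually responsible for it.
  std::shared_ptr<ExampleFactory> shared = std::move(next_owner);
  std::shared_ptr<ExampleFactory> another_holder = shared;
}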

@lkomali (Contributor) commented Nov 5, 2024:

The shared_ptr is for the client backend factory, and I have no idea how much of an effort it is to modify everything to unique_ptr. I can create a ticket for that effort.
Coming to the constructor call, I agree with all your points. As far as I understand, your suggestion is to pass the params object instead of the individual arguments. That makes sense to me. Again, I'm not sure if I'm going to break something by modifying the API. Let me know if I should add a ticket to investigate and modify all the related APIs because, as you said, it might look awkward if we modify only some of them and the change isn't propagated until some time later.

Contributor:

Right, let's just try and plumb the parameters object in this one instance, and keep in mind we should do this for newer code. If we have time, we should go back and clean older API incrementally. You're right: let's not break existing code right now, but let's avoid introducing new code which follows this older pattern.

For the unique_ptr vs shared_ptr, this was more of a general remark that we should keep in mind when adding new shared_ptr in the codebase. At some point we should review the existing usage and clean it up, but not in this one PR.

  auto custom_manager =
      dynamic_cast<CustomRequestScheduleManager*>(manager_.get());
  auto request_rate_manager = dynamic_cast<RequestRateManager*>(manager_.get());

  if (custom_manager) {
Contributor:

This if switches between essentially the same code, just with different object types obtained from a common pointer type via dynamic_cast. Given that PerformWarmup and ChangeRequestRate are already being made virtual in this PR, this shouldn't be needed at all. The only difference lies in the debug message being emitted, which could also become a virtual function returning a string, such as GetManagerName, to embed in a status message like this one. That makes the dynamic_cast here completely redundant with the rest of the code.

@lkomali (Contributor) commented Nov 5, 2024:

I think dynamic_cast is required. RequestRate inherits from the LoadManager class, and these functions are specific to RequestRate and the classes that inherit from it. If I remove the dynamic_cast, the type of manager would be LoadManager, which would give an error when I try to call the PerformWarmup and ChangeRequestRate functions.
The if condition is mainly there to cast to the right type. I think custom error messages are not required here.

Contributor:

Right, this means we only need a single dynamic_cast, to RequestRateManager. We might want to ensure it's not nullptr for safety, to confirm the manager object we get at this point is indeed one of the two types, but otherwise the same remark applies.
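
A self-contained sketch of that shape; the classes below are simplified stand-ins named after the real ones, with made-up method signatures, not the actual Perf Analyzer API:

#include <iostream>
#include <memory>

class LoadManager {
 public:
  virtual ~LoadManager() = default;
};

class RequestRateManager : public LoadManager {
 public:
  virtual void PerformWarmup() { std::cout << "warmup (request rate)\n"; }
  virtual void ChangeRequestRate() { std::cout << "new rate (request rate)\n"; }
};

class CustomRequestScheduleManager : public RequestRateManager {
 public:
  void PerformWarmup() override { std::cout << "warmup (custom schedule)\n"; }
  void ChangeRequestRate() override { std::cout << "new rate (custom schedule)\n"; }
};

// A single dynamic_cast to the common RequestRateManager base is enough:
// virtual dispatch picks the right override, and the nullptr check covers a
// manager that is neither type.
void RunProfile(LoadManager* manager)
{
  auto* request_rate_manager = dynamic_cast<RequestRateManager*>(manager);
  if (request_rate_manager == nullptr) {
    std::cerr << "manager does not support request-rate scheduling\n";
    return;
  }
  request_rate_manager->PerformWarmup();
  request_rate_manager->ChangeRequestRate();
}

int main()
{
  std::unique_ptr<LoadManager> manager =
      std::make_unique<CustomRequestScheduleManager>();
  RunProfile(manager.get());
}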

output_shm_size, serial_sequences, parser, factory,
request_parameters));

*manager = std::move(local_manager);
Contributor:

This is technically an anti-pattern. It should (1) use std::make_unique and (2) store the result directly into the output storage.

Contributor:

This gives an error because the constructor is protected and std::make_unique calls the constructor from outside the class scope, so for this to work we would have to make the constructor public.

Contributor:

Fiiiiiiiiine. It's okay this way then.
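
For context, a minimal sketch of why std::make_unique trips over a protected constructor (hypothetical class, not the PR's code):

#include <memory>

class ExampleManager {
 public:
  static std::unique_ptr<ExampleManager> Create()
  {
    // OK: a static member function can reach the protected constructor.
    std::unique_ptr<ExampleManager> local_manager(new ExampleManager());
    return local_manager;

    // Would not compile: std::make_unique constructs the object from outside
    // the class, where the protected constructor is inaccessible.
    // return std::make_unique<ExampleManager>();
  }

 protected:
  ExampleManager() = default;
};

int main()
{
  auto manager = ExampleManager::Create();
}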

Fix comments
@nicolasnoble (Contributor) left a comment:

Looks good now, thank you so much!

@lkomali merged commit 5dfebe1 into main on Nov 7, 2024
7 checks passed
@lkomali deleted the iputterman/fixed-schedule branch on November 7, 2024 at 19:06
nv-braf pushed a commit that referenced this pull request Nov 14, 2024
* Do I know what I am doing?

* compiles apparently

* Move to request_rate

* Fix small bugs

* more small fixes

* Maybe fix small issue

* Actually initialize empty

* Convert to const

* Drop old files

* Force request-count to equal schedule length

* Copy from GenAI branch

* Deformat

* Reorganize arg order

Fix pre-commit errors

* Fix errors, modify design

* fix pre-commit

Fix comments

---------

Co-authored-by: lkomali <[email protected]>
Co-authored-by: Harshini Komali <[email protected]>
  scaled_schedule.reserve(schedule.size());
  if (schedule.size() > 0) {
    for (const auto& value : schedule) {
      scaled_schedule.push_back(value / static_cast<float>(request_rate));
Contributor:

@IzzyPutterman why are the values from schedule (e.g. --schedule=1,2,3,4,4.5,6,9,10,11,13,15,18) being divided by the request rate?

I thought those values are supposed to be absolute timestamps?

cc @lkomali @debermudez

Labels: none yet
Projects: none yet
5 participants