Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Goodput initial implementation #32

Merged
merged 31 commits into from
Aug 23, 2024
Merged

Goodput initial implementation #32

merged 31 commits into from
Aug 23, 2024

Conversation

AndyDai-nv
Copy link
Contributor

Migrate goodput dev branch to this new repo.

Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start, Andy! Left a few comments. If you have any questions, let me know.

genai-perf/docs/goodput_tutorial.md Outdated Show resolved Hide resolved
genai-perf/genai_perf/metrics/llm_metrics.py Outdated Show resolved Hide resolved
genai-perf/genai_perf/metrics/statistics.py Outdated Show resolved Hide resolved
genai-perf/genai_perf/parser.py Show resolved Hide resolved
genai-perf/tests/test_llm_metrics.py Outdated Show resolved Hide resolved
Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work, Andy! This POC looks good. I added some small comments as I reviewed to update once the core development is done.

What are the next steps? We can also chat offline. I think it'd be great to get unit testing up for this.

Once you finish doing this work for LLMs, we can look at creating a class for non-LLMs (specifically embeddings/rankings). With the way you coded this, I don't see anything special about the llm_goodput_reporter that couldn't be used for non-LLM models. You may want to try using it with embeddings/rankings and seeing if it works.

genai-perf/genai_perf/parser.py Outdated Show resolved Hide resolved
genai-perf/genai_perf/parser.py Outdated Show resolved Hide resolved
genai-perf/genai_perf/parser.py Outdated Show resolved Hide resolved
genai-perf/docs/goodput_tutorial.md Outdated Show resolved Hide resolved
genai-perf/genai_perf/goodput_reporter/goodput_reporter.py Outdated Show resolved Hide resolved
genai-perf/genai_perf/goodput_reporter/goodput_reporter.py Outdated Show resolved Hide resolved
genai-perf/genai_perf/goodput_reporter/goodput_reporter.py Outdated Show resolved Hide resolved
for val, slo in zip(request_metric_values, target_metric_values)
):
good_req_count += 1
self._good_req_count = good_req_count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid setting return values that will be used later as an attribute within the method. This is very confusing from the reader's perspective because the "count" is hiding inside the method, and it's being used under compute_goodput.

Let's return the counts

def count_good_reqs(self) -> int:
    ...
    return good_req_count

and same goes for other methods as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what is called a side effect.
In general, we want to avoid those if they are not necessary.
I agree with @nv-hwoo here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines 77 to 79
if all(val < slo
for val, slo in zip(request_metric_values, target_metric_values)
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we unroll this? It hurts readability and I don't think there's much gain in fitting this all into the if conditional block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Readability and Maintainability are the top priority. We optimize only if we find a bottleneck impacting performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

@debermudez debermudez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of good work in this PR.
Lets some comments but great job.

Parse and check goodput args
"""
'''
if args.goodput:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is functionally a duplication of the code below.
Comments should steer towards information that is not in the code base. The code should be readable enough alongside a meaningful method name, to avoid this level of detail.
It also has the downside of being able to quickly grow stale and become misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -733,6 +778,7 @@ def _parse_profile_args(subparsers) -> argparse.ArgumentParser:
_add_profile_args(profile)
_add_output_args(profile)
_add_other_args(profile)
_add_goodput_args(profile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we foresee more arguments that are going to fall under the goodput general topic?
I am unclear on whether we need to add a new group for this one argument.

Copy link
Contributor Author

@AndyDai-nv AndyDai-nv Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more options for users to see exactly what metrics contribute most to the low goodput? Something like this. Knowing a goodput number might not be enough information, right?
Just my random thoughts: )

genai-perf/genai_perf/parser.py Show resolved Hide resolved
genai-perf/tests/test_llm_metrics.py Outdated Show resolved Hide resolved

docker run -it --net=host --gpus=1 nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# Run GenAI-Perf in the container:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove any options that are just stating the default values here. Less code to maintain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated


@abstractmethod
def compute_goodput(self) -> None:
"""Compute the goodput. To be implemented by subclasses."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above



class LLMGoodputReporter(GoodputReporter):
"""A subclass to report goodput for language models."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know its a subclass from the line above.

Comment on lines 77 to 79
if all(val < slo
for val, slo in zip(request_metric_values, target_metric_values)
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Readability and Maintainability are the top priority. We optimize only if we find a bottleneck impacting performance.

for val, slo in zip(request_metric_values, target_metric_values)
):
good_req_count += 1
self._good_req_count = good_req_count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what is called a side effect.
In general, we want to avoid those if they are not necessary.
I agree with @nv-hwoo here.

Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments. Will finish the review soon.

Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment.

This looks pretty close to being ready to merge. Great work, Andy!

genai-perf/tests/test_llm_profile_data_parser.py Outdated Show resolved Hide resolved
Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work, Andy! Looks good to me. 🚀

Please make sure to confirm the last CI pipeline passed all tests and that running with goodput constraints manually still works for LLM and non-LLM models (since there is no CI test for it yet). Assuming that is all still good, feel free to merge.

@AndyDai-nv AndyDai-nv merged commit 68968e3 into main Aug 23, 2024
8 checks passed
@AndyDai-nv AndyDai-nv deleted the andy-goodput-dev branch August 23, 2024 23:25
lkomali pushed a commit that referenced this pull request Aug 27, 2024
GenAI-Perf goodput support implementation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants