Feat/max error rate #171
base: main
@@ -147,6 +147,8 @@ The `guidellm benchmark` command is used to run benchmarks against a generative
 
 - `--max-requests`: Sets the maximum number of requests for each benchmark run. If not provided, the benchmark will run until `--max-seconds` is reached or the dataset is exhausted.
 
+- `--max-error-rate`: The maximum error rate after which a benchmark will stop. Applicable only for finite, deterministic scenarios, i.e. `rate_type` is `constant` and `--max-seconds` is set, OR `--max-requests` is set, OR the dataset is finite. If `--max-error-rate` is `None` or not applicable, the benchmark will continue regardless of error rate.
+
 - `--warmup-percent`: Specifies the percentage of the benchmark to treat as a warmup phase. Requests during this phase are excluded from the final results.
 
 - `--cooldown-percent`: Specifies the percentage of the benchmark to treat as a cooldown phase. Requests during this phase are excluded from the final results.
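The applicability rule described in the added flag text can be sketched as a small predicate. This is a hypothetical helper, not guidellm's actual code, and the clause grouping follows the literal reading of the flag description:

```python
from typing import Optional

def max_error_rate_applicable(
    rate_type: str,
    max_seconds: Optional[float],
    max_requests: Optional[int],
    dataset_is_finite: bool,
) -> bool:
    """True when the run is finite and deterministic, so an error-rate
    stop condition is meaningful: a constant rate bounded by --max-seconds,
    a --max-requests cap, or a finite dataset."""
    return (
        (rate_type == "constant" and max_seconds is not None)
        or max_requests is not None
        or dataset_is_finite
    )
```
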
@@ -90,6 +90,9 @@ class BenchmarkArgs(StandardBaseModel):
     max_duration: Optional[float] = Field(
         description="The maximum duration in seconds to run this benchmark, if any."
     )
+    max_error: Optional[float] = Field(
+        description="Maximum error rate or constant after which a benchmark will stop."
+    )
     warmup_number: Optional[int] = Field(
         description=(
             "The number of requests to run for the warmup phase of this benchmark, "

@@ -213,6 +216,15 @@ class BenchmarkRunStats(StandardBaseModel):
             "it was completed."
         )
     )
+    error_rate: float = Field(
+        description=(
+            "The number of errored requests divided by the number "
+            "of successful and errored requests. "
+            "This can be higher than max_error "
+            "(if applicable) because it does not take into "
+            "account incomplete requests."
+        )
+    )
 
 
 class BenchmarkMetrics(StandardBaseModel):

> Review: I'm not following this point, the error rate we calculate and compare to should never include incomplete, right?
>
> Reply: Correct
>
> Review: Isn't max_error also not looking at incomplete requests, though? How would it be higher for error_rate vs max_error since those should be based off the same calculations, right?
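The `error_rate` definition in the field above (errored requests divided by successful plus errored, with incomplete requests excluded from the denominator) amounts to the following minimal sketch:

```python
def compute_error_rate(successful: int, errored: int) -> float:
    """Errored requests over terminal requests (successful + errored).
    In-flight/incomplete requests are not counted in the denominator."""
    terminal = successful + errored
    return errored / terminal if terminal else 0.0
```
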
@@ -19,11 +19,16 @@
 __all__ = [
     "GenerativeRequestLoader",
     "GenerativeRequestLoaderDescription",
+    "GetInfiniteDatasetLengthError",
     "RequestLoader",
     "RequestLoaderDescription",
 ]
 
 
+class GetInfiniteDatasetLengthError(Exception):
+    pass
+
+
 class RequestLoaderDescription(StandardBaseModel):
     type_: Literal["request_loader"] = "request_loader"
 

> Review: I think this exception and logic is no longer needed since if we can get the length then we go through a specific logic route and if we can't, then we fall back on the window.

@@ -120,7 +125,11 @@ def __len__(self) -> int:
         if self.iter_type == "finite":
             return self.num_unique_items()
 
-        raise ValueError(f"Unable to determine length of dataset: {self.data}")
+        if self.iter_type != "infinite":
+            raise ValueError(f"Invalid iter_type {self.iter_type}")
+        raise GetInfiniteDatasetLengthError(
+            f"Dataset {self.data} is infinite and thus unable to determine length"
+        )
 
     @property
     def description(self) -> GenerativeRequestLoaderDescription:
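A self-contained sketch of the `__len__` behavior in the hunk above (`DummyLoader` is a hypothetical stand-in for the real request loader): finite loaders report a length, while infinite ones raise the new exception so callers can catch it and fall back.

```python
class GetInfiniteDatasetLengthError(Exception):
    """Raised when a dataset is infinite, so its length is undefined."""

class DummyLoader:
    def __init__(self, iter_type: str, items: list) -> None:
        self.iter_type = iter_type
        self.items = items

    def __len__(self) -> int:
        if self.iter_type == "finite":
            return len(self.items)
        if self.iter_type != "infinite":
            raise ValueError(f"Invalid iter_type {self.iter_type}")
        # Infinite iteration: length is undefined, signal it explicitly.
        raise GetInfiniteDatasetLengthError(
            "Dataset is infinite and thus unable to determine length"
        )
```
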
@@ -1,3 +1,4 @@
+from collections import deque
 from typing import (
     Generic,
     Literal,

@@ -16,6 +17,9 @@
 ]
 
 
+RequestStatus = Literal["success", "error"]
+
+
 class SchedulerRunInfo(StandardBaseModel):
     """
     Information about the current run of the scheduler.

@@ -46,12 +50,15 @@ class SchedulerRunInfo(StandardBaseModel):
     end_number: float
     processes: int
     strategy: SchedulingStrategy
+    last_requests_statuses: deque[RequestStatus]
+    max_error: Optional[float] = None
+
     created_requests: int = 0
     queued_requests: int = 0
     scheduled_requests: int = 0
     processing_requests: int = 0
     completed_requests: int = 0
     errored_requests: int = 0
 
 
 class SchedulerRequestInfo(StandardBaseModel):

> Review (on `last_requests_statuses`): I think we can simplify the logic here so we don't need to add this in, especially since this list can become very long and it is not json serializable, so it would break any pydantic serialization in the event someone wants to save that state. See the note further down where that logic is located.
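One way to address the reviewer's concern about `last_requests_statuses` growing without bound is a `deque` with `maxlen`, which gives a windowed error rate in bounded memory. This is a sketch of that idea, not the PR's implementation:

```python
from collections import deque
from typing import Literal

RequestStatus = Literal["success", "error"]

class ErrorRateWindow:
    """Track only the last `window` request statuses; older entries are
    evicted automatically by deque's maxlen, keeping memory bounded."""

    def __init__(self, window: int = 1000) -> None:
        self.statuses: deque[RequestStatus] = deque(maxlen=window)

    def record(self, status: RequestStatus) -> None:
        self.statuses.append(status)

    @property
    def error_rate(self) -> float:
        # Error rate over the window only; an empty window counts as 0.0.
        if not self.statuses:
            return 0.0
        return sum(s == "error" for s in self.statuses) / len(self.statuses)
```

A fixed-size window also keeps the state serializable as a plain list of short strings, sidestepping the pydantic concern raised in review.
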
> Review: We'll need these re-enabled. If you ever need to push and skip pre-commit, you can pass the --no-verify flag with your git commit command.
>
> Review: Also, @sjmonson just landed a diff that might clear up some of the issues you were seeing.