Due to the jittery nature of perf testing, we usually get some failures that are only outliers, yet they still cause compareperf to return a non-zero exit status. It'd be nice to define conditions under which we consider a comparison as good. This would be very useful for build reporting purposes as well as for bisection.
One way to judge would be to specify a percentage of acceptable failures (in total, per group, ...). Another would be to focus on reference builds and be more lenient with builds that show a lot of jitter, while still failing when the failure rate is stable. Of course we can add multiple metrics to let operators define better rules (a rough threshold sketch follows below).
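A minimal sketch of what such a threshold-based verdict could look like, assuming a simple mapping of test groups to pass/fail flags; the names (`FailureThresholds`, `evaluate`) and the result format are illustrative assumptions, not the actual compareperf API:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FailureThresholds:
    """Maximum tolerated ratio of failed checks (0.0 - 1.0)."""
    total: float = 0.05          # across all results
    per_group: float = 0.10      # within any single test group


def evaluate(results: Dict[str, List[bool]],
             thresholds: FailureThresholds) -> bool:
    """Return True (comparison is "good") when failure ratios stay
    within the configured thresholds. `results` maps a group name to
    the pass/fail flags of its individual checks."""
    all_flags = [flag for flags in results.values() for flag in flags]
    if not all_flags:
        return True
    if all_flags.count(False) / len(all_flags) > thresholds.total:
        return False
    for flags in results.values():
        if flags and flags.count(False) / len(flags) > thresholds.per_group:
            return False
    return True


if __name__ == "__main__":
    # Two isolated outliers in a large result set -> still considered good.
    sample = {"fio": [True] * 48 + [False] * 2, "uperf": [True] * 50}
    print(evaluate(sample, FailureThresholds()))   # True
```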
Alternatively, we could just report some aggregated info and let users process it afterwards, but better built-in handling would still be useful IMO.
Bonus: it'd be nice to support ML-based identification and models for deciding on the build status, but that is currently out of scope.