Python Check Support #277
Regression Detector (DogStatsD)
Regression Detector Results
Run ID: 93705546-53b9-4fc8-aefb-5d50856b32bd
Baseline: 7.55.2
Performance changes are noted in the perf column of each table:

No significant changes in experiment optimization goals
Confidence level: 90.00%
There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_1mb_50k_contexts_memlimit | ingress throughput | +0.05 | [+0.01, +0.09] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts | ingress throughput | +0.02 | [-0.01, +0.04] | 1 | |
| ➖ | dsd_uds_512kb_3k_contexts | ingress throughput | +0.02 | [-0.03, +0.07] | 1 | |
| ➖ | dsd_uds_10mb_3k_contexts | ingress throughput | +0.01 | [-0.01, +0.03] | 1 | |
| ➖ | dsd_uds_100mb_3k_contexts | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | |
| ➖ | dsd_uds_1mb_50k_contexts | ingress throughput | -0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_100mb_250k_contexts | ingress throughput | -0.00 | [-0.04, +0.03] | 1 | |
| ➖ | dsd_uds_500mb_3k_contexts | ingress throughput | -0.00 | [-0.01, +0.00] | 1 | |
| ➖ | dsd_uds_100mb_3k_contexts_distributions_only | memory utilization | -0.78 | [-0.94, -0.61] | 1 | |

Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
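
The three criteria above can be sketched as a small predicate. This is only an illustration of the rule as described, not the detector's actual implementation; the `ExperimentResult` class, its field names, and the `effect_size_tolerance` parameter are hypothetical. The sample values are copied from rows of the result tables in this thread.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    """One row of a regression detector table (hypothetical structure)."""
    name: str
    delta_mean_pct: float   # the "Δ mean %" column
    ci_low: float           # lower bound of "Δ mean % CI"
    ci_high: float          # upper bound of "Δ mean % CI"
    erratic: bool = False   # assumed to come from the experiment's configuration


def is_regression(result, effect_size_tolerance=5.0):
    """Return True only if all three criteria from the explanation hold."""
    big_enough = abs(result.delta_mean_pct) >= effect_size_tolerance
    ci_excludes_zero = result.ci_low > 0 or result.ci_high < 0
    return big_enough and ci_excludes_zero and not result.erratic


memory = ExperimentResult(
    "dsd_uds_100mb_3k_contexts_distributions_only", -0.78, -0.94, -0.61
)
throughput = ExperimentResult("dsd_uds_1mb_3k_contexts", 0.02, -0.01, 0.04)
cpu = ExperimentResult("pycheck_lots_of_tags", 4674.28, 4508.71, 4839.85)

for row in (memory, throughput, cpu):
    print(row.name, is_regression(row))
```

The memory row shows why the first criterion matters: its interval excludes zero, but the change is well under the 5.00% tolerance, so it is not flagged.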
… build-image steps
c0c490e to f5b5ce3
…d state of each incoming check
…ar reasons related to new build-image
individual_tags = []
def generate_tag(tag_length):
    if rng.random() >= unique_tagset_ratio and len(individual_tags) != 0:
🔴 Code Vulnerability
do not use random
Make sure to use values that are actually random. The random module in Python should generally not be used for security-sensitive values and should be replaced with the secrets module, as noted in the official Python documentation.
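
For context on the warning above, here is a minimal sketch contrasting the two modules. It assumes `rng` in the diff is a seeded `random.Random` instance, which the visible lines do not show; the seed value and the `sample_values` helper are hypothetical. The Python documentation warns against using `random` for security purposes, while a seeded generator gives repeatable sequences, which may be what a benchmark rig wants.

```python
import random
import secrets

# Assumption: the seed is purely illustrative; the diff does not show how
# `rng` is constructed.
SEED = 1234


def sample_values(seed, count=3):
    """Generate synthetic gauge values from a seeded pseudo-random generator."""
    rng = random.Random(seed)
    return [rng.random() * 1000 for _ in range(count)]


# The same seed yields the same sequence, so runs are repeatable.
first_run = sample_values(SEED)
second_run = sample_values(SEED)
print(first_run == second_run)

# secrets draws from the operating system's CSPRNG and cannot be seeded,
# which suits tokens and passwords rather than repeatable test data.
token = secrets.token_hex(16)
print(len(token))  # 16 random bytes rendered as 32 hex characters
```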
# For each metric that gets submitted, choose a tagset at random
# This will average out to
# contexts = len(tag_sets) as long as num_metrics is greater than num_tagsets
self.gauge('hello.world', rng.random() * 1000, tags=rng.choice(tag_sets))
🔴 Code Vulnerability
do not use random
Make sure to use values that are actually random. The random module in Python should generally not be used for security-sensitive values and should be replaced with the secrets module, as noted in the official Python documentation.
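
The code comment in the diff above says the number of contexts averages out to `len(tag_sets)` when `num_metrics` exceeds `num_tagsets`. A rough way to check that is sketched below, assuming each metric picks a tag set uniformly at random with replacement. Under that assumption the expected number of distinct tag sets is n(1 - (1 - 1/n)^m) for n tag sets and m metrics, which approaches n only when m is well above n. The function names and placeholder tag sets are hypothetical.

```python
import random


def distinct_contexts(num_tagsets, num_metrics, seed=0):
    """Count distinct tag sets hit when each metric picks one at random."""
    rng = random.Random(seed)
    # Placeholder tag sets; the real check builds these elsewhere.
    tag_sets = [[f"tag{i}:value"] for i in range(num_tagsets)]
    seen = set()
    for _ in range(num_metrics):
        seen.add(tuple(rng.choice(tag_sets)))
    return len(seen)


def expected_contexts(num_tagsets, num_metrics):
    """Expected distinct tag sets under uniform sampling with replacement."""
    return num_tagsets * (1 - (1 - 1 / num_tagsets) ** num_metrics)


for num_metrics in (100, 1_000, 10_000):
    simulated = distinct_contexts(100, num_metrics)
    expected = round(expected_contexts(100, num_metrics), 1)
    print(num_metrics, simulated, expected)
```

When `num_metrics` is only slightly larger than `num_tagsets`, the simulated count can fall noticeably short of `len(tag_sets)`, so the approximation in the comment is best read as holding for `num_metrics` much greater than `num_tagsets`.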
Regression Detector (Saluki)
Regression Detector Results
Run ID: cb467c25-16cf-447c-b61d-9fb9b58a954e
Baseline: 95d7b85
Performance changes are noted in the perf column of each table:

Significant changes in experiment optimization goals
Confidence level: 90.00%

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ❌ | pycheck_lots_of_tags | % cpu utilization | +4674.28 | [+4508.71, +4839.85] | 1 | |

match module.getattr("SUBMISSION_QUEUE") {
    Ok(py_item) => match py_item.extract::<Py<python_scheduler::PythonSenderHolder>>() {
        Ok(q) => {
            let res = pyo3::Python::with_gil(|py| q.bind_borrowed(py).borrow_mut().sender.clone());
Nit: I don't think the GIL needs to be acquired here; see https://pyo3.rs/main/performance#access-to-bound-implies-access-to-gil-token
Ah, you're right, that lets me get rid of this sender.clone as well. Good spot!
Regression Detector Links
Experiment Result Links
Continuation of #48
To get started:
Another option is to use the 'converged' image, which runs ADP inside the standard Docker image. It attempts to run all default checks out of the box, which is useful for testing.