[APR-248] rearrange things to make sure we intern metadata strings #337
Conversation
Regression Detector (DogStatsD)

Regression Detector Results

Run ID: 2e655b81-0bd2-4256-bd38-7b6830f3d22a
Baseline: 7.59.0
Optimization Goals: ✅ No significant changes detected

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_100mb_3k_contexts_distributions_only | memory utilization | +0.69 | [+0.51, +0.87] | 1 | |
| ➖ | dsd_uds_500mb_3k_contexts | ingress throughput | +0.03 | [+0.02, +0.05] | 1 | |
| ➖ | dsd_uds_10mb_3k_contexts | ingress throughput | +0.01 | [-0.00, +0.03] | 1 | |
| ➖ | dsd_uds_1mb_50k_contexts_memlimit | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts_dualship | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_100mb_250k_contexts | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_1mb_50k_contexts | ingress throughput | -0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_512kb_3k_contexts | ingress throughput | -0.00 | [-0.01, +0.01] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts | ingress throughput | -0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_100mb_3k_contexts | ingress throughput | -0.00 | [-0.05, +0.04] | 1 | |
| ➖ | quality_gates_idle_rss | memory utilization | -1.33 | [-1.46, -1.21] | 1 | |
Bounds Checks: ❌ Failed

| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| ❌ | quality_gates_idle_rss | memory_usage | 0/10 | |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true (a code sketch of this rule follows the list):

- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
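To make the decision rule concrete, here is a minimal illustrative sketch in Rust. The struct shape, field names, and function are assumptions for illustration only, not the Regression Detector's actual implementation:

```rust
/// Illustrative-only summary of one experiment's A/B statistics.
struct ExperimentSummary {
    delta_mean_pct: f64,  // estimated Δ mean %
    ci_low_pct: f64,      // lower bound of the 90% CI on Δ mean %
    ci_high_pct: f64,     // upper bound of the 90% CI on Δ mean %
    marked_erratic: bool, // whether the experiment's config marks it "erratic"
}

/// Flags a regression only when all three criteria hold: the effect size
/// meets the tolerance, the confidence interval excludes zero, and the
/// experiment is not marked erratic.
fn is_regression(s: &ExperimentSummary, tolerance_pct: f64) -> bool {
    let big_enough = s.delta_mean_pct.abs() >= tolerance_pct;
    let ci_excludes_zero = s.ci_low_pct > 0.0 || s.ci_high_pct < 0.0;
    big_enough && ci_excludes_zero && !s.marked_erratic
}
```

For example, the `quality_gates_idle_rss` row above (-1.33%, CI [-1.46, -1.21]) has a CI that excludes zero but misses the 5.00% effect-size tolerance, so it is reported as ➖ rather than flagged.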
Regression Detector (Saluki)

Regression Detector Results

Run ID: 738c41c5-81e6-4240-b7f2-b6b756f7abdf
Baseline: ec30b3d
Optimization Goals: ❌ Significant changes detected

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_50mb_10k_contexts_no_inlining | ingress throughput | +0.01 | [-0.07, +0.08] | 1 | |
| ➖ | dsd_uds_512kb_3k_contexts | ingress throughput | +0.00 | [-0.01, +0.01] | 1 | |
| ➖ | dsd_uds_1mb_50k_contexts | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | |
| ➖ | dsd_uds_50mb_10k_contexts_no_inlining_no_allocs | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | |
| ➖ | dsd_uds_100mb_3k_contexts | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts_dualship | ingress throughput | -0.00 | [-0.01, +0.00] | 1 | |
| ➖ | dsd_uds_100mb_250k_contexts | ingress throughput | -0.01 | [-0.04, +0.02] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts | ingress throughput | -0.02 | [-0.03, +0.00] | 1 | |
| ➖ | dsd_uds_10mb_3k_contexts | ingress throughput | -0.03 | [-0.06, +0.01] | 1 | |
| ➖ | dsd_uds_500mb_3k_contexts | ingress throughput | -0.25 | [-0.36, -0.14] | 1 | |
| ➖ | dsd_uds_1mb_50k_contexts_memlimit | ingress throughput | -2.57 | [-4.41, -0.73] | 1 | |
| ➖ | dsd_uds_100mb_3k_contexts_distributions_only | memory utilization | -4.17 | [-4.43, -3.90] | 1 | |
| ✅ | quality_gates_idle_rss | memory utilization | -19.33 | [-19.69, -18.96] | 1 | |
Bounds Checks: ✅ Passed

| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| ✅ | quality_gates_idle_rss | memory_usage | 10/10 | |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
Regression Detector Links

Experiment Result Links
```diff
         self.0.split_once(':').map(|(_, value)| value)
     }

     /// Gets the name and value of the tag.
     ///
     /// For bare tags (e.g. `production`), this always returns `(Some(...), None)`.
-    pub fn name_and_value(&self) -> (Option<&str>, Option<&str>) {
+    pub fn name_and_value(&self) -> (&'a str, Option<&'a str>) {
```
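For context on the signature change, here is a minimal sketch of how a borrowed tag type with the new return type might be implemented; the `Tag` wrapper shown is an assumption for illustration, not necessarily the actual type in the codebase:

```rust
/// Hypothetical borrowed tag, wrapping a `&str` slice of the raw payload.
/// (Illustrative only; the real type in the codebase may differ.)
pub struct Tag<'a>(&'a str);

impl<'a> Tag<'a> {
    /// Gets the name and value of the tag.
    ///
    /// With the non-optional name, a bare tag (e.g. `production`) returns
    /// the whole tag as the name and `None` for the value.
    pub fn name_and_value(&self) -> (&'a str, Option<&'a str>) {
        match self.0.split_once(':') {
            Some((name, value)) => (name, Some(value)),
            None => (self.0, None),
        }
    }
}
```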
👍🏻
I agree that the parser really shouldn't care except for knowing how to adjust the raw input payload in order to exercise those codepaths. I do think the tests are valuable overall: for example, we have all the various permutations because of a real bug I caught where I broke how optional fields are parsed, which didn't show up unless the optional fields were in a certain order. I'm ambivalent about where you place the tests, but it does feel like we need to keep them in spirit to avoid regressing the codec.
I'm guessing I made this optional trying to futureproof for a scenario where we weren't dealing with Agent metrics. I'm fine with making it required until that need actually arises.
This was a hack to have …
This is a good point. My brain personally enjoys builder APIs, but I'm not strongly tied to them. A middleground could be that we just make them more restricted by requiring …

My inclination would be to have a builder API if we do end up with a good number of optional fields, to avoid massive, multi-line struct initializers... that's it, really. Overall, this approach seems fine to me. 👍🏻
Signed-off-by: Luke Steensen <[email protected]>
force-pushed from 6bf270a to 92668d0 (Compare)
Strongly agree! I wouldn't get rid of any of the tests; I was just trying to figure out a good way to split out the dirty external stuff (i.e. connection address) from the pure parsing concerns and organize the tests accordingly. But that's something we can maybe follow up on.
Not a big deal, this makes sense.
Noticed this regression and backed out the change.
I'm good with that too, I just think the current implementation via setters/getters ends up encouraging/allowing mutation in a way that's limiting for sharing metadata, for example. Just having separate builder structs would fix that.
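As a rough illustration of the "separate builder structs" idea (every type and method name here is hypothetical, not an actual saluki API):

```rust
/// Hypothetical immutable metadata value; once built, it can be shared freely.
#[derive(Clone)]
pub struct MetricMetadata {
    container_id: Option<String>,
    process_id: Option<u32>,
}

/// Hypothetical builder: mutation is confined to construction time, so the
/// finished `MetricMetadata` can be shared without worrying about later edits.
#[derive(Default)]
pub struct MetricMetadataBuilder {
    container_id: Option<String>,
    process_id: Option<u32>,
}

impl MetricMetadataBuilder {
    pub fn container_id(mut self, id: impl Into<String>) -> Self {
        self.container_id = Some(id.into());
        self
    }

    pub fn process_id(mut self, pid: u32) -> Self {
        self.process_id = Some(pid);
        self
    }

    pub fn build(self) -> MetricMetadata {
        MetricMetadata {
            container_id: self.container_id,
            process_id: self.process_id,
        }
    }
}
```

With this shape, `MetricMetadataBuilder::default().container_id("abc123").build()` yields a value that can be cloned and shared, and the only mutation happens inside the builder.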
Looks good to me.
I think what we might want to consider in the future is simply changing the DSD codec tests to deal with `MetricPacket<'a>` instead of `Metric` when parsing. As nice as it would be to just have codecs only return `Event`-compatible types, that's clearly not the way to approach things when we need to obsess over performance/efficiency, so we can probably afford to be more sparse in the tests just to avoid polluting them with too much setup/boilerplate.
Nothing you need to do here unless you really want to.
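A rough sketch of what a test against a borrowed packet might look like; the `MetricPacket` fields and the `parse_packet` entry point below are hypothetical stand-ins, not the real DSD codec API:

```rust
/// Hypothetical borrowed view of a parsed DogStatsD packet; fields borrow
/// directly from the raw input rather than being converted to owned types.
struct MetricPacket<'a> {
    name: &'a str,
    tags: Vec<&'a str>,
}

/// Hypothetical parser entry point used only for this sketch.
fn parse_packet(input: &str) -> Option<MetricPacket<'_>> {
    let (name, rest) = input.split_once(':')?;
    let tags = rest
        .split('|')
        .find_map(|part| part.strip_prefix('#'))
        .map(|t| t.split(',').collect())
        .unwrap_or_default();
    Some(MetricPacket { name, tags })
}

#[test]
fn parses_name_and_tags() {
    // Asserting against the borrowed packet keeps the test focused on
    // parsing, with no owned `Metric` construction or context boilerplate.
    let packet = parse_packet("page.views:1|c|#env:dev,service:web").unwrap();
    assert_eq!(packet.name, "page.views");
    assert_eq!(packet.tags, vec!["env:dev", "service:web"]);
}
```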
This is a bit rough around the edges still, but opening to get feedback. Things that are currently awkward:

- The tests need data like `process_id`, `origin`, etc. This means we're either making up data to fill the tests or duplicating logic from the source.
- … `None` (maybe an optimization? It can be put back).
- The setters (e.g. `set_container_id`) very effectively hid places where we should have been interning strings via `Into<MetaString>`, which is a bit of a performance footgun (arguably correctness for bounding as well); see the sketch below.
- Using `pub` fields isn't ideal, but it was the easiest way to get this working for now.

None of this should be terribly difficult to sort out, and there's room for some nice simplification once we do, but wanted to make sure folks are on the same page before making any real design decisions.
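To illustrate the interning footgun mentioned above: `MetaString` below is a simplified stand-in (the real interned string type lives elsewhere in the codebase), and the `From` impl shown is illustrative, not the actual API:

```rust
use std::sync::Arc;

/// Illustrative stand-in for an interned string type like `MetaString`.
#[derive(Clone)]
pub struct MetaString(Arc<str>);

impl From<&str> for MetaString {
    fn from(s: &str) -> Self {
        // Real code would go through an interner to dedupe and bound
        // allocations; this just copies the bytes, which is exactly the
        // kind of silent non-interned conversion being described.
        MetaString(Arc::from(s))
    }
}

pub struct Metadata {
    container_id: Option<MetaString>,
}

impl Metadata {
    /// A setter taking `impl Into<MetaString>` compiles fine whether or not
    /// the caller passes an already-interned value, which is how accidental
    /// non-interned conversions can slip through unnoticed.
    pub fn set_container_id(&mut self, id: impl Into<MetaString>) {
        self.container_id = Some(id.into());
    }
}
```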