Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test failure: omicron-omdb::test_all_output test_omdb_success_cases #6509

Closed
smklein opened this issue Sep 3, 2024 · 2 comments · Fixed by #6511
Closed

test failure: omicron-omdb::test_all_output test_omdb_success_cases #6509

smklein opened this issue Sep 3, 2024 · 2 comments · Fixed by #6511
Labels
Test Flake Tests that work. Wait, no. Actually yes. Hang on. Something is broken.

Comments

@smklein
Copy link
Collaborator

smklein commented Sep 3, 2024

On branch main commit 5bf5f09, I saw this test failure:

running commands with args: ["nexus", "blueprints", "show", "b1e4b4bd-a186-416b-99ed-63b8d88d7b89"]
running commands with args: ["nexus", "blueprints", "show", "current-target"]
running commands with args: ["nexus", "blueprints", "diff", "b1e4b4bd-a186-416b-99ed-63b8d88d7b89", "current-target"]
@@ -563,11 +563,11 @@
 task: "metrics_producer_gc"
   configured period: every 1m
   currently executing: no
   last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
     started at <REDACTED     TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
-warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("<REDACTED           TIMESTAMP>"), "pruned": Array []})
+warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("<REDACTED        TIMESTAMP>"), "pruned": Array []})
 
 task: "phantom_disks"
   configured period: every <REDACTED_DURATION>s
   currently executing: no
   last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
@@ -990,11 +990,11 @@
 task: "metrics_producer_gc"
   configured period: every 1m
   currently executing: no
   last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing
     started at <REDACTED     TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
-warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("<REDACTED           TIMESTAMP>"), "pruned": Array []})
+warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("<REDACTED        TIMESTAMP>"), "pruned": Array []})
 
 task: "phantom_disks"
   configured period: every <REDACTED_DURATION>s
   currently executing: no
   last completed activation: <REDACTED ITERATIONS>, triggered by a periodic timer firing

test test_omdb_success_cases ... FAILED

failures:

failures:
    test_omdb_success_cases

I was actually trying to reproduce #6505 locally, but this looks like a different error. Notably, I'm seeing differences in the whitespace between REDACTED...TIMESTAMP.

@smklein smklein added the Test Flake Tests that work. Wait, no. Actually yes. Hang on. Something is broken. label Sep 3, 2024
@smklein
Copy link
Collaborator Author

smklein commented Sep 3, 2024

Other notes: This was on Helios, after running cargo nt -p omicron-omdb -- test_omdb_success_cases in a loop for almost 100 iterations.

@davepacheco
Copy link
Collaborator

It looks like the code that does the timestamp redactions attempts to preserve the width (in characters) of the timestamp:

for _ in 0..(m.len() - "<REDACTEDTIMESTAMP>".len()) {
new_s.push(' ');
}

It sounds like most of the time, the width matches what's in the expected output file. In this case, it was 3 characters shorter. In the past I've seen that some ways of formatting timestamps implicitly vary the precision based on the value -- e.g., if the milliseconds happens to be exactly 0, then they get left out.

On a regular run I saw this value:

warning: unknown background task: "metrics_producer_gc" (don't know how to interpret details: Object {"expiration": String("2024-09-03T17:29:27.850928397Z"), "pruned": Array []})

So (1) we're doing the string formatting on the server, and (2) my guess is that the nanoseconds component was 0 in the case we saw and the serialization just left it out.

smklein added a commit that referenced this issue Sep 3, 2024
#6511)

Changes our "timestamp redaction" logic to ignore the original length of
the input timestamp.

Timestamps are not always constant-width -- depending on the amount of
trailing zeros, they can have variable stringified lengths.
Unfortunately, this variability is at-odds with our goal of
deterministic test output.

In this PR, we favor the deterministic output, to avoid test flaking:
Variable space output logic is removed in favor of a stable redaction
marker. Additionally, tests with different timestamp lengths are added
to act as a regression against this particular flake.

Fixes #6509
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Test Flake Tests that work. Wait, no. Actually yes. Hang on. Something is broken.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants