test failed in CI: test_oximeter_reregistration #6901

Open
jgallagher opened this issue Oct 18, 2024 · 0 comments
Labels
Test Flake Tests that work. Wait, no. Actually yes. Hang on. Something is broken.

Comments

@jgallagher
Contributor

This test failed on a CI run on #6794:

https://github.com/oxidecomputer/omicron/pull/6794/checks?check_run_id=31733911976

Log showing the specific test failure:

https://buildomat.eng.oxide.computer/wg/0/details/01JAFWKR0V962B2KXK9WB257T0/QuKWDAzMMp3WYt2ahovphvgwOc1gLSM4cCwVzhoqVoR02tm8/01JAFWM1JM94KRNCYGY9J8ZMX7

Excerpt from the log showing the failure:

6316	2024-10-18T13:58:54.822Z	        FAIL [  72.116s] omicron-nexus::test_all integration_tests::oximeter::test_oximeter_reregistration
6317	2024-10-18T13:58:54.822Z	
6318	2024-10-18T13:58:54.822Z	--- STDOUT:              omicron-nexus::test_all integration_tests::oximeter::test_oximeter_reregistration ---
6319	2024-10-18T13:58:54.822Z	
6320	2024-10-18T13:58:54.822Z	running 1 test
6321	2024-10-18T13:58:54.822Z	test integration_tests::oximeter::test_oximeter_reregistration has been running for over 60 seconds
6322	2024-10-18T13:58:54.822Z	test integration_tests::oximeter::test_oximeter_reregistration ... FAILED
6323	2024-10-18T13:58:54.822Z	
6324	2024-10-18T13:58:54.822Z	failures:
6325	2024-10-18T13:58:54.822Z	
6326	2024-10-18T13:58:54.822Z	failures:
6327	2024-10-18T13:58:54.822Z	    integration_tests::oximeter::test_oximeter_reregistration
6328	2024-10-18T13:58:54.822Z	
6329	2024-10-18T13:58:54.822Z	test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 381 filtered out; finished in 72.07s
6330	2024-10-18T13:58:54.822Z	
6331	2024-10-18T13:58:54.823Z	
6332	2024-10-18T13:58:54.823Z	--- STDERR:              omicron-nexus::test_all integration_tests::oximeter::test_oximeter_reregistration ---
6333	2024-10-18T13:58:54.823Z	log file: /var/tmp/omicron_tmp/test_all-17f1d182e85394d2-test_oximeter_reregistration.151962.0.log
6334	2024-10-18T13:58:54.823Z	note: configured to log to "/var/tmp/omicron_tmp/test_all-17f1d182e85394d2-test_oximeter_reregistration.151962.0.log"
6335	2024-10-18T13:58:54.823Z	DB URL: postgresql://root@[::1]:40485/omicron?sslmode=disable
6336	2024-10-18T13:58:54.823Z	DB address: [::1]:40485
6337	2024-10-18T13:58:54.823Z	log file: /var/tmp/omicron_tmp/test_all-17f1d182e85394d2-test_oximeter_reregistration.151962.2.log
6338	2024-10-18T13:58:54.823Z	note: configured to log to "/var/tmp/omicron_tmp/test_all-17f1d182e85394d2-test_oximeter_reregistration.151962.2.log"
6339	2024-10-18T13:58:54.823Z	log file: /var/tmp/omicron_tmp/test_all-17f1d182e85394d2-test_oximeter_reregistration.151962.3.log
6340	2024-10-18T13:58:54.823Z	note: configured to log to "/var/tmp/omicron_tmp/test_all-17f1d182e85394d2-test_oximeter_reregistration.151962.3.log"
6341	2024-10-18T13:58:54.823Z	thread 'integration_tests::oximeter::test_oximeter_reregistration' panicked at nexus/tests/integration_tests/oximeter.rs:163:14:
6342	2024-10-18T13:58:54.823Z	Failed to retrieve timeseries: TimedOut(60.026376224s)

I strongly suspect this has the same underlying cause as #6895, based on seeing the Oximeter collector prune a producer task (n_pruned_tasks = 1) in https://buildomat.eng.oxide.computer/wg/0/artefact/01JAFWKR0V962B2KXK9WB257T0/QuKWDAzMMp3WYt2ahovphvgwOc1gLSM4cCwVzhoqVoR02tm8/01JAFWM1JM94KRNCYGY9J8ZMX7/01JAFZZESJF2HPMY867ENBDNQ4/test_all-17f1d182e85394d2-test_oximeter_reregistration.151962.0.log?format=x-bunyan:

2024-10-18T13:57:54.093Z	INFO	test_oximeter_reregistration (oximeter-agent): refreshed list of producers from Nexus
    collector_id = 39e6175b-4df2-4730-b11d-cbc1e60a2e78
    collector_ip = ::1
    file = oximeter/collector/src/agent.rs:809
    n_current_tasks = 1
    n_pruned_tasks = 1
@jgallagher jgallagher added the Test Flake Tests that work. Wait, no. Actually yes. Hang on. Something is broken. label Oct 18, 2024
bnaecker added a commit that referenced this issue Oct 18, 2024
- Add generation numbers to the collection of oximeter producers, and
  assign the current generation to each producer as it is registered.
- Modify the refresh method to first take the generation number before
  starting to list current producers. Then use that to avoid pruning
  producers that are _new_ since we started refreshing our list.
- Fixes #6895 and possibly #6901
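A minimal Rust sketch of the generation-number scheme described in the commit message above, under stated assumptions: Producers, register, start_refresh, and finish_refresh are hypothetical names (not the actual oximeter-agent API), producer IDs are plain u32s, the list returned by Nexus is simulated with a hard-coded set, and the state is single-threaded (the real agent would presumably share it with the registration path behind a lock).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the collector's set of producer tasks.
/// This is an illustration, not the real oximeter-agent types.
#[derive(Debug, Default)]
struct Producers {
    /// Generation counter, bumped on every (re-)registration.
    generation: u64,
    /// Producer ID -> generation at which it was last registered.
    tasks: BTreeMap<u32, u64>,
}

impl Producers {
    /// Register (or re-register) a producer, tagging it with a fresh generation.
    fn register(&mut self, id: u32) {
        self.generation += 1;
        self.tasks.insert(id, self.generation);
    }

    /// Refresh step 1: snapshot the generation BEFORE asking Nexus for the list.
    fn start_refresh(&self) -> u64 {
        self.generation
    }

    /// Refresh step 2: prune producers missing from `listed`, but only those
    /// registered at or before `snapshot`. A newer producer registered while
    /// the list was being fetched, so its absence from `listed` proves nothing.
    /// Returns the number of pruned tasks.
    fn finish_refresh(&mut self, snapshot: u64, listed: &BTreeSet<u32>) -> usize {
        let before = self.tasks.len();
        self.tasks
            .retain(|id, registered_at| listed.contains(id) || *registered_at > snapshot);
        before - self.tasks.len()
    }
}

fn main() {
    let mut producers = Producers::default();
    producers.register(1);
    producers.register(3); // stale: Nexus no longer lists it

    // A refresh begins: take the generation first.
    let snapshot = producers.start_refresh();

    // While the list request is in flight, producer 2 registers.
    producers.register(2);

    // The (simulated) list from Nexus predates producer 2's registration.
    let listed = BTreeSet::from([1]);
    let pruned = producers.finish_refresh(snapshot, &listed);

    // prints: pruned = 1, tasks = {1: 1, 2: 3}
    println!("pruned = {pruned}, tasks = {:?}", producers.tasks);
}
```

The ordering is what matters here: because the snapshot is taken before the list request, any producer registered afterwards carries a strictly larger generation and survives the prune even though the older list omits it, while genuinely stale producers (like 3 above) are still removed.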