Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sound/pipewire: add truncated exp backoff to tests and fork them #789

Merged

Conversation

epilys
Copy link
Member

@epilys epilys commented Dec 2, 2024

Sometimes pw backend creation on CI fails with errors like for example:

thread 'audio_backends::pipewire::tests::test_pipewire_backend_invalid_stream' panicked at vhost-device-sound/src/audio_backends/pipewire.rs:98:42:
Failed to connect to core: CreationFailed

Add a simple kind of truncated exponential backoff retry wrapper to
PwBackend::new().

Also, run all tests that modify their process environment under a fork using the rusty_fork crate.

Signed-off-by: Manos Pitsidianakis [email protected]

Depends on #788 which fixes lint warnings

@epilys epilys self-assigned this Dec 2, 2024
@epilys epilys force-pushed the sound-pipewire-retry-flaky-tests branch from 1b54b42 to 480536b Compare December 2, 2024 12:06
@epilys
Copy link
Member Author

epilys commented Dec 2, 2024

@stefano-garzarella can we retry the CI jobs a few times manually to see if they don't hangup indefinitely? (since there is no max try limit in this version of the code)

@stefano-garzarella
Copy link
Member

@stefano-garzarella can we retry the CI jobs a few times manually to see if they don't hangup indefinitely? (since there is no max try limit in this version of the code)

sure, let me restart it!

@stefano-garzarella
Copy link
Member

@epilys is there any log messages I should check if the retry worked? (not yet looked at the PR, sorry)

@epilys
Copy link
Member Author

epilys commented Dec 2, 2024

@stefano-garzarella the CI seems to capture test output by default, so it'd only show up if something failed unfortunately

@stefano-garzarella
Copy link
Member

@stefano-garzarella the CI seems to capture test output by default, so it'd only show up if something failed unfortunately

I see, BTW 3 pipeline rebuilds so far without issues ;-) I'll arrive at least at 10 and report here!

@epilys epilys force-pushed the sound-pipewire-retry-flaky-tests branch from 480536b to d091bd3 Compare December 2, 2024 14:09
@epilys
Copy link
Member Author

epilys commented Dec 2, 2024

Mentioning that this is related to #647, but since we cannot know for sure if this PR fixes it we cannot close it yet at least if this is merged.

@epilys
Copy link
Member Author

epilys commented Dec 2, 2024

Ok so the problem is not solved by retrying, since this job https://buildkite.com/rust-vmm/vhost-device-ci/builds/2868#019387cd-8337-4fc4-a24f-ab6d701d4bb4/46-726 managed to hang:

test audio_backends::pipewire::tests::test_pipewire_backend_invalid_stream has been running for over 60 seconds

Some other idea to debug this (posting them here as notes for the future):

PipewireTestHarness::new spawns a pipewire process, but just assumes it launches correctly. Never does a try_wait on the spawned child. Maybe we should do that and see what the failure is, possible put the retry/backoff approach there.

@stefano-garzarella
Copy link
Member

Ok so the problem is not solved by retrying, since this job buildkite.com/rust-vmm/vhost-device-ci/builds/2868#019387cd-8337-4fc4-a24f-ab6d701d4bb4/46-726 managed to hang:

test audio_backends::pipewire::tests::test_pipewire_backend_invalid_stream has been running for over 60 seconds

Some other idea to debug this (posting them here as notes for the future):

PipewireTestHarness::new spawns a pipewire process, but just assumes it launches correctly. Never does a try_wait on the spawned child. Maybe we should do that and see what the failure is, possible put the retry/backoff approach there.

yeah, this is a good point! I agree on doing that.

@epilys epilys force-pushed the sound-pipewire-retry-flaky-tests branch 6 times, most recently from 6138fdc to 2dc4a76 Compare December 3, 2024 10:50
@epilys
Copy link
Member Author

epilys commented Dec 3, 2024

Ok that's interesting, my latest change to panic if dbus/pipewire process don't exist triggered a new test failure:

     Running unittests src/lib.rs (target/debug/deps/vhost_device_sound-e293b9b9920f5bbe)
running 34 tests
error: test failed, to rerun pass `-p vhost-device-sound --lib`
Caused by:
  process didn't exit successfully: `/workdir/target/debug/deps/vhost_device_sound-e293b9b9920f5bbe` (signal: 11, SIGSEGV: invalid memory reference)

I'm beginning to suspect tests running in threads is at play here; setting environment variables is not thread safe. Perhaps this could be the source of errors... Maybe all tests using alsa/pipewire stuff should be run with the sealed_test crate.

@epilys epilys force-pushed the sound-pipewire-retry-flaky-tests branch 10 times, most recently from 2f43e03 to dd41a0c Compare December 3, 2024 11:56
@epilys epilys force-pushed the sound-pipewire-retry-flaky-tests branch 4 times, most recently from ecaa4f1 to d8e7224 Compare December 3, 2024 14:12
@stefano-garzarella
Copy link
Member

Ok that's interesting, my latest change to panic if dbus/pipewire process don't exist triggered a new test failure:

     Running unittests src/lib.rs (target/debug/deps/vhost_device_sound-e293b9b9920f5bbe)
running 34 tests
error: test failed, to rerun pass `-p vhost-device-sound --lib`
Caused by:
  process didn't exit successfully: `/workdir/target/debug/deps/vhost_device_sound-e293b9b9920f5bbe` (signal: 11, SIGSEGV: invalid memory reference)

I'm beginning to suspect tests running in threads is at play here; setting environment variables is not thread safe.

Goooood point!

Perhaps this could be the source of errors... Maybe all tests using alsa/pipewire stuff should be run with the sealed_test crate.

Cool, this can help. Can we also have other concurrency issues?
I had an issue with vhost-vdpa devices, where the only solution was to run tests sequentially to avoid issues (in that case a /dev/vhost-vdpa-0 device can be opened only by 1 thread).

@epilys epilys force-pushed the sound-pipewire-retry-flaky-tests branch from d8e7224 to ce07529 Compare December 4, 2024 09:54
@epilys epilys marked this pull request as ready for review December 4, 2024 09:54
@epilys epilys changed the title sound/pipewire: add truncated exp backoff to tests sound/pipewire: add truncated exp backoff to tests and fork them Dec 4, 2024
@epilys
Copy link
Member Author

epilys commented Dec 4, 2024

Updated to run all process environment-modifying tests under forks using the rusty-fork crate like it's done in the vhost-device-gpu crate tests in staging.

@epilys
Copy link
Member Author

epilys commented Dec 4, 2024

@stefano-garzarella

Can we also have other concurrency issues?
I had an issue with vhost-vdpa devices, where the only solution was to run tests sequentially to avoid issues (in that case a /dev/vhost-vdpa-0 device can be opened only by 1 thread).

Not sure, we're not dealing with any /dev/ device files (I think). For alsa at least, the test harness uses null devices.

@epilys epilys force-pushed the sound-pipewire-retry-flaky-tests branch 2 times, most recently from dbad659 to 3dec5a3 Compare December 6, 2024 12:25
Change Error::UnexpectedAudioBackendError variant to take a boxed `dyn
Error` value instead of a String. In the following commits we will use
the error variant for errors from the pipewire backend as well.

Signed-off-by: Manos Pitsidianakis <[email protected]>
Add a pipewire-specific error enum and make the backend's new() method
return a Result instead of panicking with expect() on error.

Signed-off-by: Manos Pitsidianakis <[email protected]>
@epilys epilys force-pushed the sound-pipewire-retry-flaky-tests branch from 3dec5a3 to 941f882 Compare December 6, 2024 12:25
Copy link
Member

@stefano-garzarella stefano-garzarella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I just saw some really minor things, feel free to ignore them.

Sometimes pw backend creation on CI fails with errors like for example:

    thread 'audio_backends::pipewire::tests::test_pipewire_backend_invalid_stream' panicked at vhost-device-sound/src/audio_backends/pipewire.rs:98:42:
    Failed to connect to core: CreationFailed

Add a simple kind of truncated exponential backoff retry wrapper to
PwBackend::new().

Signed-off-by: Manos Pitsidianakis <[email protected]>
If dbus-daemon or pipewire daemon exit with an error, panic.

This might help us debug CI failures down the road, hopefully.

This commit also changes the `println!`s to `eprintln!`s since the
messages purpose is better suited to stderr.

Signed-off-by: Manos Pitsidianakis <[email protected]>
Alsa and Pipewire tests modify their process environment (as part of
setting up their test harnesses), which is incorrect because tests run
on a multi-thread test runner by default. Use the rusty_fork crate to
run those tests in a forked process.

Signed-off-by: Manos Pitsidianakis <[email protected]>
@stefano-garzarella stefano-garzarella merged commit 14a062c into rust-vmm:main Dec 9, 2024
2 checks passed
@epilys epilys deleted the sound-pipewire-retry-flaky-tests branch December 9, 2024 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants