blockdev: implement signal-safe loopback device cleanup helper #1449

gursewak1997 · 2025-07-24T20:21:41Z

Adds a signal-safe loopback device cleanup mechanism for bootc install --via-loopback to prevent device leaks on interruption (e.g., SIGINT). It introduces a loopback-cleanup-helper subcommand and a robust helper binary path resolution strategy to ensure reliable spawning across environments. The update improves error handling, logging, and includes comprehensive tests.
Fixes: #799

Add fork+exec based cleanup helper to prevent loopback device leaks when bootc install --via-loopback is interrupted by signals like SIGINT. - Add loopback-cleanup-helper CLI subcommand - Implement run_loopback_cleanup_helper() with PR_SET_PDEATHSIG - Update LoopbackDevice to spawn cleanup helper process - Add tests for spawn mechanism

gemini-code-assist

Code Review

This pull request introduces a signal-safe cleanup mechanism for loopback devices, which is a solid improvement for robustness. It uses a helper process that listens for parent death and cleans up resources. The implementation is mostly sound, with a robust strategy for finding the helper binary. I've made a couple of suggestions to improve logging in the helper process for better debuggability and to clarify a comment about signal handling.

crates/blockdev/src/blockdev.rs

cgwalters

One thing I like to in situations like this to aid testing is to add an "internals" wrapper that directly calls this code. See an example of this in b696395

So we could add e.g. bootc internals allocate-cleanup-loopback or something which would make it easy to exercise the code without doing a full install run.

cgwalters · 2025-07-24T20:42:09Z

crates/blockdev/src/blockdev.rs

            ])
-            .arg(path)
-            .run_get_string()?;


I think the AI tooling doesn't know about our helper run_get_string() which handles the .success() and error printing.

IOW this code was probably fine as is, can we just keep it that way please?

cgwalters · 2025-07-24T20:45:26Z

crates/blockdev/src/blockdev.rs

+    /// Find the bootc binary using multiple strategies
+    fn find_bootc_binary() -> Result<PathBuf> {
+        // Strategy 1: Try /proc/self/exe (works in most cases)
+        if let Ok(exe_path) = std::fs::read_link("/proc/self/exe") {


OK so this is trying to handle the original failure...but it's really unclear to me why it was failing and specifically how that test case run is different than the existing loopback tests we have.

(original issue in #1439 )

As far as I know /proc/self/exe should really always work; we're using it in reexec_with_guardenv.

I think what would be good here is that if read_link fails, let's show what value it points to.

Hmm random idea as to the root cause - what if this is happening just when we need to re-exec the current process twice? And inside the container env?

I'm starting to suspect it has to do with that...

Def could be. Adding some changes that handles the case where /proc/self/exe fails (which could happen during re-execution in container environments)

Ah yes it's this almost for sure

bootc/crates/lib/src/lsm.rs

Line 91 in f4b01ab

// OK now, we always copy our binary to a tempfile, set its security context

In some cases we're hitting that fallback, but not others.

So...this is getting a bit complicated. Your fallback code here will definitely work in general....but I have some ideas to do something more sophisticated.

The more complex fix I think is to make an new internal "out of process drop helper" crate that we initialize before we do any re-exec dances. I.e. we fork/re-exec that helper and talk to it over IPC or so. It would be more involved though.

cgwalters · 2025-07-24T20:47:17Z

crates/blockdev/src/blockdev.rs

    // Shared backend for our `close` and `drop` implementations.
    fn impl_close(&mut self) -> Result<()> {
        // SAFETY: This is the only place we take the option
        let Some(dev) = self.dev.take() else {
            tracing::trace!("loopback device already deallocated");
            return Ok(());
        };
+
+        // Kill the cleanup helper since we're cleaning up normally


Hmm. We could also send it SIGTERM and let it do the cleanup. Feels like it's perhaps better that way as then we're always exercising the cleanup via child process.

cgwalters · 2025-07-24T20:51:02Z

crates/blockdev/src/blockdev.rs

@@ -194,6 +194,10 @@ struct LoopbackCleanupHandle {
 impl LoopbackDevice {
    // Create a new loopback block device targeting the provided file path.
    pub fn new(path: &Path) -> Result<Self> {
+        let path = Utf8PathBuf::from_path_buf(path.to_path_buf())


If we wanted to impose this requirement it'd be cleaner to make the API just take a Utf8Path instead. But I don't think we need the requirement so I think we should drop this change.

cgwalters · 2025-07-24T20:51:25Z

crates/blockdev/src/blockdev.rs

            .args([
                "--show",
                format!("--direct-io={direct_io}").as_str(),
                "-P",
                "--find",
+                path.as_str(),


The utf8 thing probably came from this, but I don't think the change is necessary.

cgwalters · 2025-07-24T20:54:31Z

crates/blockdev/src/blockdev.rs

-        Ok(_) => {
-            tracing::error!("Failed to clean up loopback device {}", device_path);
+        Ok(output) => {
+            let stderr = String::from_utf8_lossy(&output.stderr);


This child process has its stdout/stderr going to /dev/null by default so all of this is just lost unfortunately.

- Add robust binary path resolution with multiple fallback strategies - Use /proc/self/exe, argv[0], common paths, and PATH search - Add graceful fallback when cleanup helper can't be spawned - Improve error handling and logging - Add comprehensive tests for binary finding logic This fixes the 'Failed to spawn loopback cleanup helper' error that was causing issues in packaged distributions where the binary path was not easily discoverable.

cgwalters

OK let's run with this, we can do the drop helper crate as a followup!

One thing that you could do though is (until #1359 lands) is build and push an image from this PR and then run the e2e test from https://gitlab.com/fedora/bootc/tests/bootc-workflow-test on it

gemini-code-assist bot reviewed Jul 24, 2025

View reviewed changes

crates/blockdev/src/blockdev.rs Outdated Show resolved Hide resolved

crates/blockdev/src/blockdev.rs Outdated Show resolved Hide resolved

gursewak1997 force-pushed the fix-loopback-cleanup-helper branch from 4571471 to 7074ca6 Compare July 24, 2025 20:36

gursewak1997 mentioned this pull request Jul 24, 2025

drop loopback out-of-process #799

Closed

cgwalters reviewed Jul 24, 2025

View reviewed changes

gursewak1997 force-pushed the fix-loopback-cleanup-helper branch from 7074ca6 to ff004c9 Compare July 25, 2025 17:37

gursewak1997 marked this pull request as ready for review July 28, 2025 19:06

cgwalters approved these changes Jul 28, 2025

View reviewed changes

cgwalters merged commit b803ea7 into bootc-dev:main Jul 28, 2025
27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

blockdev: implement signal-safe loopback device cleanup helper #1449

blockdev: implement signal-safe loopback device cleanup helper #1449

Uh oh!

gursewak1997 commented Jul 24, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

cgwalters left a comment

Uh oh!

cgwalters Jul 24, 2025

Uh oh!

cgwalters Jul 24, 2025

Uh oh!

cgwalters Jul 24, 2025

Uh oh!

gursewak1997 Jul 25, 2025

Uh oh!

cgwalters Jul 25, 2025

Uh oh!

cgwalters Jul 25, 2025

Uh oh!

cgwalters Jul 24, 2025

Uh oh!

cgwalters Jul 24, 2025

Uh oh!

cgwalters Jul 24, 2025

Uh oh!

cgwalters Jul 24, 2025

Uh oh!

cgwalters left a comment

Uh oh!

Uh oh!

Uh oh!

blockdev: implement signal-safe loopback device cleanup helper #1449

blockdev: implement signal-safe loopback device cleanup helper #1449

Uh oh!

Conversation

gursewak1997 commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

cgwalters left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cgwalters left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gursewak1997 commented Jul 24, 2025 •

edited

Loading