Add support for --replace-mode=alongside for ostree target #137

Open: wants to merge 1 commit into base: main

Conversation

cgwalters
Collaborator

Ironically our support for --replace-mode=alongside breaks when we're targeting an already extant ostree host, because when we first blow away the /boot directory, this means the ostree stack loses its knowledge that we're in a booted deployment, and will attempt to GC it...

ostreedev/ostree-rs-ext@8fa019b is a key part of the fix for that.

However, a notable improvement we can do here is to grow this whole thing into a real "factory reset" mode, and this will be a compelling answer to
coreos/fedora-coreos-tracker#399

To implement this though we need to support configuring the stateroot and not just hardcode default.

@openshift-ci

openshift-ci bot commented Oct 2, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@omertuc
Contributor

omertuc commented Sep 18, 2024

Sorry @cgwalters , accidentally pushed the rebase to your fork instead of mine

EDIT: undid it, continuing my rebase efforts on https://github.com/omertuc/bootc/tree/137clone

EDIT2: Continuing here

@cgwalters
Collaborator Author

I think you can just take over this PR too if you want, or open a new PR from your fork - either way.

@omertuc
Contributor

omertuc commented Sep 19, 2024

Rebased. Without any changes, I'm facing an issue where, on an ostree system, the host's / is an overlay (mounted into the container via -v /:/target), and so the findmnt source for it is overlay rather than a /dev/... and so it trips up lsblk later on.

I'll see how I can tweak it so that it finds the right device

@omertuc
Contributor

omertuc commented Oct 2, 2024

With the mounts -v /sysroot:/target -v /sysroot:/target/sysroot instead of -v /:/target, plus --stateroot foo, this seems to work.

@omertuc
Contributor

omertuc commented Oct 2, 2024

@cgwalters thoughts on the above mounts? Do we want to require them for install on ostree targets, or should I figure out a way to make this work without them, using just the already-documented install mounts (i.e. /:/target)?

@cgwalters
Collaborator Author

and so the findmnt source for it is overlay rather than a /dev/... and so it trips up lsblk later on

We should learn how to peel that. This is really the same thing as https://bugzilla.redhat.com/show_bug.cgi?id=2308594 and ostreedev/ostree#3198 and containers/composefs#280

Short term the simplest is the same logic as the grub patch - detect overlayfs for / and check if /sysroot exists and is mounted, if so use that.
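For illustration, the short-term logic described above could be sketched roughly as follows. This is not the code from the PR: `resolve_install_target` and its parameters are hypothetical names, and the caller is assumed to have already obtained the filesystem type (for example from findmnt) and checked whether `<target>/sysroot` is a mountpoint.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not bootc's actual code): choose the directory to install into.
///
/// `root_fstype` is the filesystem type reported for the target root,
/// `sysroot_is_mountpoint` says whether `<target>/sysroot` exists and is mounted.
fn resolve_install_target(
    target: &Path,
    root_fstype: &str,
    sysroot_is_mountpoint: bool,
) -> PathBuf {
    if root_fstype == "overlay" && sysroot_is_mountpoint {
        // The overlay hides the backing block device; the physical root
        // mounted at /sysroot is what lsblk can actually resolve.
        target.join("sysroot")
    } else {
        // Not an overlay (or no usable /sysroot): keep the target as given.
        target.to_path_buf()
    }
}
```

With `-v /:/target` on an ostree host this would pick `/target/sysroot`, while a plain xfs or btrfs root would be left untouched.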

@omertuc
Contributor

omertuc commented Oct 8, 2024

and so the findmnt source for it is overlay rather than a /dev/... and so it trips up lsblk later on

We should learn how to peel that. This is really the same thing as bugzilla.redhat.com/show_bug.cgi?id=2308594 and ostreedev/ostree#3198 and containers/composefs#280

Short term the simplest is the same logic as the grub patch - detect overlayfs for / and check if /sysroot exists and is mounted, if so use that.

OK. Changed it so that when the target rootfs is an overlay, we'll implicitly try targeting <original_target>/sysroot instead.

It wasn't working at first and was a bit of a headache for me to debug because apparently if you mount /:/target then inside the container /target/sysroot is read-only by default, and so ensure_dir_labeled was failing, as opposed to when you mount /sysroot:/target directly, in which case it's not read-only. It took me a while to track that down while chasing red herrings, and I'm still not sure what's responsible for this behavior (kernel? podman?). Once I realized it, I simply moved ensure_dir_labeled to run only after the let _ = crate::utils::open_dir_remount_rw... call you added, and then the rest just worked.

Current code might need a bit of touch-ups, but do you think the direction of the code in its current state is good? Should I clean it up and undraft?

@cgwalters
Collaborator Author

It wasn't working at first and was a bit of a headache for me to debug because apparently if you mount /:/target then inside the container /target/sysroot is read-only by default, and so ensure_dir_labeled was failing, as opposed to when you mount /sysroot:/target directly, in which case it's not read-only.

I think that's possibly because it's bootc that's special casing mounting /sysroot read-write - that's how we do it outside of a container at least.

Collaborator Author

@cgwalters cgwalters left a comment

Thanks for picking this up!!

lib/src/install.rs (resolved)
lib/src/install.rs (outdated, resolved)
lib/src/lsm.rs (outdated, resolved)
lib/src/utils.rs (outdated, resolved)
@omertuc omertuc force-pushed the install-existing-ostree branch 4 times, most recently from 3561a64 to ed94f1e, October 14, 2024 17:35
@omertuc omertuc force-pushed the install-existing-ostree branch 3 times, most recently from d392280 to eed91ff, October 21, 2024 15:05
@cgwalters
Collaborator Author

We talked about this and realized that while the new test passes, it would pass already today because it's actually the "install --stateroot" that makes it work.

The main fix we need here is preserving existing deployments when we detect we're booted via ostree.

So actually a way we could test this is via our tmt tests instead.

But in the end again I'm good to merge as is. Since I wrote this PR I can't approve it, you (or someone else) need to do so.

@omertuc
Contributor

omertuc commented Oct 25, 2024

So actually a way we could test this is via our tmt tests instead.

But in the end again I'm good to merge as is.

Branching this to a separate issue #847

@omertuc
Contributor

omertuc commented Oct 25, 2024

Up until now I've tested the PR on a random RHEL AI (bootc) system I had lying around, and it worked, but now I'm trying it on Silverblue and it fails. This seems to be due to the fact that / (/target inside the bootc container) on silverblue, despite being a btrfs rw mount, cannot be written to, so our rw remount is futile

Particularly the action that fails is fchmod but it's only because it's the first write operation we perform on the target

@cgwalters
Collaborator Author

silverblue, despite being a btrfs rw mount, cannot be written to

What error do we get? -EROFS? What does findmnt look like when our container is executed? I wonder if we're messing up a remount?

@omertuc
Contributor

omertuc commented Oct 25, 2024

silverblue, despite being a btrfs rw mount, cannot be written to

What error do we get? -EROFS? What does findmnt look like when our container is executed? I wonder if we're messing up a remount?

Reusing extant ostree layout
DEBUG Loaded SELinux policy: b06a119416deb696a80dbb67248b66d3829780c9e72d376a5089097ace6de5f8
DEBUG Target . is a mountpoint, remounting rw
DEBUG Labeling .
TRACE Label for . is Some("system_u:object_r:root_t:s0")
ERROR Installing to filesystem: Creating ostree deployment: fchmod: Operation not permitted (os error 1)

findmnt inside the container:

bash-5.1# findmnt -J -v --output=SOURCE,TARGET,MAJ:MIN,FSTYPE,OPTIONS,UUID --mountpoint /target
{
   "filesystems": [
      {
         "source": "/dev/vda3",
         "target": "/target",
         "maj:min": "0:32",
         "fstype": "btrfs",
         "options": "rw,relatime,seclabel,compress=zstd:1,discard=async,space_cache=v2,subvolid=258,subvol=/root",
         "uuid": "41f3f4ae-1242-4ef1-bcd8-fd8706f51738"
      }
   ]
}

@cgwalters
Collaborator Author

ERROR Installing to filesystem: Creating ostree deployment: fchmod: Operation not permitted (os error 1)

Is there maybe a selinux denial here? Failing on fchmod is really odd.

Also hmm we must be missing some .context() here...are we failing on ensure_dir_labeled()? I bet that's it...but it'd be good to know for sure

@cgwalters
Collaborator Author

@omertuc I think we need to get more of your PRs merged, you're doing good work! This one specifically I guess I'm uncertain if we should block on the silverblue-reinstall case; I suspect it's btrfs specific.

But I may look at it today...

@omertuc
Contributor

omertuc commented Nov 5, 2024

@omertuc I think we need to get more of your PRs merged, you're doing good work! This one specifically I guess I'm uncertain if we should block on the silverblue-reinstall case; I suspect it's btrfs specific.

But I may look at it today...

Yeah I was downloading Fedora CoreOS (as opposed to Silverblue) today to see if we hit similar issues, I'll report with what I find

cgwalters added a commit to cgwalters/bootc that referenced this pull request Nov 5, 2024
@cgwalters
Collaborator Author

OK yeah there's been a ton of changes in the main branch since this PR was last rebased, it's a conflict-fest. I took an attempt at it and pushed my update to https://github.com/cgwalters/bootc/tree/install-existing-ostree2

@cgwalters
Collaborator Author

One big conflict was around how the install phase got split up, and threading through has_ostree between those parts would have been annoying.

I did #872 instead which is just dead code to start, but moves the "we detected an extant repo" into the "big bag of state" we already had in RootSetup.

@omertuc
Contributor

omertuc commented Nov 6, 2024

stop stop stop 😆 you're working on your old stale branch, I've already solved all these conflicts a while back

@omertuc
Contributor

omertuc commented Nov 6, 2024

Force-pushed now to resolve a tiny new conflict on import lines; other than that there are no conflicts vs main

@omertuc
Contributor

omertuc commented Nov 6, 2024

Force-pushed again because of a duplicate import

@omertuc
Contributor

omertuc commented Nov 6, 2024

I suspect it's btrfs specific.

Yeah I was downloading Fedora CoreOS (as opposed to Silverblue) today to see if we hit similar issues, I'll report with what I find

I just tried Fedora CoreOS and I can confirm the same issue happens even with xfs, so it's not a btrfs issue

@cgwalters
Collaborator Author

stop stop stop 😆 you're working on your old stale branch, I've already solved all these conflicts a while back

Oops! Sorry

@omertuc
Contributor

omertuc commented Nov 12, 2024

Got around to taking a serious look at this read-only behavior on FCOS/Silverblue. It seems to simply be due to an immutable attribute on the ostree deployment, which gets preserved when the ostree deployment directory is simply mounted directly on /.

On bootc systems I assume this attribute gets lost due to usage of composefs/overlay instead of a direct mount

@cgwalters
Collaborator Author

Oh duh of course.

On bootc systems I assume this attribute gets lost due to usage of composefs/overlay instead of a direct mount

Kind of, it's not lost so much as shadowed. But basically with composefs we don't need the immutable bit anymore, it was always a hack.

Although wait, there's two immutable bits potentially - one on the deployment root, and one on the physical /.

Ahh yes, see https://github.com/coreos/coreos-assembler/blob/58df48e0c237a638df5b57a475d3b80b1029baa5/src/osbuild-manifests/coreos.osbuild.x86_64.mpp.yaml#L560

Basically let's change our code to run chattr -i before the chmod. We could try to preserve it but I don't think it's important.
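As a rough sketch of that suggestion (not the change that actually landed in the PR), the immutable bit could be cleared by shelling out to chattr before any chmod is attempted. `clear_immutable_cmd` is a hypothetical helper name; the function only builds the command so the caller decides when to run it and how to report failures.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: build (but do not run) `chattr -i <path>`,
/// which clears the immutable attribute on `path`.
fn clear_immutable_cmd(path: &Path) -> Command {
    let mut cmd = Command::new("chattr");
    cmd.arg("-i").arg(path);
    cmd
}
```

A caller would run the returned command (for example with `.status()`) on the target root first, and only then proceed with the chmod and labeling steps.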

@omertuc
Contributor

omertuc commented Nov 12, 2024

Basically let's change our code to run chattr -i before the chmod. We could try to preserve it but I don't think it's important.

Yep, force pushed with rebase and this change. However we're now hitting another problem:

TRACE exec: "ostree" "config" "--repo" "ostree/repo" "set" "sysroot.bootloader" "none"
ERROR Installing to filesystem: Creating ostree deployment: Subprocess failed: ExitStatus(unix_wait_status(256))

Looking into what's behind this one

Although wait, there's two immutable bits potentially - one on the deployment root, and one on the physical /.

Is that right? They seem to be tied from what I can see:

$ findmnt 
/                                            /dev/vda4[/ostree/deploy/fedora-coreos/deploy/6df70065620571076f242857b9080d747891e2279dff3ed1756270f6889731ce.0]     xfs        rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota

Removing the bit from one removes it from the other

@cgwalters
Collaborator Author

They seem to be tied from what I can see:

This is confusing for sure! In ostree there's the term "physical root" vs "booted root". When we're in the booted system, then / is the deployment/booted root, and /sysroot is the physical root.

But when mounting the filesystem from outside, / is the physical root, and /ostree/deploy/<stateroot>/deploy/<checksum>.<id> is the deployment root.

I am pretty sure what we're hitting here at first is the physical root, not the deployment root.
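To make the two roots concrete, here is a small sketch that derives a deployment root from a physical root, following the path layout visible in the findmnt output earlier in the thread. `deployment_root` and its parameter names are hypothetical, chosen only for this illustration.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: compute the deployment root that lives beneath a
/// physical root, i.e. `<physical>/ostree/deploy/<stateroot>/deploy/<checksum>.<serial>`.
fn deployment_root(physical_root: &Path, stateroot: &str, checksum: &str, serial: u32) -> PathBuf {
    physical_root
        .join("ostree/deploy")
        .join(stateroot)
        .join("deploy")
        .join(format!("{}.{}", checksum, serial))
}
```

Inside a booted system the physical root is /sysroot, so the same helper would be called with `/sysroot`; from the install container with `-v /:/target` it would be `/target/sysroot`.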

ERROR Installing to filesystem: Creating ostree deployment: Subprocess failed: ExitStatus(unix_wait_status(256))

Well that's busted, we should be getting stderr from that...is there nothing there?
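One way to surface stderr in such an error, sketched with only the standard library, is to capture the child's output and fold it into the error message. This is an illustration, not bootc's actual subprocess wrapper; `run_capturing_stderr` is a hypothetical name and the error is a plain `String` for brevity.

```rust
use std::process::Command;

/// Hypothetical helper: run a command and, if it fails, include the captured
/// stderr in the returned error instead of only the exit status.
fn run_capturing_stderr(cmd: &mut Command) -> Result<(), String> {
    // `output()` waits for the child and collects stdout and stderr.
    let output = cmd
        .output()
        .map_err(|e| format!("failed to spawn {:?}: {}", cmd.get_program(), e))?;
    if output.status.success() {
        return Ok(());
    }
    let stderr = String::from_utf8_lossy(&output.stderr);
    Err(format!("subprocess failed: {}: {}", output.status, stderr.trim()))
}
```

Applied to the failing `ostree config` invocation, this would have put the underlying message next to the exit status rather than requiring strace to find it.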

@omertuc
Contributor

omertuc commented Nov 12, 2024

strace -y

[pid  4103] openat(3</target/sysroot/ostree/repo>, ".", O_WRONLY|O_CLOEXEC|O_TMPFILE, 0600) = -1 EROFS (Read-only file system)

Looks like inside the container, /target and /target/sysroot are separate mounts that have to be rw remounted separately... Another thing that didn't happen in existing bootc systems

Looking into this

@omertuc
Contributor

omertuc commented Nov 12, 2024

When remounting both, installation seems to proceed smoothly, so this seems to be the final hurdle
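A minimal sketch of "remounting both", assuming the two mountpoints are the target itself and its sysroot subdirectory as described above. It shells out to `mount -o remount,rw`, which is an assumption made for illustration and not necessarily how bootc's open_dir_remount_rw does it; both function names are hypothetical.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical helper: the mountpoints that each need their own rw remount
/// when the host root is bind-mounted at `target`.
fn mounts_to_remount(target: &Path) -> Vec<PathBuf> {
    vec![target.to_path_buf(), target.join("sysroot")]
}

/// Hypothetical helper: build (but do not run) `mount -o remount,rw <mountpoint>`.
fn remount_rw_cmd(mountpoint: &Path) -> Command {
    let mut cmd = Command::new("mount");
    cmd.arg("-o").arg("remount,rw").arg(mountpoint);
    cmd
}
```

A caller would iterate over `mounts_to_remount(target)`, run `remount_rw_cmd` for each entry, and only then start writing to the ostree repo.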

Ironically our support for `--replace-mode=alongside` breaks
when we're targeting an already extant ostree host, because when
we first blow away the `/boot` directory, this means the ostree
stack loses its knowledge that we're in a booted deployment,
and will attempt to GC it...

ostreedev/ostree-rs-ext@8fa019b
is a key part of the fix for that.

However, a notable improvement we can do here is to grow this
whole thing into a real "factory reset" mode, and this will
be a compelling answer to
coreos/fedora-coreos-tracker#399

To implement this though we need to support configuring the
stateroot and not just hardcode `default`.

Signed-off-by: Omer Tuchfeld <[email protected]>
Labels
area/install Issues related to `bootc install`