Import GRUB static migration code #790

Merged
merged 1 commit into from
Feb 3, 2025

travier
Member

@travier travier commented Dec 3, 2024

migrate-static-grub-config: Add GRUB static migration subcommand

Add a hidden subcommand that migrates existing systems using a dynamic
GRUB config to a static one.

This command is expected to be run after a successful bootloader update.
One way to do that is to add it as a drop-in unit config for the
bootloader-update.service unit included in this repo:

$ cat /usr/lib/systemd/system/bootloader-update.service.d/migrate-static-grub-config.conf
[Service]
ExecStart=/usr/bin/bootupctl migrate-static-grub-config

This will be used on Atomic Desktops & IoT systems to migrate systems to
a static GRUB config before enabling composefs as GRUB currently does not
interact well with it [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2308594

See: https://gitlab.com/fedora/ostree/sig/-/issues/35
See: https://pagure.io/workstation-ostree-config/pull-request/591
See: https://fedoraproject.org/wiki/Changes/ComposefsAtomicDesktops
Fixes: #789

openshift-ci bot commented Dec 3, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Member

@cgwalters cgwalters left a comment

Thanks for this!

src/bootupd.rs Outdated
@@ -489,6 +496,88 @@ pub(crate) fn client_run_validate() -> Result<()> {
Ok(())
}

pub(crate) fn client_run_migrate() -> Result<()> {
// Used to condition execution of this unit at the systemd level
let stamp_file="/var/lib/.fedora_atomic_desktops_static_grub";
Member

Missing a cargo fmt.

Also... now that this is part of bootupd calling the stamp file name "fedora_atomic_desktops_" seems odd.

Bikeshedding things a bit more...bootupd already has its own little database where we could store this state.

I'm also fine just keeping it as a stamp file, but how about e.g. .bootupd-static-migration-complete? Also since this is about data in /boot I think we should probably keep the stamp file there?

Member Author

@travier travier Dec 4, 2024

If we want to have this represented in the bootupd state directly, then we would have to add a new mode to distinguish between the bootupd-managed static GRUB configs and the ones that we imported from an existing system. The imported ones are still user-managed because they may contain arbitrary changes, os-prober entries, etc., and we don't want bootupd to override those on static config updates (once we implement that).

Member

Hmm...I more meant that we add a new key to the JSON file

Member Author

So in the end, I have not done that yet as:

  • I would have to figure out how to do it and test it
  • Having a stamp file makes it easy to skip running this later during boot using systemd

Member Author

I have not done that yet but I removed the stamp file and the logic now relies only on the ostree bootloader repo config option.

@travier
Member Author

travier commented Dec 4, 2024

I immediately ran into fedora-selinux/selinux-policy#2444 while testing this.

@travier travier force-pushed the main-static-migration branch 6 times, most recently from 0436363 to ed15ca7 Compare December 5, 2024 17:07
@HuijingHei
Member

Overall LGTM, I can help to do some testing if you have some instructions.

@travier
Member Author

travier commented Dec 12, 2024

How I'm (manually) testing this change:

  • Install Fedora Silverblue 40 in two VMs, one using BIOS & another UEFI (see https://gitlab.com/fedora/ostree/scripts for scripts that automate that), do the same for Fedora Silverblue 41
  • Update the Fedora 40 VMs to Fedora 41
  • Optional: Make a snapshot of the VM
  • Build bootupd and copy it to the VM:
$ cargo build --release
$ scp target/release/bootupd silverblue-41-uefi-bootupd:
[silverblue]$ sudo rpm-ostree usroverlay
[silverblue]$ sudo cp bootupd /usr/bin/bootupctl && sudo cp bootupd /usr/libexec/bootupd
  • Run the migration:
$ sudo bootupctl migrate
  • Validate the state of the system:

    • Things to verify, on a system installed from Fedora 41:
root@fedora:~# sudo ls -alh /boot/grub2
total 16K
drwx------. 2 root root 4.0K Oct 25 13:29 .
drwxr-xr-x. 7 root root 4.0K Nov 29 14:37 ..
-rw-r--r--. 1 root root   53 Oct 25 13:29 bootuuid.cfg
-rw-r--r--. 1 root root 2.6K Oct 25 13:29 grub.cfg

root@fedora:~# grep ostree /boot/grub2/grub.cfg
<empty>

root@fedora:~# grep blscfg /boot/grub2/grub.cfg
blscfg

root@fedora:~# sudo cat /sysroot/ostree/repo/config
[core]
repo_version=1
mode=bare

[sysroot]
readonly=true
bootloader=none
  • Things to verify, on a system installed from an older release:
root@fedora:~# ls -alh /boot/grub2
total 64K
drwx------. 5 root root 4.0K Dec  2 20:30 .
drwxr-xr-x. 6 root root 4.0K Dec  2 20:13 ..
-rw-r--r--. 1 root root   64 Oct 18 17:50 device.map
drwx------. 2 root root 4.0K Jan  1  1970 fonts
-rw-r--r--. 1 root root    0 Dec  2 20:30 .grub2-blscfg-supported
-rw-------. 1 root root 7.0K Dec  2 20:30 grub.cfg
-rw-------. 1 root root 8.8K Nov 29 10:15 grub.cfg.backup
-rw-------. 1 root root 1.0K Dec  2 15:22 grubenv
drwxr-xr-x. 2 root root  20K Dec  2 20:13 i386-pc
drwxr-xr-x. 2 root root 4.0K Dec  2 20:13 locale

root@fedora:~# ls -alh /boot/grub2/.grub2-blscfg-supported 
-rw-r--r--. 1 root root 0 Dec  2 20:30 /boot/grub2/.grub2-blscfg-supported

root@fedora:~# grep ostree /boot/grub2/grub.cfg
  set kernelopts="root=UUID=f59453a5-036d-4b9a-b1a2-4428bf5eaecf ro rootflags=subvol=root/ostree/deploy/fedora/deploy/a7eefc0751cf4688e6daf0d998d24a435ae4d44ab193d8dce7e01f66d366fa08.0 rhgb quiet "
### BEGIN /etc/grub.d/15_ostree ###
### END /etc/grub.d/15_ostree ###

root@fedora:~# grep blscfg /boot/grub2/grub.cfg
# The blscfg command parses the BootLoaderSpec files stored in /boot/loader/entries and
insmod blscfg
blscfg

root@fedora:~# cat /sysroot/ostree/repo/config
[core]
repo_version=1
mode=bare

[sysroot]
readonly=true
bootloader=none
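
The last check above (the bootloader key in the [sysroot] section of the ostree repo config) could also be scripted. Below is a minimal Rust sketch, assuming a simplified line-based reading of the file; the function name sysroot_bootloader is hypothetical, and this is not how bootupd or ostree actually parse the config.

```rust
/// Return the value of the `bootloader` key in the `[sysroot]` section,
/// or `None` if the section or the key is absent.
/// Simplified illustration: no comment or escape handling.
fn sysroot_bootloader(config: &str) -> Option<String> {
    let mut in_sysroot = false;
    for raw in config.lines() {
        let line = raw.trim();
        // A section header switches the current section.
        if line.starts_with('[') && line.ends_with(']') {
            in_sysroot = line == "[sysroot]";
            continue;
        }
        if !in_sysroot {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "bootloader" {
                return Some(value.trim().to_string());
            }
        }
    }
    None
}
```

Fed the contents of /sysroot/ostree/repo/config shown above, the function would be expected to return the value none; a key with the same name in another section is ignored.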

@travier travier force-pushed the main-static-migration branch 2 times, most recently from ec3c25f to e17023d Compare December 12, 2024 18:26
@travier travier marked this pull request as ready for review December 12, 2024 18:34
@travier
Member Author

travier commented Dec 12, 2024

So this works as far as I've tested but we have no tests for it yet.

@HuijingHei
Member

Install Fedora Silverblue 40 in two VMs, one using BIOS & another UEFI (see https://gitlab.com/fedora/ostree/scripts for scripts that automate that)
Update the Fedora 40 VMs to Fedora 41

I tested on both a BIOS VM and a UEFI VM: upgraded to F41, copied the new bootupd to the VM, ran $ sudo bootupctl migrate, and checked that the results match the expected output above. I also removed ostree-grub2 (it seems to be removed in https://pagure.io/workstation-ostree-config/pull-request/591) and rebooted successfully.
Let me know if I missed something, thanks!

@travier travier force-pushed the main-static-migration branch from e17023d to 4a65f3b Compare December 18, 2024 11:42
@travier
Member Author

travier commented Dec 18, 2024

I've added the systemd unit to this PR and rebased it on top of #803 to avoid conflicts for the systemd unit setup in the Makefile & specfile.

I've split the clippy lint fixes in #804.

@travier travier changed the title WIP: Import GRUB static migration code Import GRUB static migration code Dec 18, 2024
@travier travier force-pushed the main-static-migration branch from 4a65f3b to 06132d0 Compare December 18, 2024 15:38
@HuijingHei
Member

A silly question: is there any reason we should do the migration only when /boot/grub2/grub.cfg is a symlink?

On Silverblue 40 (using EFI), it is a symlink to /boot/loader/grub.cfg, but on Silverblue 41 it is a regular file.

@travier
Member Author

travier commented Dec 19, 2024

A silly question: is there any reason we should do the migration only when /boot/grub2/grub.cfg is a symlink?

As far as I know, this is the marker that lets us tell a system with a dynamic config from one with a static one.

On Silverblue 40 (using EFI), it is a symlink to /boot/loader/grub.cfg, but on Silverblue 41 it is a regular file.

On Silverblue 40, GRUB is set up using a dynamic config, like package-mode Fedora, and on Silverblue 41 it is set up by bootupd with a static config.
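
To illustrate the symlink check discussed here, a minimal Rust sketch follows. It assumes only what the comment states (a symlink at /boot/grub2/grub.cfg marks a dynamic config, a regular file marks a static one); the function name is_dynamic_grub_config is hypothetical and the actual code in this PR may differ.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns Ok(true) when `grub_cfg` is a symlink (assumed here to mark a
/// dynamic config) and Ok(false) otherwise (for example a regular file,
/// assumed to mark a static config).
fn is_dynamic_grub_config(grub_cfg: &Path) -> io::Result<bool> {
    // symlink_metadata() inspects the link itself instead of following it,
    // unlike metadata().
    let meta = fs::symlink_metadata(grub_cfg)?;
    Ok(meta.file_type().is_symlink())
}
```

A caller would pass Path::new("/boot/grub2/grub.cfg") and skip the migration when the result is false.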

src/bootupd.rs Outdated
.context("Failed to exchange symlink with current GRUB config")?;

// Remove the now unused symlink (optional cleanup, ignore any failures)
_ = dirfd.remove_file("grub.cfg.current");

Should we also delete the target of the symlink (/boot/loader/grub.cfg)?
It'll be deleted on the next ostree deploy, but it's cleaner IMO.

Or we could just mv $(readlink -ne /boot/grub2/grub.cfg) /boot/grub2/grub.cfg

I would also add a sync call after the mv

Member Author

Should we also delete the target of the symlink (/boot/loader/grub.cfg)?
It'll be deleted on the next ostree deploy, but it's cleaner IMO.

We could, but it's harmless and as you said it will be "removed" (not generated) for the next deployment. Keeping it makes things easier for debugging for now.

Or we could just mv $(readlink -ne /boot/grub2/grub.cfg) /boot/grub2/grub.cfg

AFAIK that would not be atomic.

I would also add a sync call after the mv

We should indeed do that. I'll add this.

AFAIK that would not be atomic.

both files are under a single fs /boot so I don't see why it would not be atomic

Member Author

I thought I had read somewhere that only renames of files in the same directory would be atomic, but it looks like I misread or misunderstood something, as I cannot find that in: https://manpages.debian.org/testing/manpages-dev/renameat2.2.en.html

In the ERRORS section:

EXDEV
oldpath and newpath are not on the same mounted filesystem. (Linux permits a filesystem to be mounted at multiple points, but rename() does not work across different mount points, even if the same filesystem is mounted on both.)
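
To make the rename and sync points in this thread concrete, here is a minimal Rust sketch of replacing a path (such as the grub.cfg symlink) with a regular file using a plain rename within one directory. It is an illustration only: it uses std::fs::rename rather than the exchange-based approach in this PR, and the function name replace_atomically and the ".new" temporary suffix are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `dest` (which may be a symlink) with a regular file holding
/// `contents`, using a temporary file in the same directory plus rename().
fn replace_atomically(dest: &Path, contents: &[u8]) -> io::Result<()> {
    let dir = match dest.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    // Hypothetical temporary name, created next to the destination so that
    // both paths are on the same mounted filesystem (otherwise: EXDEV).
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".new");
    let tmp = dir.join(tmp_name);

    let mut f = File::create(&tmp)?;
    f.write_all(contents)?;
    // Flush the file data to disk before it becomes visible under `dest`.
    f.sync_all()?;

    // rename() replaces `dest` itself; it does not follow a symlink there.
    fs::rename(&tmp, dest)?;

    // Sync the directory so the rename itself is persisted.
    File::open(dir)?.sync_all()?;
    Ok(())
}
```

The old symlink target is left untouched, which matches the choice above to keep /boot/loader/grub.cfg around for debugging.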

@travier
Member Author

travier commented Jan 10, 2025

Also grub2-mkconfig will actually still use grub2-probe (os-prober) and fail with composefs enabled, so we can't update + migrate + enable composefs in a single update (my use case is offline appliances, possibly skipping updates).

Indeed, this is why I'm looking at pushing that to Fedora 41 as well, before the F42 release. I'll have to investigate that more.

Thinking about this more, this means that we cannot take the approach as written in this PR, as the GRUB configuration generation would fail on the first boot on F42 with composefs enabled.

We have multiple options:

  • Do a plain copy/paste of the config instead and strip the ostree boot entries "manually" by removing all the content between the #### ...ostree.. GRUB marker in the config. This would let us keep composefs enabled directly on the first boot in F42 but would mean that any migration failure will likely block updates.
  • Use the ostree.prepare-root.composefs=0 kernel argument instead and a migration process similar to the one we did with the sysroot RO one to only enable composefs once the migration has been safely completed. On the first boot in F42, we would do the migration and update the kernel argument only when ready. This would mean that we can not remove the ostree-grub2 package in F42 and we would have to also wait for F43 to statically enable composefs.

@travier travier force-pushed the main-static-migration branch from 06132d0 to 460bf23 Compare January 23, 2025 17:41
@travier
Member Author

travier commented Jan 23, 2025

I reworked this PR to do both the static GRUB config migration and enable composefs via the karg.

Instead of calling grub2-mkconfig, I now manually strip all the content between the ### BEGIN /etc/grub.d/15_ostree ### and ### END /etc/grub.d/15_ostree ### lines. This means that this should be safe to run on a system where composefs is already enabled.
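
For readers who want a picture of the stripping step, a minimal Rust sketch follows. It assumes the marker lines are kept and only the lines between them are dropped; the function name strip_ostree_entries is hypothetical and this is not necessarily the code in this PR.

```rust
/// Markers that grub2-mkconfig writes around the output of /etc/grub.d/15_ostree.
const BEGIN: &str = "### BEGIN /etc/grub.d/15_ostree ###";
const END: &str = "### END /etc/grub.d/15_ostree ###";

/// Return `config` with every line strictly between the BEGIN and END
/// markers dropped. The marker lines themselves are kept.
fn strip_ostree_entries(config: &str) -> String {
    let mut out = String::with_capacity(config.len());
    let mut skipping = false;
    for line in config.lines() {
        // Stop skipping before emitting, so the END marker is preserved.
        if line.trim_end() == END {
            skipping = false;
        }
        if !skipping {
            out.push_str(line);
            out.push('\n');
        }
        // Start skipping after emitting, so the BEGIN marker is preserved.
        if line.trim_end() == BEGIN {
            skipping = true;
        }
    }
    out
}
```

Because this is a pure text transformation, it does not invoke grub2-probe or os-prober.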

I manually tested this change using the same instructions as #790 (comment).

The only difference is the new name of the hidden argument:

silverblue@fedora:~$ sudo bootupctl migrate-static-grub-config-composefs
Running as unit: bootupd.service
ostree repo 'sysroot.bootloader' config option not set yet
Marking bootloader as BLS capable...
Migrating to a static GRUB config...
Creating a backup of the current GRUB config '/boot/grub2/../loader/grub.cfg' in '/boot/grub2/grub.cfg.backup'...
Stripping ostree generated entries from GRUB config...
GRUB config symlink successfully replaced with the current config
Setting up 'sysroot.bootloader' to 'none' in ostree repo config...
Static GRUB config migration completed successfully
Editing kernel command line arguments to enable composefs...
Composefs migration completed successfully

This would be enabled on IoT & Atomic Desktops by adding a systemd drop-in config with:

[Service]
ExecStart=bootupctl migrate-static-grub-config-composefs

to the bootloader-update.service from bootupd, which should order the migration after the bootloader update.

When running on a Fedora 41 born system:

silverblue@fedora:~$ sudo bootupctl migrate-static-grub-config-composefs
Running as unit: bootupd.service
Already using a static GRUB config
Editing kernel command line arguments to enable composefs...
Composefs migration completed successfully

@travier
Member Author

travier commented Jan 23, 2025

Hum, unfortunately I think I've found another case where that will not work:

  • F41 installed system
    • never updated
    • thus older ostree, not generating the composefs blob by default yet
  • Rebase to F42
    • composefs is not enabled in the image thus the composefs blob is not generated
    • bootupd enables composefs via karg
  • System is rebooted
    • system fails to boot as the composefs blob does not exist for the current deployment

So this means that we need to enable day 1 composefs in the image for F42 to make sure that ostree from F41 generates the composefs blob and hope that the migration works :/

Member

@cgwalters cgwalters left a comment

Marking requested-changes per discussion

@travier
Member Author

travier commented Jan 24, 2025

This will break people that want to use 'signed' or 'verity', no?

At a quick glance at the code, I think it will disable signatures for the signed case, but leave the verity case unchanged.

If we want to avoid that then this means that we can not rely on the kernel argument for the transition.

If I understand correctly ostreedev/ostree#3353, it means that before this patch, the composefs blob was only generated if the destination system had it explicitly enabled.

This means that a freshly installed Fedora 41, not updated, which includes such an older ostree version, will only generate the composefs metadata if it's enabled in the Fedora 42 image.

I don't know how it would react to a config in the F42 image set to maybe (error or ignore? will need to test).

If older ostree ignores maybe and does nothing, then we need to make sure that the system can not update again after the rebase to F42 until the static GRUB config migration has been completed successfully. Otherwise the next deployment will have the composefs blob generated and then next boot composefs enabled and then updates will fail if we are not using a static GRUB config yet. That would mean blocking all rpm-ostree operations on the migration succeeding.

If older ostree errors out on maybe, then we are forced to enable composefs (yes) in F42, and similarly we will have to block any rpm-ostree operation on the static migration having successfully completed, as the system will be booted with composefs directly.

Overall, the safest path seems to be setting composefs to enabled in F42 and directly blocking rpm-ostree operations on the migration being successful, via an ExecStartPre=bootupctl migrate from a drop-in config file. This seems to be the path least likely to fail in a weird way.

@travier travier mentioned this pull request Jan 27, 2025
4 tasks
@travier travier force-pushed the main-static-migration branch from 460bf23 to 081cf03 Compare January 27, 2025 17:54
Add a hidden subcommand that migrates existing systems using a dynamic
GRUB config to a static one.

This command is expected to be run after a successful bootloader update.
One way to do that is to add it as a drop-in unit config for the
`bootloader-update.service` unit included in this repo:

```
$ cat /usr/lib/systemd/system/bootloader-update.service.d/migrate-static-grub-config.conf
[Service]
ExecStart=/usr/bin/bootupctl migrate-static-grub-config
```

This will be used on Atomic Desktops & IoT systems to migrate systems to
a static GRUB config before enabling composefs as GRUB currently does not
interact well with it [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2308594

See: https://gitlab.com/fedora/ostree/sig/-/issues/35
See: https://pagure.io/workstation-ostree-config/pull-request/591
See: https://fedoraproject.org/wiki/Changes/ComposefsAtomicDesktops
Fixes: coreos#789
@travier travier force-pushed the main-static-migration branch from 081cf03 to c4eaadb Compare January 27, 2025 18:00
@travier
Member Author

travier commented Jan 27, 2025

Alright, thanks for the reviews, I've updated that again to remove the composefs part and keep this only about the static GRUB config migration following my investigations detailed above.

@champtar champtar left a comment

One last comment but not a blocker

@travier travier requested a review from cgwalters February 3, 2025 15:11
@travier
Member Author

travier commented Feb 3, 2025

FYI, the new command is:

$ bootupctl migrate-static-grub-config

as it's only doing the static GRUB config change and no longer anything regarding composefs.

@travier
Member Author

travier commented Feb 3, 2025

I've tested all the scenarios I could think of with that code and things appear to be working as expected. I would appreciate it if we could get this in before the F42 beta freeze.

@cgwalters cgwalters merged commit 0ee8287 into coreos:main Feb 3, 2025
12 checks passed
@travier travier deleted the main-static-migration branch February 4, 2025 10:35
@travier travier mentioned this pull request Feb 5, 2025
travier added a commit to travier/fedora-atomic-desktops-devel that referenced this pull request Feb 5, 2025
travier added a commit to travier/fedora-atomic-desktops-devel that referenced this pull request Feb 6, 2025
pierrepinon pushed a commit to pierrepinon/workstation-ostree-config that referenced this pull request Feb 14, 2025
pierrepinon pushed a commit to pierrepinon/workstation-ostree-config that referenced this pull request Feb 14, 2025
Development

Successfully merging this pull request may close these issues.

Add GRUB static config migration as sub command