nixos/systemd-boot: Add mirroredBoots #246897

Closed
wants to merge 2 commits into from

Conversation

Gerg-L
Contributor

@Gerg-L Gerg-L commented Aug 3, 2023

Description of changes

Closes #152155

Adds boot.loader.systemd-boot.mirroredBoots.

In the Python script, the only change is renaming efiSysMountPoint to mountPoint.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@Gerg-L Gerg-L requested a review from dasJ as a code owner August 3, 2023 03:19
@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Aug 3, 2023
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Aug 3, 2023
@0x4A6F
Member

0x4A6F commented Aug 3, 2023

Only tested with systemd-boot and booted from both disks:

  # Use the systemd-boot EFI boot loader.
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  # https://github.com/NixOS/nixpkgs/pull/246897
  boot.loader.systemd-boot.mirroredBoots = [
    "/boot0"
    "/boot4"
  ];

Any plans to incorporate this into nixosTests?

And what is the reason to rename efiSysMountPoint?

@RaitoBezarius
Member

I can really accept this only with a nixosTest, the code looks pretty much good though.

@Gerg-L
Contributor Author

Gerg-L commented Aug 3, 2023

And what is the reason to rename efiSysMountPoint?

I renamed efiSysMountPoint because the variable no longer necessarily corresponds to boot.loader.efi.efiSysMountPoint.

@Gerg-L
Contributor Author

Gerg-L commented Aug 3, 2023

I can really accept this only with a nixosTest, the code looks pretty much good though.

I'll try to get to it tonight.

@Gerg-L
Contributor Author

Gerg-L commented Aug 5, 2023

I can really accept this only with a nixosTest, the code looks pretty much good though.

Yeah, I haven't been able to figure out how to write a nixosTest that sets up a custom partition scheme before the bootloader is installed.

@Gerg-L
Contributor Author

Gerg-L commented Aug 6, 2023

@RaitoBezarius, can you take a look at the second commit I just pushed? I feel like I'm close to getting it working, but I can't figure out exactly what I'm doing wrong.

@nikstur
Contributor

nikstur commented Aug 6, 2023

This reminds me of #226692 with a slightly different scope.

However I'm not sure I love the idea of mirrored boot. It seems like there is a potential for side effects by calling bootctl install multiple times as it would manipulate efivars each time.

Also see this comment about the dangers of doing so: #152155 (comment)

@Gerg-L
Contributor Author

Gerg-L commented Aug 6, 2023

Also see this comment about the dangers of doing so: #152155 (comment)

That's talking about putting your ESP on mdadm RAID 1.
This PR simply installs the boot loader to multiple partitions.

However I'm not sure I love the idea of mirrored boot. It seems like there is a potential for side effects by calling bootctl install multiple times as it would manipulate efivars each time.

I mean there's also boot.loader.grub.mirroredBoots

@nikstur
Contributor

nikstur commented Aug 6, 2023

The linked thread is still insightful:

But just doing mirroring from linux side is not sufficient. The firmware/boot loader can write to the ESP too, and since they don't do RAID things will fall apart badly. And yes, sd-boot writes to the ESP if boot counting/boot assessment is enabled.

On top of that, the ESP is shared between OSes, so as soon as you do multi boot things fall apart even worse.

systemd/systemd#12468 (comment)

mirroredBoots would suffer from the same issue even if it's not RAID.

This also means that just by virtue of implementing this bespoke feature we will make our lives harder keeping up with systemd updates since they have already signalled that mirroring the ESP is not something they even consider.

Couldn't you implement this more cleanly by writing a systemd service that checks for file changes on the ESP and then copies all files to the configured locations? This would also take care of multi boot, boot counting etc.
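
A minimal sketch of what such a service could look like as a NixOS module. Everything specific here is an assumption for illustration and not part of this PR: the unit name esp-mirror, a main ESP mounted at /boot, and a single backup ESP mounted at /boot-fallback. Note that a path unit's PathChanged= watches only the named path itself, not the tree below it, so a real implementation would probably need several watched paths or a different trigger.

```nix
{ pkgs, ... }:
{
  # Hypothetical path unit: activates esp-mirror.service (same name by
  # default) when the watched directory changes. Not recursive.
  systemd.paths.esp-mirror = {
    wantedBy = [ "multi-user.target" ];
    pathConfig.PathChanged = "/boot/loader/entries";
  };

  # Hypothetical oneshot service: copy the main ESP to the backup ESP.
  systemd.services.esp-mirror = {
    serviceConfig.Type = "oneshot";
    script = ''
      # FAT has no Unix ownership and coarse timestamps, so copy only
      # contents and mtimes, and tolerate small timestamp differences.
      ${pkgs.rsync}/bin/rsync --recursive --times --modify-window=1 \
        --delete /boot/ /boot-fallback/
    '';
  };
}
```

This copies whatever is currently on the main ESP, including state written by the boot loader or another OS, which is the behaviour debated in the rest of this thread.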

I mean there's also boot.loader.grub.mirroredBoots

I believe grub.mirroredBoots came from the MBR era, where this somewhat made sense, and was then just adapted to UEFI. Anyway, we shouldn't let what we have done with GRUB lead our design. This will not lead to a good place.

@Gerg-L
Contributor Author

Gerg-L commented Aug 6, 2023

I completely agree with not following GRUB.

A systemd service should work, but that kind of goes against the use case of mirroring the ESP: Booting with complete redundancy.
With a service you would have a "master" ESP and "slave" ESP(s).

@bjornfor
Contributor

bjornfor commented Aug 6, 2023

systemd/systemd#12468 (comment)

mirroredBoots would suffer from the same issue even if it's not RAID.

Why? AFAIU, mirroredBoots suffers from none of these issues.

But I would double check how bootctl install works wrt. the EFI variables, like mentioned before in this thread.

Couldn't you implement this more cleanly by writing a systemd service that checks for file changes on the ESP and then copies all files to the configured locations? This would also take care of multi boot, boot counting etc.

Why would that be any better? That seems just like a more brittle way to implement mirroredBoots. Why do you think boot counting breaks with mirroredBoots?

@nikstur
Contributor

nikstur commented Aug 6, 2023

Why do you think boot counting breaks with mirroredBoots?

The issue is that if the bootloader or another OS (when you dual boot) modifies files on the ESP, these modifications are not accounted for when you mirror the ESP via the mechanism from this PR. If you just rsync the entire ESP to a different location, you also copy these modifications.

I'd also wager that rsyncing your ESP is actually less brittle because it just copies things instead of calling a stateful installation procedure repeatedly.

Booting with complete redundancy

As far as I can tell, your implementation doesn't guarantee that all ESPs are written in a single transaction (i.e. when writing to one fails, others might still be written to). So then that's not better than having a service that (within a few seconds of writing to the main ESP) copies the main ESP to the backups. I'd argue both ways have the same (weak) guarantees on redundancy.

@Gerg-L
Contributor Author

Gerg-L commented Aug 6, 2023

The issue is that if the bootloader or another OS (when you dual boot) modifies files on the ESP, these modifications are not accounted for when you mirror the ESP via the mechanism from this PR. If you just rsync the entire ESP to a different location, you also copy these modifications.

Not if the modifications are made to the slave...

@bjornfor
Contributor

bjornfor commented Aug 9, 2023

The issue is that if the bootloader or another OS (when you dual boot) modifies files on the ESP, these modifications are not accounted for when you mirror the ESP via the mechanism from this PR. If you just rsync the entire ESP to a different location, you also copy these modifications.

I'm thinking the exact opposite.

I think syncing the boot counting or other state across multiple ESPs somewhat defeats the purpose: if all boot attempts are spent on device A, you want firmware to try device B next, but if both devices now have the same state (thanks to background rsync), I think firmware will not attempt device B because it too seems to have spent all boot attempts.

With a systemd service / rsync I'd also expect difficulties with bidirectional syncing (already mentioned in this thread), and the time window in which the mirrored devices are out of sync would increase.

My mental model of the ESP is that it only changes when the NixOS boot configuration changes, and firmware can have a bit of state on the side (not affecting NixOS). That means (1) we only need to write to the ESP when NixOS changes (no background sync needed), and (2) we shouldn't mess with the firmware state (background sync harmful). I think this is what this PR does.

@xaverdh
Contributor

xaverdh commented Aug 9, 2023

The issue is that if the bootloader or another OS (when you dual boot) modifies files on the ESP, these modifications are not accounted for when you mirror the ESP via the mechanism from this PR. If you just rsync the entire ESP to a different location, you also copy these modifications.

Yes, and this is precisely what you do not want. With this PR, the information gets duplicated from a source that is known to be intact. When you rsync the existing boot partition, any error present there (e.g. due to corruption of the disk) will propagate to the second copy.

@Gerg-L
Contributor Author

Gerg-L commented Aug 9, 2023

We could always go with the nuclear option...

Wipe each ESP every time

Edit: this is a joke... Mostly

@0x4A6F
Member

0x4A6F commented Aug 9, 2023

I'm getting the following error on activation during a deploy-rs run:

⭐ ℹ️ [activate] [INFO] Activating profile
activating the configuration...
setting up /etc...
reloading user units for root...
setting up tmpfiles
sed: can't read /boot/loader/loader.conf: No such file or directory
⭐ ❌ [activate] [ERROR] The activation script resulted in a bad exit code: Some(2)
Connection to 192.168.1.42 closed.
🚀 ❌ [deploy] [ERROR] Activating over SSH resulted in a bad exit code: Some(1)
🚀 ℹ️ [deploy] [INFO] Revoking previous deploys
🚀 ❌ [deploy] [ERROR] Deployment failed, rolled back to previous generation

@ElvishJerricco
Contributor

We could always go with the nuclear option...

Wipe each ESP every time

Edit: this is a joke... Mostly

You joke but... #226168

@ElvishJerricco
Contributor

Another thing to note is the "random seed": https://systemd.io/RANDOM_SEEDS/

The systemd-boot EFI boot loader included in systemd is able to maintain and provide a random seed stored in the EFI System Partition (ESP) to the booted OS, which allows booting up with a fully initialized entropy pool from earliest boot on. During installation of the boot loader (or when invoking bootctl random-seed) a seed file with an initial seed is placed in a file /loader/random-seed in the ESP. In addition, an identically sized randomized EFI variable called the ‘system token’ is set, which is written to the machine’s firmware NVRAM. During boot, when systemd-boot finds both the random seed file and the system token they are combined and hashed with SHA256 (in counter mode, to generate sufficient data), to generate a new random seed file to store in the ESP as well as a random seed to pass to the OS kernel. The new random seed file for the ESP is then written to the ESP, ensuring this is completed before the OS is invoked.

(Emphasis mine)

This is actually security critical. sd-boot will only update the random seed on the ESP that it's booting. If we then boot a different ESP that hasn't had its random seed updated, then we're reusing an old seed on a new boot, which is very bad practice.

@Gerg-L
Contributor Author

Gerg-L commented Aug 10, 2023

Isn't that only a problem if you rsync, though?
If you're installing normally to two ESPs, that shouldn't be a problem?

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/zfs-systemd-boot/29956/11

Comment on lines +68 to +71
  installDirs =
    if cfg.mirroredBoots != []
    then cfg.mirroredBoots
    else [efi.efiSysMountPoint];
Contributor

My first thought was that mirroredBoots would be additional mountpoints, but seeing this, I guess the implementation installs to only these mountpoints (the "main" ESP mountpoint gets ignored)? Am I the only one who finds that surprising?

Contributor

Indeed, and AFAIU, in the grub.nix case mirroredBoots are additional mountpoints:

  boot.loader.grub.mirroredBoots = optionals (cfg.devices != [ ]) [
    { path = "/boot"; inherit (cfg) devices; inherit (efi) efiSysMountPoint; }
  ];
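
As a sketch of the "additional mountpoints" semantics discussed in this review thread: the expression below always includes the main ESP and treats mirroredBoots as extra targets. The attribute sets efi and cfg are stand-ins for the module's values, and the mount points are made up for illustration. It can be evaluated on its own with nix-instantiate --eval --strict, assuming <nixpkgs> is on the Nix path.

```nix
let
  lib = (import <nixpkgs> { }).lib;

  # Stand-ins for the module's `efi` and `cfg` values (illustrative only).
  efi = { efiSysMountPoint = "/boot"; };
  cfg = { mirroredBoots = [ "/boot" "/boot-fallback" ]; };
in
  # Main ESP first, then the mirrors; lib.unique drops the duplicate "/boot"
  # so the boot loader is not installed twice to the same mount point.
  lib.unique ([ efi.efiSysMountPoint ] ++ cfg.mirroredBoots)
  # => [ "/boot" "/boot-fallback" ]
```

With an empty mirroredBoots this reduces to the single main ESP, so the default behaviour would stay the same as before the PR.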

    else [efi.efiSysMountPoint];
in
  pkgs.writeShellScript "install-systemd-boot.sh"
    (lib.concatMapStrings (x: "${checkedSystemdBootBuilder x} \"$@\"\n") installDirs)
Contributor

I think that, as in grub.nix, the usual error detection (set -e) should be activated, and maybe also set -u.

Why keep passing "$@"? I don't see any argument being either used or passed (so far).

Nitpick: here each mountpoint generates a new derivation. The mountpoint could instead be passed as an environment variable: mountpoint=${escapeShellArg x} ${checkedSystemdBootBuilder}. Again, that's likely just a few derivations in practice, so just a nitpick.
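
The two suggestions above could be combined roughly as follows. This is only a sketch: it assumes checkedSystemdBootBuilder is changed from a function of the mount point into a single derivation whose script reads a mountpoint environment variable, which is not what the PR implements. The function arguments are written out so the fragment stands alone; "$@" is kept as in the PR.

```nix
{ pkgs, lib, checkedSystemdBootBuilder, installDirs }:

pkgs.writeShellScript "install-systemd-boot.sh" ''
  # Abort on the first failing install and on use of unset variables.
  set -eu

  ${lib.concatMapStrings (dir: ''
    # Hypothetical: the builder reads its target from $mountpoint.
    mountpoint=${lib.escapeShellArg dir} ${checkedSystemdBootBuilder} "$@"
  '') installDirs}
''
```

With set -e, a failure on one ESP stops the script before later ESPs are written, so the order of installDirs determines which copies are updated when something goes wrong.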

@@ -238,6 +245,17 @@ in {
      '';
    };

    mirroredBoots = lib.mkOption {
      type = lib.types.listOf lib.types.str;
Contributor

grub.nix's mirroredBoots' type is more sophisticated, maybe it would be more correct to use the same or a subset.
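
A sketch of what a submodule-based type could look like, in place of a plain list of strings. The field names path and efiBootloaderId are illustrative choices loosely modelled on the GRUB option; they are assumptions, not taken from this PR, and nothing here says how the installer would consume them.

```nix
{ lib, ... }:
{
  options.boot.loader.systemd-boot.mirroredBoots = lib.mkOption {
    default = [ ];
    example = [ { path = "/boot-fallback"; } ];
    description = "Hypothetical: ESP mount points to install systemd-boot to.";
    type = lib.types.listOf (lib.types.submodule {
      options = {
        # Where this ESP is mounted.
        path = lib.mkOption {
          type = lib.types.str;
          example = "/boot-fallback";
          description = "Mount point of the ESP to install to.";
        };
        # Optional per-ESP label for the firmware boot entry (illustrative).
        efiBootloaderId = lib.mkOption {
          type = lib.types.nullOr lib.types.str;
          default = null;
          description = "Boot entry name for this ESP, if any.";
        };
      };
    });
  };
}
```

A submodule leaves room to add per-ESP settings later without changing the option's type again.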

@Gerg-L
Copy link
Contributor Author

Gerg-L commented Dec 10, 2023

nikstur is very clearly against this, so I'm just going to close this and keep using GRUB for now.

@Gerg-L Gerg-L closed this Dec 10, 2023
@Gerg-L Gerg-L deleted the systemd-boot branch March 2, 2024 01:05
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/disko-partition-setup-on-uefi-system-which-has-fallback-boot-partition/51968/1

Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Development

Successfully merging this pull request may close these issues.

Feature Request: boot.loader.systemd-boot.mirroredBoots
9 participants