Unload USB XHCI driver at shutdown on hardware #1335

Merged

@sysvinit commented Mar 11, 2025

Since upgrading to 24.11, we've observed that some of our storage servers with certain motherboard revisions don't deconfigure USB devices properly at shutdown. As a result, the LUKS key stick gets lost and doesn't reappear on the bus after a reboot, so these servers can't automatically decrypt their data disks when they are rebooted for maintenance.

This change introduces a workaround that unloads the XHCI driver from the kernel very late in the shutdown process, just before userland ends, which appears to avoid this problem.

PL-133421

@flyingcircusio/release-managers

Release process

  • Created changelog entry using ./changelog.sh

PR release workflow (internal)

  • PR has internal ticket
  • internal issue ID (PL-…) part of branch name
  • internal issue ID mentioned in PR description text
  • ticket is on Platform agile board
  • ticket state set to Pull request ready
  • if ticket is more urgent than within the next few days, directly contact a member of the Platform team

Design notes

  • Provide a feature toggle if the change might need to be adjusted/reverted quickly depending on context. Consider whether the default should be on or off. Example: rate limiting.
    • This change sets the upstream NixOS systemd.shutdown option to configure the script which removes the kernel module at shutdown time. This can be disabled by overriding the NixOS option in the local host configuration.
  • All customer-facing features and (NixOS) options need to be discoverable from documentation. Add or update relevant documentation such that hosted and guided customers can understand it as well.
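
The `systemd.shutdown` mechanism mentioned in the design notes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the entry name `unload-xhci` and the kernel module name `xhci_pci` are hypothetical, since the description only refers to "the XHCI driver".

```nix
{ pkgs, ... }:

{
  # systemd.shutdown maps a name to an executable. NixOS links each entry
  # into /etc/systemd/system-shutdown/, and systemd-shutdown runs the
  # executables there very late in the shutdown sequence.
  #
  # "unload-xhci" is a hypothetical entry name chosen for this sketch.
  systemd.shutdown."unload-xhci" = pkgs.writeShellScript "unload-xhci" ''
    # Assumption: the driver to remove is the xhci_pci module. Adjust the
    # name to whatever module actually drives the affected controller.
    # A failure to unload is ignored so that shutdown is never blocked.
    ${pkgs.kmod}/bin/modprobe -r xhci_pci || true
  '';
}
```

To switch such a workaround off on a single host, one option would be to override the option in the local configuration, for example with `systemd.shutdown = lib.mkForce { };`. Note that this clears every entry of the option, not only the XHCI one, so it is only appropriate if no other shutdown hooks are needed on that host.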

Security implications

  • Security requirements defined? (WHERE)
    • This only runs at shutdown time, and has no impact on the running system.
  • Security requirements tested? (EVIDENCE)
    • Verified experimentally on several production Ceph hosts that had been losing their key sticks.

This works around a problem where certain hardware and kernel
combinations don't deconfigure USB devices properly at shutdown, which
can result in those devices not reappearing after a reboot.

PL-133421
@sysvinit requested a review from ctheune on Mar 11, 2025 16:40
@sysvinit added the labels "risk: 1 very low risk" and "urgency: 4 high urgency" on Mar 11, 2025

This PR is ready to merge. Merge scheduled for 2025-03-11

@platform-pr-manager (bot) merged commit ad37fde into fc-24.11-dev on Mar 11, 2025
5 of 6 checks passed
@platform-pr-manager (bot) deleted the PL-133421-shutdown-unload-xhci-driver branch on Mar 11, 2025 20:26