Add patch for FRR with regression fix for MAC mobility #1321

Merged
osnyx merged 1 commit into fc-24.11-dev from PL-133422-frr-regression-fix
Mar 4, 2025

@sysvinit sysvinit commented Mar 4, 2025

This change adds a patch to our pinned FRR version. The patch fixes a regression, introduced between FRR 10.0 and 10.1, that affects MAC mobility (and hence VM migration).

Depending on the ordering of events between hosts, a receiving KVM host may learn the MAC address of a migrating guest before the sending host stops announcing it. Due to the regression, bgpd would frequently decide that the remote route was "better" than the locally learned route and delete the latter. When the remote route was then withdrawn on completion of the migration, bgpd had no local route left and would not announce one for the locally learned MAC.

As a result, routes for guest MAC addresses would go missing, causing unnecessary flooding from the other hosts in the EVPN fabric and eventually site-wide network performance degradation.
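
As an illustration of the sequence described above, here is a minimal toy model in Python. It is not FRR's best-path code: the names `MacTable`, `keep_local` and `simulate_migration` and the example MAC address are hypothetical, and the "remote wins, local is deleted" rule is a simplification taken from this description only.

```python
from dataclasses import dataclass, field


@dataclass
class MacTable:
    """Toy view of one receiving host's routes, keyed by MAC address."""

    # True models the fixed behaviour, False the regression (assumption).
    keep_local: bool
    local: set = field(default_factory=set)   # MACs learned on this host
    remote: set = field(default_factory=set)  # MACs announced by a peer

    def receive_remote(self, mac: str) -> None:
        """The sending host is still announcing the guest's MAC."""
        self.remote.add(mac)

    def learn_local(self, mac: str) -> None:
        """The guest shows up locally while the remote route still exists."""
        self.local.add(mac)
        if mac in self.remote and not self.keep_local:
            # Regression: the remote route is judged "better" and the
            # locally learned route is deleted.
            self.local.discard(mac)

    def withdraw_remote(self, mac: str) -> None:
        """The sending host stops announcing once the migration completes."""
        self.remote.discard(mac)

    def announces(self, mac: str) -> bool:
        """A host can only announce a MAC it still has a local route for."""
        return mac in self.local


def simulate_migration(keep_local: bool) -> bool:
    """Replay the problematic ordering; return whether the MAC is announced."""
    mac = "52:54:00:12:34:56"  # hypothetical guest MAC
    table = MacTable(keep_local=keep_local)
    table.receive_remote(mac)   # sender has not withdrawn yet
    table.learn_local(mac)      # guest arrives on the receiving host
    table.withdraw_remote(mac)  # migration completes
    return table.announces(mac)
```

With `keep_local=False` the model ends with no announced route for the guest's MAC, which corresponds to the missing routes and flooding described here; with `keep_local=True` the locally learned route survives the withdrawal.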

This regression has been fixed upstream and the fix backported to the stable branch, but it has not yet appeared in a 10.1.x stable release.

PL-133422

@flyingcircusio/release-managers

Release process

  • Created changelog entry using ./changelog.sh

PR release workflow (internal)

  • PR has internal ticket
  • internal issue ID (PL-…) part of branch name
  • internal issue ID mentioned in PR description text
  • ticket is on Platform agile board
  • ticket state set to Pull request ready
  • if the ticket needs attention sooner than within the next few days, contact a member of the Platform team directly

Design notes

  • Provide a feature toggle if the change might need to be adjusted/reverted quickly depending on context. Consider whether the default should be on or off. Example: rate limiting.
  • All customer-facing features and (NixOS) options need to be discoverable from documentation. Add or update relevant documentation such that hosted and guided customers can understand it as well.

Security implications

  • Security requirements defined? (WHERE)
    • This is a recognised regression upstream, and while the fix has been merged it's not yet in a public release.
  • Security requirements tested? (EVIDENCE)
    • Passes the chaos monkey smoke test in dev (which the unpatched version does not pass).

@osnyx osnyx merged commit 2412d64 into fc-24.11-dev Mar 4, 2025
2 checks passed
@osnyx osnyx deleted the PL-133422-frr-regression-fix branch March 4, 2025 15:54