Describe the bug
The network stack can return send errors for many reasons, but one particularly annoying case is send errors on CARP advertisement packets. If the machine demotes itself because of send errors, the source code suggests that it cannot recover unless an administrator manually lowers the demotion:
The following logic:
https://github.com/opnsense/src/blob/stable/25.1/sys/netinet/ip_carp.c#L905-L916
requires a demoted machine (possibly demoted to backup) to keep sending advertisements so that it can notice when things are OK again. However, advertisements are stopped in backup mode: https://github.com/opnsense/src/blob/stable/25.1/sys/netinet/ip_carp.c#L1375. Interestingly, it takes 3 send errors for this condition to trigger, meaning 3 packets were unable to leave the interface.
Now this may or may not be intended behavior, but it's not documented.
This is especially relevant while a machine is booting: when userspace configures e.g. LAGGs or other subsystems that can influence outbound packet handling, CARP may react to the resulting send errors, leaving a configured master machine in a backup state.
I have also seen cases where traffic flows normally through an interface, but the CARP advertisement packets specifically encounter errors for unknown reasons, causing the machine to fail over based on wrong information.
A "permanent" workaround is to ignore send errors altogether by setting net.inet.carp.senderr_demotion_factor to 0. However, I'm not sure what the impact of this will be. A machine in backup state runs a "master timeout" timer, giving a master up to 4+ seconds (to account for the error count of 3) to recover (https://github.com/opnsense/src/blob/stable/25.1/sys/netinet/ip_carp.c#L1376). If the send errors are genuine, failover then has to wait for that timer instead of being triggered by the demotion, which slows it down significantly.
To Reproduce
N/A - not easy to reproduce.
Expected behavior
Either CARP should ignore send errors during the boot/userspace configuration stage, or it should automatically recover from send-error demotions. Currently, neither happens.
Describe alternatives you considered
N/A
Additional context
I want to point out that this doesn't seem to happen very often and may depend heavily on the platform and configuration used.