bugfix:incomplete arp cause busy loop (high cpu util) #7421
Incomplete ARP entry (nil MAC address) can cause a busy loop
@mgleung Can you have a look at this? Many thanks :)
cc @StevenTigera
Co-authored-by: Shaun Crampton <[email protected]>
@fasaxc thanks for reviewing :)
/sem-approve
@detailyang Looks like the linter failed, please can you run
@fasaxc Can you restart the CI again? It failed; see https://tigera.semaphoreci.com/jobs/20627b73-30d4-4312-85da-0010d847a343/plain_logs.txt
Generating manifest from charts/values/./canal-etcd.yaml
walk.go:74: found symbolic link in path: /home/semaphore/calico/charts/calico/crds resolves to /home/semaphore/calico/libcalico-go/config/crd. Contents of linked file included and used
make[2]: Leaving directory '/home/semaphore/calico'
make[1]: Leaving directory '/home/semaphore/calico'
make check-dirty
make[1]: Entering directory '/home/semaphore/calico'
The following files are dirty
felix/routetable/route_table.go | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
make[1]: *** [lib.Makefile:1197: check-dirty] Error 1
/sem-approve
@detailyang did you commit/push the result of
Fixed.
/sem-approve
felix/routetable/route_table.go
@@ -1035,6 +1035,10 @@ func (r *RouteTable) syncL2RoutesForLink(ifaceName string) error {
 	var updatesFailed bool

 	for _, existing := range existingNeigh {
+		if existing.HardwareAddr == nil {
+			log.WithField("entry", entry).Debug("Ignoring existing ARP entry with no hardware addr")
CI is now failing because entry is undefined.
Signed-off-by: detailyang <[email protected]>
…atch-1 Signed-off-by: detailyang <[email protected]>
@fasaxc Would you backport this to earlier versions?
/sem-approve
Description
Hello.
We have identified an issue where incomplete ARP entries can cause a busy loop in calico-node in production.
We are unsure why every node is generating incomplete ARP entries, as shown in the attached image.
These entries cause calico-node to run at high CPU utilization and get stuck in syncL2RoutesForLink. Specifically, an incomplete ARP entry has a nil MAC address, which causes the kernel function rtnl_fdb_del to fail when attempting to remove the FDB entry, so the removal is retried over and over:
https://elixir.bootlin.com/linux/v3.10.108/source/net/core/rtnetlink.c#L2223
As a workaround, we can ignore incomplete ARP entries for now and continue with normal operation :)
fixes #5460
Related issues/PRs
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
docs-pr-required: This change requires a change to the documentation that has not been completed yet.
docs-completed: This change has all necessary documentation completed.
docs-not-required: This change has no user-facing impact and requires no docs.
Every PR needs one release-note-* label.
release-note-required: This PR has user-facing changes. Most PRs should have this label.
release-note-not-required: This PR has no user-facing changes.
Other optional labels:
cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.