[CORE-10975] Add QoS packet rate limit #9947

Open
wants to merge 1 commit into
base: master

coutinhop
Member

Description

Add packet rate limiting QoS control implementation, using iptables (using '-m limit' and a new mark) and nftables (using 'limit rate over') rules to limit ingress and/or egress packet rate based on workload QoSControls (from pod annotations in k8s).
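As a rough illustration of the rule shapes this description refers to, the sketch below renders an iptables rule pair (mark packets within the limit, then drop anything unmarked) and a single nftables rule using 'limit rate over'. The function names, chain name, and mark value are hypothetical and chosen only for illustration; this is not Felix's actual rule renderer.

```go
package main

import "fmt"

// iptablesRateLimitRules returns a two-rule sketch: the first rule marks
// packets that are within the limit, the second drops packets without the mark.
// The chain name and mark value are illustrative assumptions.
func iptablesRateLimitRules(chain string, packetsPerSec int, mark uint32) []string {
	return []string{
		fmt.Sprintf("-A %s -m limit --limit %d/sec -j MARK --set-xmark %#x/%#x",
			chain, packetsPerSec, mark, mark),
		fmt.Sprintf("-A %s -m mark ! --mark %#x/%#x -j DROP", chain, mark, mark),
	}
}

// nftablesRateLimitRule returns the single-rule nftables form, which can drop
// over-limit packets directly and so needs no mark.
func nftablesRateLimitRule(packetsPerSec int) string {
	return fmt.Sprintf("limit rate over %d/second drop", packetsPerSec)
}

func main() {
	for _, r := range iptablesRateLimitRules("cali-tw-example0", 100, 0x20) {
		fmt.Println(r)
	}
	fmt.Println(nftablesRateLimitRule(100))
}
```

The iptables form needs two rules because '-m limit' only matches packets within the limit, which is why the new mark bit is introduced.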

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Add packet rate limiting QoS control to workloads.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

coutinhop added the release-note-required and docs-not-required labels on Mar 7, 2025
coutinhop requested review from fasaxc and nelljerram on Mar 7, 2025 15:36
coutinhop self-assigned this on Mar 7, 2025
coutinhop requested a review from a team as a code owner on Mar 7, 2025 15:36
marvin-tigera added this to the Calico v3.30.0 milestone on Mar 7, 2025
Member

nelljerram left a comment

Looks great. Just a few minor points to address.

In terms of dataplane support - something that I should have checked before! -

  • Is it correct that we're supporting iptables and nftables, but not eBPF?
  • Is that the case for the bandwidth control as well?

@@ -123,6 +123,8 @@ func StartDataplaneDriver(
markScratch0, _ = markBitsManager.NextSingleBitMark()
markScratch1, _ = markBitsManager.NextSingleBitMark()

markLimitPacketRate, _ = markBitsManager.NextSingleBitMark()
Member

This may be absolutely fine, but can you check with the relevant people - probably @fasaxc and @tomastigera - in case we need to be more careful with using up new mark bits? It might be possible to reuse an existing one, if the places where they are used, for different features, do not overlap with each other.
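To make the concern about using up mark bits concrete, here is a minimal sketch of a single-bit allocator over a fixed mask. The type, method name, and mask value are hypothetical stand-ins, not the real markBitsManager; the point is only that each new feature bit comes out of a small, finite pool.

```go
package main

import (
	"errors"
	"fmt"
)

// markAllocator hands out single bits from a fixed 32-bit mask.
// It is a hypothetical stand-in for illustration only.
type markAllocator struct {
	mask uint32 // bits this allocator is allowed to use
	used uint32 // bits already handed out
}

// nextSingleBitMark returns the lowest free bit in the mask, or an error
// once every allowed bit has been allocated.
func (a *markAllocator) nextSingleBitMark() (uint32, error) {
	for bit := uint32(1); bit != 0; bit <<= 1 {
		if a.mask&bit != 0 && a.used&bit == 0 {
			a.used |= bit
			return bit, nil
		}
	}
	return 0, errors.New("no mark bits left")
}

func main() {
	a := &markAllocator{mask: 0x00070000} // three usable bits (assumed value)
	for i := 0; i < 4; i++ {
		bit, err := a.nextSingleBitMark()
		fmt.Printf("%#x %v\n", bit, err)
	}
}
```

With three usable bits, the fourth allocation fails. Note that the diff above discards the second return value, so an exhausted pool would not be noticed at that call site.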

Member Author

Since we invariably use the mark in two consecutive iptables rules, I suppose we could reuse an existing 'scratch' mark instead of allocating a new one. I think we'd then need a third rule to clear the mark, because the scratch mark could be used for other purposes as well. We only use the mark to work around the fact that the iptables -m limit module cannot itself drop packets above the limit, so we could clear it in a rule immediately after the one that drops the unmarked packets. @nelljerram do you think that's a better approach?
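The three-rule idea (mark, drop if unmarked, clear) can be modelled on a bare mark value. The sketch below is a simulation, not rule-rendering code; the packet type and process function are hypothetical, and it assumes the scratch bit is clear when the packet reaches the first rule.

```go
package main

import "fmt"

// packet models only the 32-bit firewall mark carried by a packet.
type packet struct{ mark uint32 }

// process applies the three hypothetical rules in order and reports whether
// the packet is dropped. It assumes the scratch bit is clear on entry.
func process(p *packet, withinLimit bool, scratch uint32) (dropped bool) {
	// Rule 1: packets within the rate limit get the scratch bit set.
	if withinLimit {
		p.mark |= scratch
	}
	// Rule 2: packets without the scratch bit are over the limit and dropped.
	if p.mark&scratch == 0 {
		return true
	}
	// Rule 3: clear the scratch bit so later users of it see a clean mark.
	p.mark &^= scratch
	return false
}

func main() {
	const scratch = 0x20

	ok := &packet{mark: 0x1}
	fmt.Println(process(ok, true, scratch), ok.mark)

	over := &packet{mark: 0x1}
	fmt.Println(process(over, false, scratch), over.mark)
}
```

In both cases the other mark bits are left untouched, which is the property that would make sharing a scratch bit safe under the stated assumption.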

}

It("should limit packet rate correctly", func() {
format.MaxLength = 1000000
Member

I think this is going to affect all following tests. Personally, I'm a fan of putting format.MaxLength = 0 somewhere, like in libcalico-go/lib/clientv3/tier_e2e_test.go:

func init() {
	// Stop Gomega from chopping off diffs in logs.
	format.MaxLength = 0
}

Member

(IOW, my only concern is that it looks like this might be a local-only setting, but in fact it won't be.)
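The point about a package-level setting leaking into later tests can be shown without Gomega at all. In the sketch below, maxLength is a plain variable standing in for a global such as format.MaxLength, and withMaxLength is a hypothetical helper that scopes an override and restores the previous value. It is an alternative to the init() approach suggested above, shown only to illustrate the scoping issue.

```go
package main

import "fmt"

// maxLength stands in for a package-level setting such as Gomega's
// format.MaxLength. The initial value here is illustrative.
var maxLength = 4000

// withMaxLength overrides maxLength while fn runs and restores the previous
// value afterwards, even if fn panics.
func withMaxLength(n int, fn func()) {
	old := maxLength
	maxLength = n
	defer func() { maxLength = old }()
	fn()
}

func main() {
	withMaxLength(0, func() {
		fmt.Println("inside:", maxLength)
	})
	fmt.Println("after:", maxLength)

	// A bare assignment, by contrast, stays in effect for everything that
	// runs later in the same process.
	maxLength = 1000000
	fmt.Println("leaked:", maxLength)
}
```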

Member Author

good point! I blindly copied what happened in another test case I saw, but I agree this should be improved. Will do

By("Running iperf3 client on workload 1")
out, err := w[1].ExecOutput("iperf3", "-c", w[0].IP, "-O5", "-M1000", "-J")
Expect(err).NotTo(HaveOccurred())
baselineRate, err := getRateFromJsonOutput(out)
Member

Note, this is in bits per sec.

Member Author

this is noted in the Info log a few lines below it:

log.Infof("iperf client rate with no packet rate limit (bps): %v", baselineRate)

should I add a comment in addition to that?
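For readers who have not seen iperf3's -J output, the sketch below shows one way a rate in bits per second could be extracted from it. The body of getRateFromJsonOutput is not shown in this conversation, so this is an assumption about its shape: rateFromJSON is a hypothetical name, and the code assumes the end.sum_received.bits_per_second field of iperf3's JSON report.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// iperfResult captures only the field this sketch needs from iperf3 -J output.
// The field path end.sum_received.bits_per_second is an assumption here.
type iperfResult struct {
	End struct {
		SumReceived struct {
			BitsPerSecond float64 `json:"bits_per_second"`
		} `json:"sum_received"`
	} `json:"end"`
}

// rateFromJSON returns the received rate in bits per second (not bytes).
func rateFromJSON(out string) (float64, error) {
	var r iperfResult
	if err := json.Unmarshal([]byte(out), &r); err != nil {
		return 0, fmt.Errorf("parsing iperf3 JSON: %w", err)
	}
	return r.End.SumReceived.BitsPerSecond, nil
}

func main() {
	sample := `{"end":{"sum_received":{"bits_per_second":640000}}}`
	rate, err := rateFromJSON(sample)
	fmt.Println(rate, err)
}
```

Naming the unit in the function's doc comment, as here, is one lightweight way to address the "bits per sec" remark beyond the log message.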

// Expect the baseline rate to be much greater (>=100x) than the bandwidth that we
// would get with the packet rate we are going to configure just below. In
// practice we see several Gbps here.
Expect(baselineRate).To(BeNumerically(">=", 800000.0*100))
Member

Is there a connection between 800000 here and IngressPacketRate: 100 below?

Member Author

Not exactly a connection, just expecting 100x+ faster traffic with no limits than when limited. I'll add a comment for how we arrived at 800000 here too (as below)


By("Waiting for the config to appear in 'iptables-save' on workload 0")
// ingress config should be present
Eventually(getRules(0), "10s", "1s").Should(And(MatchRegexp(`-A cali-tw-`+regexp.QuoteMeta(w[0].InterfaceName)+` .* -m comment --comment "Mark packets within ingress packet rate limit" -m limit --limit `+regexp.QuoteMeta("100/sec")+` -j MARK --set-xmark 0x\d+/0x\d+`), MatchRegexp(`-A cali-tw-`+regexp.QuoteMeta(w[0].InterfaceName)+` .* -m comment --comment "Drop packets over ingress packet rate limit" -m mark ! --mark 0x\d+/0x\d+ -j DROP`)))
Member

In this check, iptables-save -c will be called only once, returning a string, and then that same string will be analysed up to 10 times. If it fails the first time, it won't make any difference to keep trying for the next 10s.

What you really want is:

getRules := func(felixId int) func() string {
	return func() string {
		out, err := tc.Felixes[felixId].ExecOutput("iptables-save", "-c")
		log.Infof("iptables-save -c output:\n%v", out)
		Expect(err).NotTo(HaveOccurred())
		return out
	}
}

ingressLimitedRate, err := getRateFromJsonOutput(out)
Expect(err).NotTo(HaveOccurred())
log.Infof("iperf client rate with ingress packet rate limit on server (bps): %v", ingressLimitedRate)
// Expect the limited rate to be below an estimated desired rate (1000 byte packets * 8 bits/byte * 100 packets/s = 800000 bps)
Member

Ah nice. Please explain that relation above as well.

Expect(err).NotTo(HaveOccurred())
log.Infof("iperf client rate with ingress packet rate limit on server (bps): %v", ingressLimitedRate)
// Expect the limited rate to be below an estimated desired rate (1000 byte packets * 8 bits/byte * 100 packets/s = 800000 bps)
Expect(ingressLimitedRate).To(BeNumerically("<=", 800000.0))
Member

Should there be a bit of leeway here, e.g. * 1.2?

Member Author

In my experience, the measured rate never came close to this upper bound when I ran the test (the maximum was typically 640 kbps), but I suppose it doesn't hurt to have some margin. Will add!
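The numbers discussed in this thread all derive from one product, which the following sketch spells out. The constant and function names are hypothetical; the packet size and rate come from the test comment (1000-byte packets, 100 packets/s), the 100x baseline factor from the baseline check, and the 1.2 leeway from the suggestion above.

```go
package main

import "fmt"

const (
	packetBytes    = 1000 // packet size assumed by the test comment
	packetsPerSec  = 100  // configured ingress packet rate
	baselineFactor = 100  // unlimited traffic should be at least this much faster
	leeway         = 1.2  // margin suggested in review
)

// limitedBps estimates the bandwidth implied by a packet rate limit:
// bytes per packet * 8 bits per byte * packets per second.
func limitedBps(bytesPerPacket, pps int) float64 {
	return float64(bytesPerPacket * 8 * pps)
}

func main() {
	limit := limitedBps(packetBytes, packetsPerSec)
	fmt.Printf("estimated limited rate: %.0f bps\n", limit)
	fmt.Printf("baseline lower bound:   %.0f bps\n", limit*baselineFactor)
	fmt.Printf("limited upper bound:    %.0f bps\n", limit*leeway)
}
```

Deriving both thresholds from the same constants would also make the link between 800000 and IngressPacketRate: 100 explicit in the test.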

@@ -1359,6 +1369,82 @@ var _ = Describe("Endpoints", func() {
},
}))
})

It("should render a workload endpoint with packet rate limiting QoSControls", func() {
format.MaxLength = 1000000
Member

same comment about this as in FV test

Member

Shouldn't be in this PR.

Member Author

oh, I think this showed up after running make generate, will remove

@coutinhop
Member Author

> Looks great. Just a few minor points to address.
>
> In terms of dataplane support - something that I should have checked before! -
>
> * Is it correct that we're supporting iptables and nftables, but not eBPF?
> * Is that the case for the bandwidth control as well?

Thanks so much! Yes, currently all 3 planned controls will only support iptables and nftables, but not eBPF. This control (and the future connection limit control) relies on iptables/nftables itself, and will need to be implemented differently on eBPF. The bandwidth control has an incompatibility between the tc qdisc that the control needs and one that the eBPF dataplane uses to attach BPF programs to interfaces. So yes, in short, this will support iptables/nftables, but not eBPF (at least for now).
