Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load kernel BTF spec once when creating a new collection #1690

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

patrickpichler
Copy link
Contributor

When creating a new collection without specifying the KernelTypes in the ProgramOptions, the kernel BTF is implicitly loaded. For this the btf.LoadKernelSpec() helper is used. Even though the spec is only loaded once, this operation cause a lot of memory churn, as the kernel spec is being copied each time the function get called. This becomes an issue when a collection with lots of programs is loaded in a resource limited environment.

This commit initializes an missing KernelTypes field with the BTF kernel spec, getting rid of the subsequent calls of BTF.Copy().

This greatly reduces the number of allocations/op when running the BenchmarkNewCollectionManyProgs benchmark. Here are the results:

Previous:
BenchmarkNewCollectionManyProgs-4 12 98443414 ns/op 116779863 B/op 403964 allocs/op

With optimization:
BenchmarkNewCollectionManyProgs-4 184 5807742 ns/op 4134444 B/op 17325 allocs/op

When creating a new collection without specifying the `KernelTypes` in
the `ProgramOptions`, the kernel BTF is implicitly loaded. For this the
`btf.LoadKernelSpec()` helper is used. Even though the spec is only
loaded once, this operation cause a lot of memory churn, as the kernel
spec is being copied each time the function get called. This becomes an
issue when a collection with lots of programs is loaded in a resource
limited environment.

This commit initializes an missing `KernelTypes` field with the BTF
kernel spec, getting rid of the subsequent calls of `BTF.Copy()`.

This greatly reduces the number of allocations/op when running the
`BenchmarkNewCollectionManyProgs` benchmark. Here are the results:

Previous:
BenchmarkNewCollectionManyProgs-4  12 98443414 ns/op 116779863 B/op 403964 allocs/op

With optimization:
BenchmarkNewCollectionManyProgs-4 184  5807742 ns/op   4134444 B/op  17325 allocs/op

Signed-off-by: Patrick Pichler <[email protected]>
@patrickpichler patrickpichler force-pushed the reduce-number-of-allocations-when-creating-new-collection branch from 345d0ed to fed5496 Compare February 17, 2025 18:55
@patrickpichler
Copy link
Contributor Author

@lmb @dylandreimerink I know the test for kernels without BTF support are failing. I guess this can be fixed by lazy loading the kernel spec only once. Before I start refactoring the code though, is this approach something you would merge?

A different approach might be to refactor the btf.Spec type and somehow introduce a immutable version of it. I guess this is not really possible without introducing breaking changes though.

@lmb
Copy link
Collaborator

lmb commented Feb 24, 2025

For this the btf.LoadKernelSpec() helper is used. Even though the spec is only loaded once, this operation cause a lot of memory churn, as the kernel spec is being copied each time the function get called.

IIRC we don't really copy the full spec. It's a shallow, lazy copy. Is that still too expensive?

This commit initializes an missing KernelTypes field with the BTF kernel spec, getting rid of the subsequent calls of BTF.Copy().

The problem with this is that it always causes a parse of kernel BTF, even in cases where the load wouldn't have needed one.

@patrickpichler
Copy link
Contributor Author

IIRC we don't really copy the full spec. It's a shallow, lazy copy. Is that still too expensive?

Ok after looking at the flame graphs again, I think the issue is not directly the copying of the kernel BTF spec, but since each time relocations get applied, they call spec.TypeByID here, which if the BTF spec is shared will short circuit, as the type has already been copied, but for new BTF specs, it will always end up copying the same type over and over again. I misunderstood the copy logic the first time I look at this, causing me to suspect it is at fault, but since the kernels BTF spec always returns a copy, it should never have to deep copy copied types (there should be no copied types).

An alternative could be, to use the type directly from the imm part of the spec. This greatly reduces allocations, but I guess could be dangerous, as changes to types found in the instruction metadata would ripple through. Though I am not sure if there is a valid use case for this (just because I cannot think of it, of course doesn't mean there isn't 😅). I prototyped this here.

What do you think about this solution?

@lmb
Copy link
Collaborator

lmb commented Mar 12, 2025

Sorry for the late reply. The culprit for you is always CO-RE relocation?

I think we could extend your approach to add an unexported method Spec.readOnlyTypeByID and use that from CO-RE. That function would either return an already copied type, or use the immutable type directly. It's not safe to export this since the caller has to pinky promise that they really won't modify the type. We also need to audit the CO-RE code to ensure that it doesn't modify the type somehow and in the best case even write a test for this. (Gut feeling: the test will be the hardest part.)

@patrickpichler
Copy link
Contributor Author

No worries!

From what I've traced, CO-RE is causing the churn yeah. I guess the Spec.readOnlyTypeByID will in the end just be a wrapper around imm.typeByID, or do you have anything more ambitious in mind?

As for the test, what quickly comes to my mind would be to create a forced deep copy of the BTF spec (I guess a new method is needed here), load the program and then compare if the specs are the same. From what I see only CO-RE should be the issue, as the other place where BTF types are queried by IDs are in the btf-ext parser. It is not a problem here, since the same instance of the BTF spec is used during parsing, causing the types to only be cloned once.

@lmb
Copy link
Collaborator

lmb commented Mar 17, 2025

I guess the Spec.readOnlyTypeByID will in the end just be a wrapper around imm.typeByID, or do you have anything more ambitious in mind?

It would have to:

  1. Check mutableTypes.copies for a copied type and return that if it exists.
  2. Return the immutable type.

create a forced deep copy of the BTF spec (I guess a new method is needed here), load the program and then compare if the specs are the same.

I think that could work, but maybe it's enough to copy Spec.types before and then compare only that slice? Not sure we need to check all the other members of Spec.

@patrickpichler
Copy link
Contributor Author

Sounds good!
I will try to produce a PoC in the next comming days and report back with performance numbers 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants