Load kernel BTF spec once when creating a new collection #1690

patrickpichler · 2025-02-17T18:54:30Z

When creating a new collection without specifying the KernelTypes in the ProgramOptions, the kernel BTF is implicitly loaded. For this the btf.LoadKernelSpec() helper is used. Even though the spec is only loaded once, this operation cause a lot of memory churn, as the kernel spec is being copied each time the function get called. This becomes an issue when a collection with lots of programs is loaded in a resource limited environment.

This commit initializes an missing KernelTypes field with the BTF kernel spec, getting rid of the subsequent calls of BTF.Copy().

This greatly reduces the number of allocations/op when running the BenchmarkNewCollectionManyProgs benchmark. Here are the results:

Previous:
BenchmarkNewCollectionManyProgs-4 12 98443414 ns/op 116779863 B/op 403964 allocs/op

With optimization:
BenchmarkNewCollectionManyProgs-4 184 5807742 ns/op 4134444 B/op 17325 allocs/op

When creating a new collection without specifying the `KernelTypes` in the `ProgramOptions`, the kernel BTF is implicitly loaded. For this the `btf.LoadKernelSpec()` helper is used. Even though the spec is only loaded once, this operation cause a lot of memory churn, as the kernel spec is being copied each time the function get called. This becomes an issue when a collection with lots of programs is loaded in a resource limited environment. This commit initializes an missing `KernelTypes` field with the BTF kernel spec, getting rid of the subsequent calls of `BTF.Copy()`. This greatly reduces the number of allocations/op when running the `BenchmarkNewCollectionManyProgs` benchmark. Here are the results: Previous: BenchmarkNewCollectionManyProgs-4 12 98443414 ns/op 116779863 B/op 403964 allocs/op With optimization: BenchmarkNewCollectionManyProgs-4 184 5807742 ns/op 4134444 B/op 17325 allocs/op Signed-off-by: Patrick Pichler <[email protected]>

patrickpichler · 2025-02-17T19:38:21Z

@lmb @dylandreimerink I know the test for kernels without BTF support are failing. I guess this can be fixed by lazy loading the kernel spec only once. Before I start refactoring the code though, is this approach something you would merge?

A different approach might be to refactor the btf.Spec type and somehow introduce a immutable version of it. I guess this is not really possible without introducing breaking changes though.

lmb · 2025-02-24T09:40:31Z

For this the btf.LoadKernelSpec() helper is used. Even though the spec is only loaded once, this operation cause a lot of memory churn, as the kernel spec is being copied each time the function get called.

IIRC we don't really copy the full spec. It's a shallow, lazy copy. Is that still too expensive?

This commit initializes an missing KernelTypes field with the BTF kernel spec, getting rid of the subsequent calls of BTF.Copy().

The problem with this is that it always causes a parse of kernel BTF, even in cases where the load wouldn't have needed one.

patrickpichler · 2025-02-24T13:50:48Z

IIRC we don't really copy the full spec. It's a shallow, lazy copy. Is that still too expensive?

Ok after looking at the flame graphs again, I think the issue is not directly the copying of the kernel BTF spec, but since each time relocations get applied, they call spec.TypeByID here, which if the BTF spec is shared will short circuit, as the type has already been copied, but for new BTF specs, it will always end up copying the same type over and over again. I misunderstood the copy logic the first time I look at this, causing me to suspect it is at fault, but since the kernels BTF spec always returns a copy, it should never have to deep copy copied types (there should be no copied types).

An alternative could be, to use the type directly from the imm part of the spec. This greatly reduces allocations, but I guess could be dangerous, as changes to types found in the instruction metadata would ripple through. Though I am not sure if there is a valid use case for this (just because I cannot think of it, of course doesn't mean there isn't 😅). I prototyped this here.

What do you think about this solution?

lmb · 2025-03-12T16:43:02Z

Sorry for the late reply. The culprit for you is always CO-RE relocation?

I think we could extend your approach to add an unexported method Spec.readOnlyTypeByID and use that from CO-RE. That function would either return an already copied type, or use the immutable type directly. It's not safe to export this since the caller has to pinky promise that they really won't modify the type. We also need to audit the CO-RE code to ensure that it doesn't modify the type somehow and in the best case even write a test for this. (Gut feeling: the test will be the hardest part.)

patrickpichler · 2025-03-13T12:26:36Z

No worries!

From what I've traced, CO-RE is causing the churn yeah. I guess the Spec.readOnlyTypeByID will in the end just be a wrapper around imm.typeByID, or do you have anything more ambitious in mind?

As for the test, what quickly comes to my mind would be to create a forced deep copy of the BTF spec (I guess a new method is needed here), load the program and then compare if the specs are the same. From what I see only CO-RE should be the issue, as the other place where BTF types are queried by IDs are in the btf-ext parser. It is not a problem here, since the same instance of the BTF spec is used during parsing, causing the types to only be cloned once.

lmb · 2025-03-17T12:45:13Z

I guess the Spec.readOnlyTypeByID will in the end just be a wrapper around imm.typeByID, or do you have anything more ambitious in mind?

It would have to:

Check mutableTypes.copies for a copied type and return that if it exists.
Return the immutable type.

create a forced deep copy of the BTF spec (I guess a new method is needed here), load the program and then compare if the specs are the same.

I think that could work, but maybe it's enough to copy Spec.types before and then compare only that slice? Not sure we need to check all the other members of Spec.

patrickpichler · 2025-03-20T07:16:05Z

Sounds good!
I will try to produce a PoC in the next comming days and report back with performance numbers 👍

patrickpichler force-pushed the reduce-number-of-allocations-when-creating-new-collection branch from 345d0ed to fed5496 Compare February 17, 2025 18:55

patrickpichler mentioned this pull request Feb 18, 2025

Reduce memory churn on module load by reusing kernel BTF spec castai/kvisor#487

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Load kernel BTF spec once when creating a new collection #1690

Load kernel BTF spec once when creating a new collection #1690

patrickpichler commented Feb 17, 2025

patrickpichler commented Feb 17, 2025

lmb commented Feb 24, 2025

patrickpichler commented Feb 24, 2025

lmb commented Mar 12, 2025

patrickpichler commented Mar 13, 2025

lmb commented Mar 17, 2025

patrickpichler commented Mar 20, 2025

Load kernel BTF spec once when creating a new collection #1690

Are you sure you want to change the base?

Load kernel BTF spec once when creating a new collection #1690

Conversation

patrickpichler commented Feb 17, 2025

patrickpichler commented Feb 17, 2025

lmb commented Feb 24, 2025

patrickpichler commented Feb 24, 2025

lmb commented Mar 12, 2025

patrickpichler commented Mar 13, 2025

lmb commented Mar 17, 2025

patrickpichler commented Mar 20, 2025