Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: arch: move arch_swap() declaration #82454

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

peter-mitsis
Copy link
Collaborator

Moves the arch_swap() declaration out of kernel_arch_interface.h and into the various architectures' kernel_arch_func.h. This permits the arch_swap() to be inlined on ARM, but extern'd on the other architectures that still implement arch_swap().

Inlining this function on ARM has shown at least a +5% performance boost according to the thread_metric benchmark on the disco_l475_iot1 board.

At the time of creating this PR, mainline results for the thread_metric w/ multiq on the disco_l475_iot1 ..
Preemptive: 7051317, Cooperative: 12436712
With this PR ...
Preemptive: 7417390, Cooperative: 13188390

The new preemptive numbers should put us a little ahead of ThreadX on the same hardware.

Copy link
Member

@aescolar aescolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok by me. Just a minor thing and a nit.

kernel/include/kernel_arch_interface.h Show resolved Hide resolved
arch/mips/include/kernel_arch_func.h Outdated Show resolved Hide resolved
@peter-mitsis peter-mitsis force-pushed the pmitsis-inline-arch_swap branch from dc33678 to dcaad72 Compare December 3, 2024 23:44
@peter-mitsis peter-mitsis requested a review from aescolar December 3, 2024 23:45
andyross
andyross previously approved these changes Dec 3, 2024
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very reasonable.

kernel/include/kernel_arch_interface.h Show resolved Hide resolved
@peter-mitsis
Copy link
Collaborator Author

Fixed for ARM builds.

@peter-mitsis
Copy link
Collaborator Author

Hmmm ... something weird is going on with the arch.arm.swap.common.fpu_sharing test on the mps2/an521/cpu1 board.

It fails with this patch, and passes without. However, the non-optimized version of the test (arch.arm.swap.common.fpu_sharing.no_optimizations) passes. Both scenarios are reproducible on my dev box. I am hoping that there is something off with the test, but I am digging into this to find out.

@peter-mitsis
Copy link
Collaborator Author

I am leaning towards thinking that this failure is a test failure. In the optimized version of this test (using -0s), we are getting garbage values for the v1..v8 registers, and some of the routines for initializing data are just absent. It looks like the compiler is optimizing them away. I think we some way to keep them in the code. Continuing to dig and experiment.

@peter-mitsis
Copy link
Collaborator Author

I think I see what is going on in the failing test now ...

In the failing test, it makes a call to arch_swap(). Previously, this would be a function call and the test was written for that. However, with arch_swap() now being inlined, this changes what the compiler generates just enough so that when alt_thread checks the registers, r5/v2 is 0 instead of what it had previously saved and is expecting.

I have an idea about how to work around this ...

Moves the arch_swap() declaration out of kernel_arch_interface.h
and into the various architectures' kernel_arch_func.h. This
permits the arch_swap() to be inlined on ARM, but extern'd on
the other architectures that still implement arch_swap().

Inlining this function on ARM has shown at least a +5% performance
boost according to the thread_metric benchmark on the disco_l475_iot1
board.

Signed-off-by: Peter Mitsis <[email protected]>
@peter-mitsis peter-mitsis force-pushed the pmitsis-inline-arch_swap branch from b4171dc to 2eb582a Compare December 4, 2024 22:56
@andyross
Copy link
Contributor

andyross commented Dec 4, 2024

FYI, "v1-v8" (which are IMHO needlessly confusing aliases for r4-r11) are the caller-save registers in the ARM ABI. It's likely that the earlier swap routine was written to assume that the caller had already spilled them and that they don't have to be saved. In which case you might as well give up for ARM; the routine wasn't written to be legally inlined.

I mean, one could "fix" it, but only at the cost of adding back all the spills the compiler was generating before. And that's worse and not better, as usually the compiler is really good about spill/fill logic (e.g. finding registers that don't actually need to be saved), where a context switch is forced to be conservative/pessimal and save everything.

@andyross
Copy link
Contributor

andyross commented Dec 4, 2024

Also, as far as rewriting ARM Cortex M swap: I already claim that spot as soon as I can get MTK work submitted SOF-side. Hold my beer, as it were.

@peter-mitsis
Copy link
Collaborator Author

Also, as far as rewriting ARM Cortex M swap: I already claim that spot as soon as I can get MTK work submitted SOF-side.

I am looking forward to your rewrite. I doubt that my proposed commit will have a long life as I anticipate your work to supersede it, but should it take longer than anticipated we at least have an interim boost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture area: Kernel area: MIPS area: native port Host native arch port (native_sim) area: NIOS2 NIOS2 Architecture area: X86 x86 Architecture (32-bit)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants