Add a path for BPF-accelerated async signal emulation. #3731

Merged (1 commit) on Jun 26, 2024

Collaborator

khuey commented Apr 22, 2024

Starting in kernel 6.10, BPF filters can choose whether or not to trigger the SIGIO behavior for a perf event that becomes readable. We combine that with a hardware breakpoint and a BPF filter that matches the GPRs to produce an accelerated internal breakpoint type that can fast-forward through loop iterations to deliver async signals. On one trace this reduced rr's replay overhead by 94%.
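
To illustrate the idea in the paragraph above, here is a minimal user-space sketch of the filtering decision. It is not the BPF program from this PR; `Gprs`, `should_notify`, `ReplayStats`, and `run_loop` are hypothetical names, and the four-register layout is an assumption made purely for illustration. The point is that the breakpoint fires on every loop iteration, but the tracer is only woken when the registers match the target state.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the general-purpose registers. The real filter
// works on the kernel's register layout, not on this struct.
struct Gprs {
  std::array<std::uint64_t, 4> r{};
  bool operator==(const Gprs& other) const { return r == other.r; }
};

// Decision taken on each breakpoint hit: notify the tracer only when the
// current registers match the recorded target state.
inline bool should_notify(const Gprs& current, const Gprs& target) {
  return current == target;
}

struct ReplayStats {
  std::size_t breakpoint_hits = 0;  // times the breakpoint fired
  std::size_t notifications = 0;    // times the tracer had to be woken
};

// Simulates a loop whose counter lives in r[0]; the breakpoint fires once
// per iteration, and the loop stops at the first notification.
inline ReplayStats run_loop(std::uint64_t iterations, const Gprs& target) {
  ReplayStats stats;
  Gprs regs;
  for (std::uint64_t i = 0; i < iterations; ++i) {
    regs.r[0] = i;
    ++stats.breakpoint_hits;
    if (should_notify(regs, target)) {
      ++stats.notifications;
      break;  // the tracer takes over here to deliver the signal
    }
  }
  return stats;
}
```

With a target of `r[0] == 999`, the simulated breakpoint fires 1000 times but the tracer is woken only once; without the filter it would be woken on every hit.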

This adds a runtime dependency on libbpf and a compile-time dependency on clang --target bpf. rr also needs CAP_BPF and CAP_PERFMON to use this feature. Because of all of that, this isn't really suitable for wide use at this point, so it is gated behind a CMake option, usebpf. Set -Dusebpf=ON to test it.
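
As a rough way to see whether a process has the two capabilities mentioned above, the sketch below reads the effective capability mask from `/proc/self/status`. It is not taken from rr; `parse_cap_eff`, `mask_has_cap`, and `can_use_bpf_acceleration` are hypothetical helper names, and the check is only an approximation because it ignores any other way the kernel might grant access.

```cpp
#include <cstdint>
#include <exception>
#include <fstream>
#include <string>

// Capability numbers as defined in linux/capability.h.
constexpr int kCapPerfmon = 38;  // CAP_PERFMON
constexpr int kCapBpf = 39;      // CAP_BPF

// True if bit `cap` is set in the capability mask.
inline bool mask_has_cap(std::uint64_t mask, int cap) {
  return ((mask >> cap) & 1u) != 0;
}

// Parses a "CapEff:\t<hex>" line as found in /proc/<pid>/status.
// Returns false if the line is not a CapEff line or cannot be parsed.
inline bool parse_cap_eff(const std::string& line, std::uint64_t* mask) {
  const std::string prefix = "CapEff:";
  if (line.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  try {
    *mask = std::stoull(line.substr(prefix.size()), nullptr, 16);
  } catch (const std::exception&) {
    return false;
  }
  return true;
}

// Hypothetical helper: checks the current process's effective set for both
// CAP_BPF and CAP_PERFMON. Returns false if /proc is unavailable.
inline bool can_use_bpf_acceleration() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    std::uint64_t mask = 0;
    if (parse_cap_eff(line, &mask)) {
      return mask_has_cap(mask, kCapBpf) && mask_has_cap(mask, kCapPerfmon);
    }
  }
  return false;
}
```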

(I think we should wait until the kernel side hits Linus's tree to merge this.)

khuey requested a review from rocallahan on April 22, 2024 01:39
khuey force-pushed the bpf_async_signal branch from 7b620cc to 743aafe on May 15, 2024 15:46
static struct user_regs_struct* bpf_regs;

if (!fd_async_signal_accelerator.is_open()) {
  if (!initialized) {
Collaborator

How about moving this BPF initialization code into its own function?

Collaborator

It feels a bit ugly to be mashing the BPF program's global state into this function. And it's ugly to be mmapping that buffer and then leaking it to the global variable.

How hard would it be to put the BPF program and its state into its own class with proper ownership, and have each ReplaySession hold a shared pointer to an object of that class?

Collaborator Author

Alright, I reorganized this along those lines. The BPF singleton stuff lives in a BpfAccelerator class that's shared between the different PerfCounters instances.
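
One common way to get a shared object with proper ownership of an mmapped buffer is sketched below. This is not the BpfAccelerator code from the PR: `SharedAccelerator` is a hypothetical class, the `std::weak_ptr` cache is an assumption about how such a `get_or_create()` could be written, and the anonymous 4096-byte mapping merely stands in for whatever buffer the BPF program shares. The sketch is not thread-safe.

```cpp
#include <cstddef>
#include <memory>
#include <sys/mman.h>

// Hypothetical class: owns one mmapped buffer and is shared by all users.
class SharedAccelerator {
public:
  // Returns the live instance if one exists, otherwise creates a new one.
  // The weak_ptr does not keep the object alive, so the buffer is unmapped
  // once the last user drops its shared_ptr.
  static std::shared_ptr<SharedAccelerator> get_or_create() {
    static std::weak_ptr<SharedAccelerator> instance;
    std::shared_ptr<SharedAccelerator> existing = instance.lock();
    if (existing) {
      return existing;
    }
    std::shared_ptr<SharedAccelerator> created(new SharedAccelerator());
    instance = created;
    return created;
  }

  SharedAccelerator(const SharedAccelerator&) = delete;
  SharedAccelerator& operator=(const SharedAccelerator&) = delete;

  ~SharedAccelerator() {
    if (buffer_ != MAP_FAILED) {
      munmap(buffer_, kSize);  // release the mapping instead of leaking it
    }
  }

  bool ok() const { return buffer_ != MAP_FAILED; }
  void* buffer() const { return ok() ? buffer_ : nullptr; }

private:
  SharedAccelerator()
      : buffer_(mmap(nullptr, kSize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)) {}

  static constexpr std::size_t kSize = 4096;  // assumed buffer size
  void* buffer_;
};
```

Each caller keeps the returned `shared_ptr` for as long as it needs the accelerator; no caller has to know whether it was the first one.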

khuey force-pushed the bpf_async_signal branch from 61a216b to 9237959 on May 26, 2024 19:02
khuey requested a review from rocallahan on May 27, 2024 01:39

class BpfAccelerator {
public:
  static std::shared_ptr<BpfAccelerator> get_or_create();
Collaborator

I was thinking we could just create one BpfAccelerator in ReplaySession and copy the reference when we clone ReplaySessions so we don't need a static variable here.

Collaborator Author

I'm not convinced this is a great idea. It means moving BpfAccelerator into the header so ReplaySession can get at it. Is that really better than a static singleton?
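
For comparison, the alternative discussed in this thread (the session owns the accelerator and clones copy the reference) could look roughly like the sketch below. `Accelerator` and `Session` are hypothetical stand-ins, not rr's ReplaySession or BpfAccelerator, and the sketch says nothing about which design is preferable in rr.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the loaded BPF program and its state.
struct Accelerator {
  int programs_loaded = 1;
};

// Hypothetical stand-in for a replay session that owns the accelerator.
class Session {
public:
  // A fresh session creates its own accelerator; no static variable needed.
  Session() : accel_(std::make_shared<Accelerator>()) {}

  // A clone shares ownership with the session it was cloned from.
  Session clone() const { return Session(accel_); }

  const std::shared_ptr<Accelerator>& accelerator() const { return accel_; }

private:
  explicit Session(std::shared_ptr<Accelerator> accel)
      : accel_(std::move(accel)) {}

  std::shared_ptr<Accelerator> accel_;
};
```

In this shape, sessions that are created independently rather than cloned each get their own accelerator, which is one behavioral difference from the static singleton.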

khuey force-pushed the bpf_async_signal branch from e272850 to fccb968 on May 30, 2024 17:38
khuey requested a review from rocallahan on May 30, 2024 17:40
khuey force-pushed the bpf_async_signal branch from 013b30d to 1ac134c on June 26, 2024 16:38
khuey merged commit e7d9e8f into rr-debugger:master on Jun 26, 2024
5 checks passed