Add a path for BPF-accelerated async signal emulation. #3731
src/PerfCounters.cc
static struct user_regs_struct* bpf_regs;

if (!fd_async_signal_accelerator.is_open()) {
if (!initialized) {
How about moving this BPF initialization code into its own function?
It feels a bit ugly to be mashing the BPF program's global state into this function. It's also ugly to mmap that buffer and then leak it into a global variable.
How hard would it be to put the BPF program and its state into its own class with proper ownership, and have each ReplaySession hold a shared pointer to an object of that class?
Alright, I reorganized this along those lines. The BPF singleton state now lives in a BpfAccelerator class that's shared between the different PerfCounters instances.
class BpfAccelerator {
public:
  static std::shared_ptr<BpfAccelerator> get_or_create();
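For readers following the thread, here is a minimal sketch of how a static get_or_create() returning a shared_ptr is commonly written. It is not the code from this PR: SharedAccelerator and Consumer are hypothetical stand-ins, and caching through a static weak_ptr is an assumption about one reasonable implementation, not a description of what BpfAccelerator actually does.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for a process-wide resource such as a loaded BPF
// program. This is NOT rr's BpfAccelerator.
class SharedAccelerator {
public:
  // Returns the live instance if one exists, otherwise creates a new one.
  // The static weak_ptr does not keep the object alive on its own, so the
  // resource is released once the last holder drops its shared_ptr.
  static std::shared_ptr<SharedAccelerator> get_or_create() {
    static std::mutex lock;
    static std::weak_ptr<SharedAccelerator> cached;
    std::lock_guard<std::mutex> guard(lock);
    std::shared_ptr<SharedAccelerator> live = cached.lock();
    if (!live) {
      live.reset(new SharedAccelerator());
      cached = live;
    }
    return live;
  }

  SharedAccelerator(const SharedAccelerator&) = delete;
  SharedAccelerator& operator=(const SharedAccelerator&) = delete;

private:
  SharedAccelerator() = default;
};

// Hypothetical consumer, playing the role PerfCounters plays in this PR.
class Consumer {
public:
  Consumer() : accel(SharedAccelerator::get_or_create()) {}
  const std::shared_ptr<SharedAccelerator>& accelerator() const {
    return accel;
  }

private:
  std::shared_ptr<SharedAccelerator> accel;
};
```

With this shape, every Consumer constructed while another is alive ends up pointing at the same object, and callers never see the static state directly.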
I was thinking we could just create one BpfAccelerator in ReplaySession and copy the reference when we clone ReplaySessions so we don't need a static variable here.
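A rough sketch of that alternative, under stated assumptions: Accelerator and Session are hypothetical names standing in for BpfAccelerator and ReplaySession, and clone() here only illustrates copying the shared_ptr, not how rr actually clones sessions.

```cpp
#include <memory>

// Hypothetical accelerator type. In this design it has to be visible to the
// session's header, since the session stores a pointer to it.
struct Accelerator {};

// Hypothetical session type, standing in for ReplaySession.
class Session {
public:
  // A fresh session creates its own accelerator; no static variable needed.
  Session() : accel(std::make_shared<Accelerator>()) {}

  // Cloning copies the shared_ptr, so the clone reuses the same accelerator.
  std::shared_ptr<Session> clone() const {
    return std::shared_ptr<Session>(new Session(*this));
  }

  const std::shared_ptr<Accelerator>& accelerator() const { return accel; }

private:
  Session(const Session&) = default;
  std::shared_ptr<Accelerator> accel;
};
```

The trade-off is visible in the sketch: ownership is explicit and there is no global state, but two independently constructed sessions do not share an accelerator, and the accelerator type leaks into the session's interface.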
I'm not convinced this is a great idea. It means moving BpfAccelerator into the header so ReplaySession can get at it. Is that really better than a static singleton?
Starting in kernel 6.10, BPF filters can choose whether or not to trigger the SIGIO behavior for a perf event that becomes readable. We combine that with a hardware breakpoint and a BPF filter that matches the GPRs to produce an accelerated internal breakpoint type that can fast-forward through loop iterations to deliver async signals. On one trace this reduced rr's replay overhead by 94%.
This adds a runtime dependency on libbpf and a compile-time dependency on clang --target bpf. rr also needs CAP_BPF and CAP_PERFMON to use this feature. Because of all that, this isn't really suitable for wide use yet, so it is gated behind the CMake option usebpf. Set -Dusebpf=ON to test it.
(I think we should wait until the kernel side hits Linus's tree to merge this.)