This repository has been archived by the owner on Mar 15, 2023. It is now read-only.

Add Tracepoint (Kprobe/Kretprobe) interface in #63

Open
nathanjsweet opened this issue Sep 6, 2018 · 6 comments

Comments

@nathanjsweet
Member

What should ebpf do?
eBPF should expose some general-purpose functions for creating kernel tracepoints with BPF.

Why should ebpf do this?
The abstraction is powerful and requires minimal code. Tests, examples, etc. will be helpful documentation for folks using this library.

Additional context
It used to be in the library here:
https://github.com/newtools/ebpf/pull/62/files#diff-fafe0b379191c3838559e4573260057aL1
But it was taken out because of no tests.

@ti-mo

ti-mo commented Jan 14, 2019

Hi, has this functionality been implemented anywhere else by now? Currently using https://github.com/iovisor/gobpf, but looking to get rid of cgo, and this project looks good!

I have a functioning package that ships multiple ELF binaries with k(ret)probes that send events back to userspace over perf channels. It already has a small integration test suite, so I could help by contributing a similar suite here. Please let me know if this is something you'd be interested in.

Attaching kprobes and creating tracepoints could live in a separate package, but it would make more sense having it gathered and integrated as part of this one. As you stated, it's a relatively small amount of code.

What was the obstacle to building coverage for this at the time? I guess we need to settle on a kernel function that's unlikely to be renamed and that we can reliably trigger from a program. If the probe successfully sends a perf event, it can be considered working.

In the PR, @lmb mentioned:

> It looks like the API leaks the event FD, so there is no way to remove events except restarting the process

Do you mean detaching kprobes or removing tracepoints? I believe that can be done as follows: https://github.com/iovisor/gobpf/blob/master/elf/module.go#L449.

@nathanjsweet
Member Author

Hey @ti-mo. The only thing blocking something like this is comprehensive testing. If that is something you can provide, we would welcome your PR. The branch referenced above is a good starting place.

I think it can live in this repo, but I think it would be best if it were a sub-folder (maybe called “tracing” or something like that?). Let me know if you need anything else.

@lmb
Collaborator

lmb commented Jan 15, 2019

I agree with Nathan, a subpackage sounds fine.

I think that adding the syscall support to attach / detach eBPF to tracepoints, etc. is only a small part of the necessary work, unfortunately. Right now, the eBPF for kprobes / tracepoints is specific to a kernel configuration, since the layout of important data structures changes. This is why bcc always ships the full source code and has written an LLVM plugin to make writing these programs easier.

I think a possible solution is to support the BPF type format (BTF), which is supposed to let the eBPF verifier handle these differences. That is still a ways off, though.

@ti-mo

ti-mo commented Jan 15, 2019

@lmb Indeed, and since I'm bringing my own binaries anyway, that's all I need. 🙂 The syscalls themselves shouldn't be too difficult to test.

To work around the issues you mentioned, I'm using a hybrid of the approach taken in https://github.com/weaveworks/tcptracer-bpf. I pre-build ELF binaries against multiple kernel versions using LLVM's bpf target and ship them in the package using statik. At runtime, I pick a compatible one from the catalog based on the running kernel version of the machine. I also scan /proc/kallsyms to make sure the functions I want to probe are loaded ahead of time (conntrack, so it's usually a kmod). I also know exactly which build-time kernel config vars I depend on, so I check those too (from /proc/config.gz or other). All major distros have the flags I need enabled, and I would expect someone who builds a custom kernel to be able to build the package against their custom config as well. Would BTF remove the need for some of these steps?
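The /proc/kallsyms check described above could look roughly like the sketch below. It assumes the usual line layout of `<address> <type> <name> [<module>]`; the helper names `lineHasSymbol` and `hasSymbol` and the sample addresses are made up for illustration and are not taken from any package mentioned in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// lineHasSymbol reports whether a single kallsyms-style line
// ("<address> <type> <name> [<module>]") names the given symbol.
func lineHasSymbol(line, name string) bool {
	fields := strings.Fields(line)
	return len(fields) >= 3 && fields[2] == name
}

// hasSymbol scans kallsyms-formatted input line by line and stops at the
// first match. A real caller would pass the opened /proc/kallsyms file.
func hasSymbol(r io.Reader, name string) (bool, error) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if lineHasSymbol(sc.Text(), name) {
			return true, nil
		}
	}
	return false, sc.Err()
}

func main() {
	// Illustrative sample standing in for /proc/kallsyms; addresses are invented.
	sample := "ffffffff81000000 T _text\n" +
		"ffffffffc0a01230 t nf_conntrack_in\t[nf_conntrack]\n"

	found, err := hasSymbol(strings.NewReader(sample), "nf_conntrack_in")
	fmt.Println(found, err)
}
```

Taking an `io.Reader` rather than a path keeps the check testable without touching the real file.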

As long as we're not navigating kernel structs in the test suite kprobe, we don't depend on the struct layout. The only dependency is the presence of a kernel symbol that's unlikely (or impossible) to be compiled as a kmod, and that we can reliably trigger. The kprobe can simply send a perf event containing a single byte (or none at all?) without walking any kernel memory. This would allow us to perform an end-to-end test that is portable, as long as we can find a low-churn part of the kernel, with a kernel symbol that is unlikely to be changed. This problem is inherent to what we're trying to do: a kprobe is always attached to a named symbol, and I don't see any other options there.

Of course, my experience with BPF is fairly limited, but it's another perspective regardless.

@lmb
Collaborator

lmb commented Jan 15, 2019

> To work around the issues you mentioned, I'm using a hybrid of the approach taken in https://github.com/weaveworks/tcptracer-bpf. I pre-build ELF binaries against multiple kernel versions using LLVM's bpf target and ship them in the package using statik. At runtime, I pick a compatible one from the catalog based on the running kernel version of the machine. I also scan /proc/kallsyms to make sure the functions I want to probe are loaded ahead of time (conntrack, so it's usually a kmod).

That's pretty cool! What happens if you try to instrument a function that doesn't exist?

> I also know exactly which build-time kernel config vars I depend on, so I check those too (from /proc/config.gz or other). All major distros have the flags I need enabled, and I would expect someone who builds a custom kernel to be able to build the package against their custom config as well.

How do you make sure that the following works:

struct sock {
  int a;
#if defined(FEATURE_YOU_DONT_CHECK)
  int b;
#endif
 ...
#if defined(FEATURE_YOU_DO_CHECK)
  int c;
#endif
}

Here the offset of c changes even though FEATURE_YOU_DO_CHECK is enabled. Do you compare the config byte by byte?
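The offset shift can be reproduced in a small Go sketch. The two struct types below are hypothetical stand-ins for the C struct with FEATURE_YOU_DONT_CHECK disabled and enabled; they use `int32` to mirror a 32-bit C `int` and are not real kernel definitions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sockNoB stands in for the struct built with FEATURE_YOU_DONT_CHECK disabled.
type sockNoB struct {
	a int32
	c int32
}

// sockWithB stands in for the struct built with FEATURE_YOU_DONT_CHECK enabled.
type sockWithB struct {
	a int32
	b int32
	c int32
}

// offsetsOfC returns the byte offset of field c in both layouts.
func offsetsOfC() (withoutB, withB uintptr) {
	var x sockNoB
	var y sockWithB
	return unsafe.Offsetof(x.c), unsafe.Offsetof(y.c)
}

func main() {
	withoutB, withB := offsetsOfC()
	fmt.Printf("offset of c without b: %d, with b: %d\n", withoutB, withB)
}
```

A probe compiled against one layout and run against the other would read `c` from the wrong offset, even though the feature that guards `c` is enabled in both.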

> Would BTF remove the need for some of these steps?

That's what I understand, yeah. http://vger.kernel.org/netconf2018_files/AlexeiStarovoitov_netconf2018.pdf mentions this in the appendix (see "BTF update").

> As long as we're not navigating kernel structs in the test suite kprobe, we don't depend on the struct layout. The only dependency is the presence of a kernel symbol that's unlikely (or impossible) to be compiled as a kmod, and that we can reliably trigger. The kprobe can simply send a perf event containing a single byte (or none at all?) without walking any kernel memory. This would allow us to perform an end-to-end test that is portable, as long as we can find a low-churn part of the kernel, with a kernel symbol that is unlikely to be changed. This problem is inherent to what we're trying to do: a kprobe is always attached to a named symbol, and I don't see any other options there.

I think tracepoints have a stable ABI, so maybe that is easier (provided kprobe and tracepoints use the same mechanisms, not sure). From my POV it would be enough to just make sure the kprobe load / unload syscalls are successful, no need to send a perf event.

@ti-mo

ti-mo commented Jan 15, 2019

> What happens if you try to instrument a function that doesn't exist?

Writing a non-existent (or unloaded) symbol to /sys/kernel/debug/tracing/kprobe_events will simply fail.
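For reference, here is a minimal sketch of the definition strings that get written to kprobe_events, following the tracefs syntax `p:GROUP/EVENT SYMBOL` for a kprobe, `r:GROUP/EVENT SYMBOL` for a kretprobe, and `-:GROUP/EVENT` for removal. The group name `ebpftest`, the event names, the helper names, and the choice of `do_sys_open` as the probed symbol are all assumptions for illustration; that symbol is not present on every kernel.

```go
package main

import (
	"fmt"
	"os"
)

// probeGroup is a hypothetical group name used to namespace our events.
const probeGroup = "ebpftest"

// kprobeDefinition builds the line that registers a kprobe ("p") or
// kretprobe ("r") event for symbol.
func kprobeDefinition(event, symbol string, ret bool) string {
	kind := "p"
	if ret {
		kind = "r"
	}
	return fmt.Sprintf("%s:%s/%s %s", kind, probeGroup, event, symbol)
}

// kprobeRemoval builds the line that removes a previously registered event.
func kprobeRemoval(event string) string {
	return fmt.Sprintf("-:%s/%s", probeGroup, event)
}

// appendLine appends one line to path. The file is opened in append mode
// because a truncating write to kprobe_events clears the existing probes.
func appendLine(path, line string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_APPEND, 0)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(line + "\n"); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	// Only print the definitions; writing them requires root and tracefs.
	fmt.Println(kprobeDefinition("open_entry", "do_sys_open", false))
	fmt.Println(kprobeDefinition("open_exit", "do_sys_open", true))
	fmt.Println(kprobeRemoval("open_entry"))
}
```

With this shape, the error returned by `appendLine` is where a missing symbol would surface to the caller.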

> How do you make sure that the following works: ...

Annoyingly, by testing it on a wide range of distros. 🙂 For my use case, there are only 2 major structs I need to walk, nf_conn and net. These are fairly stable, but there have been changes over the range of kernel versions I'm trying to support, mostly when features were added. Most distributions have all flags on those structs turned on, but yes, there's never a guarantee.

The probe itself will always just work, there's no illegal memory access, but if the offsets don't match, you'll just read garbage. Luckily, every field in the perf event can be compared against known (or predictable) values, so it can be reliably tested. I'm starting to realize this might only be the case in a very, very small percentage of use cases, though.

The big trade-off with this approach is, of course, the lack of forward-compatibility. I can't predict future changes to these structs, but fortunately, the major distros tend to pick a kernel and maintain it for 5 years without major changes. These structs tend to change around minor kernel releases, so adding an ELF binary every couple of months isn't that big of a deal (yet).
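To make the catalog idea concrete, here is a rough sketch of picking a pre-built binary by kernel version. The selection rule (newest build that is not newer than the running kernel) is an assumption rather than a description of my actual package, and `kernelVersion`, `parseRelease` and `pickBuild` are hypothetical names.

```go
package main

import "fmt"

// kernelVersion holds the major.minor part of a kernel release.
type kernelVersion struct{ major, minor int }

// less reports whether v is older than o.
func (v kernelVersion) less(o kernelVersion) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	return v.minor < o.minor
}

// parseRelease extracts major.minor from a release string such as
// "4.15.0-43-generic"; anything after the minor number is ignored.
func parseRelease(release string) (kernelVersion, error) {
	var v kernelVersion
	if _, err := fmt.Sscanf(release, "%d.%d", &v.major, &v.minor); err != nil {
		return v, fmt.Errorf("parse kernel release %q: %w", release, err)
	}
	return v, nil
}

// pickBuild returns the newest catalog entry that is not newer than the
// running kernel, or false if every entry targets a newer kernel.
func pickBuild(catalog []kernelVersion, running kernelVersion) (kernelVersion, bool) {
	var best kernelVersion
	found := false
	for _, c := range catalog {
		if running.less(c) {
			continue // built for a newer kernel than the one running
		}
		if !found || best.less(c) {
			best, found = c, true
		}
	}
	return best, found
}

func main() {
	catalog := []kernelVersion{{4, 9}, {4, 14}, {4, 19}}
	running, err := parseRelease("4.15.0-43-generic")
	if err != nil {
		panic(err)
	}
	build, ok := pickBuild(catalog, running)
	fmt.Println(build, ok)
}
```

A kernel newer than every catalog entry still gets the latest build under this rule, which is exactly where the forward-compatibility risk sits.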

It's a start, it works for now. 🙂 Thanks for the BTF reference, looks really promising, and can't wait for this to evolve.

> I think tracepoints have a stable ABI, so maybe that is easier (provided kprobe and tracepoints use the same mechanisms, not sure). From my POV it would be enough to just make sure the kprobe load / unload syscalls are successful, no need to send a perf event.

Yup, kprobes effectively become tracepoints when attached, with just a couple of extra steps to hook them up and take a reference to them. If testing the syscalls is sufficient, then we can simply bring back the (now-removed) code with some minor changes in a separate package and test it with an empty kprobe. I'll try to get to that in the coming weeks.
