Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new(driver): ia32 emulation bpf drivers support #1196

Merged
merged 39 commits into from
Oct 11, 2023
Merged

Conversation

FedeDP
Copy link
Contributor

@FedeDP FedeDP commented Jul 12, 2023

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area driver-kmod
/area driver-bpf
/area driver-modern-bpf
/area libscap-engine-bpf
/area libscap-engine-modern-bpf

Does this PR require a change in the driver versions?

/version driver-API-version-major

What this PR does / why we need it:

This PR introduces support for x86_64 ia32 emulation for bpf drivers.
Note: we only really supported compat executables for x86_64; this is now more explicit in the code.
This PR aims at feature parity for bpf drivers with kmod.

Future work, that won't be addressed by this PR:

  • Add compat support for other supported architectures (? is this needed at all?)
  • Find a way to run drivers test as compat executables -> see new(driver): ia32 emulation bpf drivers support #1196 (comment)
  • Add a new execve flag and a filtercheck to let userspace know that a process is running as compat (this should be fairly simple since when a 64 bit process, ie: bash, spawns a new 32bit process, the execve exit event is sent as 64bit, of course, but the task already has the compat flag; in that situation we should make sure to push a new execve flag that will be managed and exposed as a filtercheck by userspace)
  • push a syscall32 generic event when the 32bit syscall has no 64bit mapping (the -1 values in the mapping array). Or add a new is32bit bool parameter to generic syscall event.

Still TODO:

Which issue(s) this PR fixes:

Fixes #279

Special notes for your reviewer:

Most of the line changes are from the new table (driver/syscall_ia32_64_map.c) that is ~500 locs.
The PR is not really big, main idea is to convert the 32bit syscalls to 64bit ones, so that we can normally manage them without the need of a 32bit syscall_table (as we used to do in our kmod).
As a side effect, we now support filtering interesting syscalls for 32bit syscalls too, with no changes (given that they are translated to 64bit syscalls and our current interesting syscalls impl uses 64bit syscalls!)

To test (both normal syscall and socketcall), please use this small C script:

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/net.h>        /* Definition of SYS_* constants */

int main() {
    syscall(__NR_close, -1);
#ifdef __NR_socketcall
    printf("Trying a socketcall!\n");
    unsigned long args[3] = {0};
    syscall(__NR_socketcall, SYS_SOCKET, args);
#endif
    return 0;
}

Built with gcc ia32.c -o ia32 -m32.
Then run sinsp-example like:

sudo ./libsinsp/examples/sinsp-example -k -f "evt.type in (close,socket) and proc.name contains ia32" -o "%evt.args"

Example output for
kmod:

-- Filter AST (2) ppm sc names: close, socket
-- Try to open: 'kmod' engine.
-- Engine 'kmod' correctly opened.
-- Start capture
{"evt.args":"fd=3(<f>/etc/ld.so.cache) "}
{"evt.args":"res=0 "}
{"evt.args":"fd=3(<f>/usr/lib32/libc.so.6) "}
{"evt.args":"res=0 "}
{"evt.args":"fd=-1(EPERM) "}
{"evt.args":"res=-9(EBADF) "}
{"evt.args":"domain=0(AF_UNSPEC) type=0 proto=0 "}
{"evt.args":"fd=-97(EAFNOSUPPORT) "}

modern bpf:

-- Filter AST (2) ppm sc names: close, socket
-- Try to open: 'modern_bpf' engine.
-- Engine 'modern_bpf' correctly opened.
-- Start capture
{"evt.args":"fd=3(<f>/etc/ld.so.cache) "}
{"evt.args":"res=0 "}
{"evt.args":"fd=3(<f>/usr/lib32/libc.so.6) "}
{"evt.args":"res=0 "}
{"evt.args":"fd=-1(EPERM) "}
{"evt.args":"res=-9(EBADF) "}

bpf:

-- Filter AST (2) ppm sc names: close, socket
-- Try to open: 'bpf' engine.
driver/bpf/probe.o
-- Engine 'bpf' correctly opened.
-- Start capture
{"evt.args":"fd=3(<f>/etc/ld.so.cache) "}
{"evt.args":"res=0 "}
{"evt.args":"fd=3(<f>/usr/lib32/libc.so.6) "}
{"evt.args":"res=0 "}
{"evt.args":"fd=-1(EPERM) "}
{"evt.args":"res=-9(EBADF) "}

As you can see, only kmod receives the ia32 socketcall. As i said previously, socketcall is very hard because it is not present on x86_64, but it is present on x86_32; therefore:

  • we are not able to map the 32bit to 64bit through our table and we need to manage it manually (see kmod main.c enter and exit probes as an example)
  • Currently CAPTURE_SOCKETCALL (feature_gates.h) is only enabled for s390x, and it is used by bpf and modern bpf. Enabling it for x86_64 too breaks the build, because x86_64 has no __NR_socketcall syscall

(SOCKETCALL SUPPORT WAS INDEED IMPLEMENTED IN ALL 3 DRIVERS for x86. See below comments:

Does this PR introduce a user-facing change?:

new(driver): implement support for x86 ia32 emulation for bpf drivers

@github-actions
Copy link

github-actions bot commented Jul 12, 2023

Please double check driver/SCHEMA_VERSION file. See versioning.

/hold

@@ -1339,50 +1339,4 @@ static __always_inline int bpf_val_to_ring_type(struct filler_data *data,
return __bpf_val_to_ring(data, val, 0, type, -1, false, param_type_to_mem(type));
}

static __always_inline bool bpf_in_ia32_syscall()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moved to plumbing_helpers

@@ -7,6 +7,10 @@

#include <helpers/interfaces/syscalls_dispatcher.h>
#include <helpers/interfaces/attached_programs.h>
#include <bpf/bpf_helpers.h>

#define X86_64_NR_EXECVE 59
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the __NR_* macros here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming building for 64bit, then I think you can use them.


#include "ppm_events_public.h"

const int g_ia32_64_map[SYSCALL_TABLE_SIZE] = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file contains the mapping between a 32bit syscall number (array index) and its 64bit value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is autogenerated by syscalls-bumper; i can add a comment explaining what the map does ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as discussed privately with @Andreagit97 , in the future we might push a generic syscall32 event when the 32bit syscalls have no 64bit mapping (the -1 values in the array).
Or we could push a syscall generic event adding a new bool is32bit parameter.
I'll add this to the PR body future work.

@FedeDP
Copy link
Contributor Author

FedeDP commented Jul 12, 2023

Fixed old bpf probe socketcall support:

-- Filter AST (3) ppm sc names: close, open, socket
-- Try to open: 'bpf' engine.
driver/bpf/probe.o
-- Engine 'bpf' correctly opened.
-- Start capture
{"evt.args":"fd=3(<f>/etc/ld.so.cache) ","evt.type":"close"}
{"evt.args":"res=0 ","evt.type":"close"}
{"evt.args":"fd=3(<f>/usr/lib32/libc.so.6) ","evt.type":"close"}
{"evt.args":"res=0 ","evt.type":"close"}
{"evt.args":"fd=-1(EPERM) ","evt.type":"close"}
{"evt.args":"res=-9(EBADF) ","evt.type":"close"}
{"evt.args":"name=<NA> flags=1(O_RDONLY) mode=0664 ","evt.type":"open"}
{"evt.args":"fd=3(<f>/tmp/foo) name=/tmp/foo flags=1(O_RDONLY) mode=0 dev=21 ino=11328 ","evt.type":"open"}
{"evt.args":"domain=0(AF_UNSPEC) type=0 proto=0 ","evt.type":"socket"}
{"evt.args":"fd=-97(EAFNOSUPPORT) ","evt.type":"socket"}

(I added an open syscall too here).
It is not the best looking code possible, but it works.
We might want to revisit it later though.

@@ -1 +1 @@
4.0.2
5.0.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added a new map for bpf drivers. This is a maj break.

driver/bpf/plumbing_helpers.h Outdated Show resolved Hide resolved
args_pointer = bpf_syscall_get_argument_from_ctx(ctx, 1);
bpf_probe_read_user(&arg, sizeof(unsigned long), (void*)(args_pointer + (idx * sizeof(unsigned long))));
bpf_probe_read_user(&arg, size, (void*)(args_pointer + (idx * size)));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjust read size if we are on a 32bit syscall.

#ifdef CAPTURE_SOCKETCALL
if(bpf_syscall_get_nr(data->ctx) == __NR_socketcall)
#if defined(CAPTURE_SOCKETCALL) || (defined(CONFIG_X86_64) && defined(CONFIG_IA32_EMULATION))
if(bpf_syscall_get_nr(data->ctx) == data->state->tail_ctx.socketcall_syscall_id)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be __NR_socketcall or __NR_ia32_socketcall.

@@ -634,6 +725,8 @@ static __always_inline void call_filler(void *ctx,

++state->n_evts;

state->tail_ctx.socketcall_syscall_id = socketcall_syscall_id;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Store this in our tail_ctx so that it is made available to bpf_syscall_get_argument.

driver/bpf/plumbing_helpers.h Outdated Show resolved Hide resolved
@FedeDP
Copy link
Contributor Author

FedeDP commented Jul 12, 2023

Implemented x86 ia32 support for socketcall in modern bpf too:

{"evt.args":"fd=3(<f>/etc/ld.so.cache) ","evt.type":"close"}
{"evt.args":"res=0 ","evt.type":"close"}
{"evt.args":"fd=3(<f>/usr/lib32/libc.so.6) ","evt.type":"close"}
{"evt.args":"res=0 ","evt.type":"close"}
{"evt.args":"fd=-1(EPERM) ","evt.type":"close"}
{"evt.args":"res=-9(EBADF) ","evt.type":"close"}
{"evt.args":"name=<NA> flags=1(O_RDONLY) mode=0 ","evt.type":"open"}
{"evt.args":"fd=3(<f>/tmp/foo) name=/tmp/foo flags=1(O_RDONLY) mode=0 dev=21 ino=11328 ","evt.type":"open"}
{"evt.args":"domain=0(AF_UNSPEC) type=0 proto=0 ","evt.type":"socket"}
{"evt.args":"fd=-97(EAFNOSUPPORT) ","evt.type":"socket"}

@FedeDP
Copy link
Contributor Author

FedeDP commented Jul 12, 2023

In file included from /home/runner/work/libs/libs/driver/modern_bpf/helpers/extract/extract_from_kernel.h:14:
/usr/include/syscall.h:1:10: fatal error: 'sys/syscall.h' file not found
#include <sys/syscall.h>
         ^~~~~~~~~~~~~~~
1 error generated.

Ok. 😆

@FedeDP FedeDP force-pushed the new/ia32_bpf_support branch from 14a5b12 to 5daacdc Compare July 13, 2023 06:23
@FedeDP FedeDP force-pushed the new/ia32_bpf_support branch from 847b05b to 31a142b Compare October 6, 2023 14:28
@incertum
Copy link
Contributor

incertum commented Oct 6, 2023

[1]

re

R3 pointer arithmetic on CONST_PTR_TO_MAP prohibited

-- END PROG LOAD LOG --
libscap: bpf_load_program() event=tracepoint/raw_syscalls/sys_enter: Operation not permitted (1)

hmmm yes a bit mysterious https://docs.kernel.org/bpf/verifier.html#register-value-tracking
can we check on convert_ia32_to_64?

[2]

re below, do we still somehow index an array or so without checking bounds first?

354: (bf) r8 = r2
355: (85) call bpf_probe_read_user#112
R2 unbounded memory access, use 'var &= const' or 'if (var < const)'
processed 7819 insns (limit 1000000) max_states_per_insn 6 total_states 77 peak_states 60 mark_read 28

-- END PROG LOAD LOG --
libscap: bpf_load_program() event=raw_tracepoint/filler/proc_startupdate: Operation not permitted (1)

[STATUS] FAILED /libs/test/vm/build/driver/clang-12/5.10.9-1.el7.elrepo.x86_64

[3]

re

open_by_handle_at_x_extra_tail_1 having to many instructions

Can we split?

@FedeDP
Copy link
Contributor Author

FedeDP commented Oct 10, 2023

can we check on convert_ia32_to_64?

Yep, we have already been doing that; no luck for now :(

[STATUS] FAILED /libs/test/vm/build/driver/clang-12/5.10.9-1.el7.elrepo.x86_64

No idea :(

Can we split?

Yep, i'd do that in a follow up PR though!
Also, for 2,3 can you open a tracking issue?

@Andreagit97 is going to push the fix for the modern bpf verifier issue soon! We are just left with the old bpf verifier issue; IMHO we could even track it in an issue and go on, but i am willing to invest a little bit of time still to try to fix it.

Andreagit97 and others added 4 commits October 10, 2023 12:02
Moving `extract__network_args` before the ringbuf initialization keeps
the `ringbuf` variable into BPF registers.

Signed-off-by: Andrea Terzolo <[email protected]>
…ts are not available.

This limitation was already set in stone on master.

Signed-off-by: Federico Di Pierro <[email protected]>
@FedeDP
Copy link
Contributor Author

FedeDP commented Oct 11, 2023

Me and @Andreagit97 fixed old bpf verifier and modern bpf verifier issues!! 🚀

Final matrix (same as on master):

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-4.19 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2-5.10 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2023-6.1 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.0 🟢 🟢 🟢 🟢 🟢 🟢
centos-3.10 🟢 🟢 🟢 🟡 🟡 🟡
centos-4.18 🟢 🟢 🟢 🟢
centos-5.14 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.17 🟢 🟢 🟢 🟢
fedora-5.8 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-3.10 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-4.14 🟢 🟢 🟢 🟢 🟢 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-5.4 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-4.15 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-6.3 🟢 🟢 🟢 🟢 🟢 🟢

Since i mentioned before, here is the list of syscalls that are ia32 only (ie; are unmapped on x86_64 and we will discard them):

212 -> chown32
404 -> clock_settime64
89 -> readdir
117 -> ipc
269 -> fstatfs64 
414 -> ppoll_time64
204 -> setregid32
134 -> bdflush
84 -> oldstat
34 -> nice
405 -> clock_adjtime64
421 -> rt_sigtimedwait_time64
413 -> pselect6_time64
109 -> olduname
239 -> sendfile64
195 -> stat64
205 -> getgroups32
193 -> truncate64
419 -> mq_timedreceive_time64
203 -> setreuid32
420 -> semtimedop_time64
119 -> sigreturn
272 -> fadvise64_64
7 -> waitpid
166 -> vm86
68 -> sgetmask
216 -> setfsgid32
22 -> umount
196 -> lstat64
417 -> recvmmsg_time64
18 -> oldstat
410 -> timerfd_gettime64
200 -> getgid32
221 -> fcntl64
142 -> _newselect
411 -> timerfd_settime64
207 -> fchown32
422 -> futex_time64
213 -> setuid32
208 -> setresuid32
202 -> getegid32
197 -> fstat64 
215 -> setfsuid32
403 -> clock_gettime64
423 -> sched_rr_get_interval_time64
206 -> setgroups32
73 -> sigpending
67 -> sigaction
300 -> fstatat64
211 -> getresgid32
209 -> getresuid32
412 -> utimensat_time64
210 -> setresgid32
268 -> statfs64
48 -> signal
126 -> sigprocmask
102 -> socketcall
201 -> geteuid32
407 -> clock_nanosleep_time64
113 -> vm86old
408 -> timer_gettime64
418 -> mq_timedsend_time64
416 -> io_pgetevents_time64
192 -> mmap2 -> NOTE: manually mapped to x86_64 mmap!
72 -> sigsuspend
214 -> setgid32
409 -> timer_settime64
69 -> ssetmask
194 -> ftruncate64
25 -> stime

I updated syscalls-bumper PR to properly add the syscall name for unmapped syscalls in the new table; moreover, since mmap2 is completely compatible with mmap (we indeed use the same filler in the fillers table), but is only available on ia32, syscalls-bumper will resolve mmap2 (ia32) to mmap (x86_64).
This is a first step, perhaps there are more syscalls that might take advantage of such a compatibility layer. Syscalls-bumper code does easily allow adding more translations if needed.

Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! If the kernel testing is green I think we can merge it!

LGTM, I have some commit here so I would avoid approving!

@FedeDP
Copy link
Contributor Author

FedeDP commented Oct 11, 2023

Amazing work! If the kernel testing is green I think we can merge it!

Kernel testing is super 🟢 :D

@Andreagit97
Copy link
Member

The SCHEMA_VERSION failure is a false positive here

@leogr
Copy link
Member

leogr commented Oct 11, 2023

The SCHEMA_VERSION failure is a false positive here

👍

/hold cancel

@poiana
Copy link
Contributor

poiana commented Oct 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, hbrueckner, jasondellaluce, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [FedeDP,jasondellaluce,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

eBPF probes: 32 bit applications support
7 participants