Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tracking] Params inconsistencies in our drivers #515

Open
18 of 59 tasks
Andreagit97 opened this issue Jul 30, 2022 · 27 comments
Open
18 of 59 tasks

[Tracking] Params inconsistencies in our drivers #515

Andreagit97 opened this issue Jul 30, 2022 · 27 comments
Labels
kind/documentation Improvements or additions to documentation
Milestone

Comments

@Andreagit97
Copy link
Member

Andreagit97 commented Jul 30, 2022

PLEASE NOTE

This issue is mainly for tracking purposes, some points cannot be addressed until we solve the scap-file compatibility issue -> #1381 (comment)

Generic context

The aim of this issue is to track all the inconsistencies when we send event params from our drivers (modern_bpf, bpf, kernel module) to userspace. Some widespread issues that need a dedicated conversion

  • Today when we send file descriptors fd to userspace, we send them as int64_t while they are represented on int32_t. This leads us to waste a lot of space in our ring buffers... fd params are very common in our event, we waste 4 bytes every time we send a param of this type. Considering a small/medium-size system, we can imagine that it could send also 1 million of fd params per second, this would mean wasting almost 4 MB of space in our ring buffers per second!
  • Today when we send process identifiers pid to userspace, we send them as int64_t while they are represented on int32_t. This leads us to waste a lot of space in our ring buffers as explained in the previous fd case.
  • In some syscalls we take the syscalls flags as an int value and we push it to userspace as uint32_t without converting it with our internal PPM representation.
  • This is similar to the previous point, but even if we send flags/modes with the same type (so take uint32_t and send uint32_t) in some cases we don't convert these values into the scap PPM format, so we cannot use this flags/modes userspace-side even if we catch them driver side.
  • In some events we send empty params because they are still not implemented.
  • Every syscall must have its PPM_CODE and its event pair.
  • Different drivers manage max boundaries in different ways we need to uniform them in some way 👇
    update(modern_bpf): reduce the execve instrumentation time with new APIs #648 (comment)

Syscall-specific issues

LEGENDA

  • [NOT ADDRESSABLE] -> means that the issue is not addressable at the moment, at least until we don't solve the scap file issue, see PLEASE NOTE ☝️
  • [MODERN_BPF] -> means that the issue is only related to the modern_bpf probe
  • [BPF] -> means that the issue is only related to the bpf probe
  • [KMOD] -> means that the issue is only related to the kernel module
  • ⚠️ -> possible problems, fix it!
  • ⬅️ -> only in the enter event
  • ➡️ -> only in the exit event

open_by_handle_at ➡️

  • [MODERN BPF] we can get at maximum 8 path_components we need to find a workaround to manage more components!
  • open_flags_to_scap() method should receive an int value and not a uint32_t.

dup3 ➡️

  • dup3_flags_to_scap() method should receive an int value and not a uint32_t.

open

  • open_flags_to_scap() method should receive an int value and not a uint32_t.

openat

  • open_flags_to_scap() method should receive an int value and not a uint32_t.

openat2

  • open_flags_to_scap() method should receive an int value and not a uint32_t.

eventfd ⬅️

eventfd2 ⬅️

  • param 2 (flags) is not implemented, we push 0 to userspace

inotify_init ⬅️

  • [NOT ADDRESSABLE] inotify_init has no syscall arguments but we send one param

signalfd ⬅️

  • [NOT ADDRESSABLE] param 2 (mask) is not implemented, we push 0 to userspace. We should remove it
  • [NOT ADDRESSABLE] param 3 (flags)is not implemented, we push 0 to userspace. We should remove it. Moreover, this syscall has not a flag argument, please see here for more details https://elixir.bootlin.com/linux/v6.5.5/source/fs/signalfd.c#L314

signalfd4 ⬅️

  • [NOT ADDRESSABLE] param 2 (mask) is not implemented, we push 0 to userspace. We should remove it.

timerfd_create ⬅️

  • param 1 (clockid) is not implemented, we push 0 to userspace. We should implement it.
  • param 2 (flags) is not implemented, we push 0 to userspace. We should implement it.

userfault_fd ➡️

  • param 2 (flags) miss an helper like userfaultfd_flags_to_scap to convert flags to scap notation.

ptrace ➡️

  • [NOT ADDRESSABLE] param 2 (addr) not sure we really need a PT_DYN param, we always send the same len.
  • [NOT ADDRESSABLE] param 3 (data) not sure about the utlity of sending the data_pointer to userspace.

mkdirat ➡️

  • param 4 (mode) we need to convert the mode to the scap format.

pipe2

  • we need a new event for pipe2 otherwise we cannot catch the flags. Right now we use the same event of pipe.

renameat2 ➡️

  • param 6 (flags) we need to convert the flags to the scap format with an helper like renameat2_flags_to_scap.

execve ➡️

execveat ➡️

fork ➡️

  • [NOT ADDRESSABLE] param 7 (cwd) is not implemented, we push 0 to userspace
    res = bpf_val_to_ring_type(data, (unsigned long)&empty, PT_CHARBUF);

clone ➡️

  • [NOT ADDRESSABLE] param 7 (cwd) is not implemented, we push 0 to userspace
    res = bpf_val_to_ring_type(data, (unsigned long)&empty, PT_CHARBUF);

clone3 ➡️

  • [NOT ADDRESSABLE] param 7 (cwd) is not implemented, we push 0 to userspace
    res = bpf_val_to_ring_type(data, (unsigned long)&empty, PT_CHARBUF);

vfork ➡️

  • [NOT ADDRESSABLE] param 7 (cwd) is not implemented, we push 0 to userspace
    res = bpf_val_to_ring_type(data, (unsigned long)&empty, PT_CHARBUF);

socket ⬅️

  • [NOT ADDRESSABLE] param 1 (domain) the socket_family_to_scap method should receive an int, not a u8, and we need to choose if the param should be on 8 bits or 32 bits. We need also to update the socket_family_to_scap with new socket families.

connect ➡️

  • [NOT ADDRESSABLE] param 2 (tuple) in case of UNIX sockets, not sure about the utility of sending kernel pointers to userspace

socketpair ⬅️

Same issues of socket syscall

  • [NOT ADDRESSABLE] param 1 (domain) the socket_family_to_scap method should receive an int, not a u8, and we need to choose if the param should be on 8 bits or 32 bits. We need also to update the socket_family_to_scap with new socket families.

socketpair ➡️

  • [NOT ADDRESSABLE] param 4 (source) not sure about the utility of sending kernel pointers to userspace
  • [NOT ADDRESSABLE] param 5 (peer) not sure about the utility of sending kernel pointers to userspace

accept ➡️

accept4 ⬅️

  • [NOT ADDRESSABLE] param 1 (flags) still to implement, today we send always 0. This bug is used in the socketcall wokraround

listen ⬅️

bpf ⬅️

  • [NOT ADDRESSABLE] param 1 (cmd) is an int not a int64_t

flock ⬅️

  • param 2 (operation) we need to read it as an int and then convert it to uint32_t, while today we read it as an unsigned long

quotactl ⬅️

  • param 1 (cmd) we need to read it as an int and then convert it to uint32_t, while today we read it as an unsigned long
  • param 3 (id) is an int not a int32_t

quotactl ➡️

  • param 13 (dqi_flags) add conversion to scap format

unshare ⬅️

  • param 1 (flags) we need to read it as an int and then convert it to uint32_t, while today we read it as an unsigned long

mount ⬅️

  • param 1 (flags) if we want to use this info in userspace we need to convert it into scap format.

umount2 ⬅️

linkat ➡️

  • param 6 (flags) we need to read it as an int and then convert it to uint32_t, while today we read it as an unsigned long

unlinkat ➡️

  • param 4 (flags) we need to read it as an int and then convert it to uint32_t, while today we read it as an unsigned long

setns ⬅️

  • param 2 (nstype) we need to read it as an int and then convert it to uint32_t, while today we read it as an unsigned long

setrlimit ⬅️

  • param 1 (resource) we need to read it as an int and then convert it to uint8_t, while today we read it as an unsigned long

prlimit64 ⬅️

  • param 2 (resource) we need to read it as an int and then convert it to uint8_t, while today we read it as an unsigned long

sendto ⬅️

  • [NOT ADDRESSABLE] param 3 (tuple) should be catched in the exit event when we know the outcome of the syscall otherwise there is the risk to catch something wrong.

sendmsg ⬅️

  • [NOT ADDRESSABLE] param 3 (tuple) should be catched in the exit event when we know the outcome of the syscall otherwise there is the risk to catch something wrong.

ppoll⬅️

  • [NOT ADDRESSABLE] param 3 (sigmask) we send only the first 32 bits

ppoll⬅️

  • [NOT ADDRESSABLE] param 3 (sigmask) we send only the first 32 bits

ppoll⬅️

  • [NOT ADDRESSABLE] param 3 (sigmask) we send only the first 32 bits

recvmmsg:

[NOT ADDRESSABLE] Empty instrumentation

sendmmsg:

[NOT ADDRESSABLE] Empty instrumentation

@Andreagit97 Andreagit97 added the kind/documentation Improvements or additions to documentation label Jul 30, 2022
@FedeDP
Copy link
Contributor

FedeDP commented Aug 1, 2022

Hi Andrea, like always, good catch! :D

I agree with you; i'd phrase it in a different way actually:
"if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events"
This hits even harder :)

To retain backward compatibility, i think we could:

  • add a new PT_FD32 and PT_PID32
  • new drivers will send PT_PID32 and PT_FD32
  • we are still compatible with old 64-bit types, while reading from scap files
  • of course, bump maj schema version :/

This fixes the first 2 points; the others are different cases instead.

@Andreagit97
Copy link
Member Author

Hi Andrea, like always, good catch! :D

I agree with you; i'd phrase it in a different way actually: "if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events" This hits even harder :)

🤯 🤯

To retain backward compatibility, i think we could:

* add a new `PT_FD32` and `PT_PID32`

* new drivers will send PT_PID32 and PT_FD32

* we are still compatible with old 64-bit types, while reading from scap files

* of course, bump maj schema version :/

This fixes the first 2 points; the last is a different case instead.

Completely agree with this solution!

@incertum
Copy link
Contributor

incertum commented Aug 1, 2022

Love it, big +1, great catch!

@FedeDP
Copy link
Contributor

FedeDP commented Aug 2, 2022

First 2 points will be addressed by #526

@Andreagit97
Copy link
Member Author

I have slightly changed the issue format with Generic event issues and Specific event issues in this way it should be more maintainable, thank you to @hbrueckner @Molter73 @FedeDP for all the help in finding new issues

@jasondellaluce
Copy link
Contributor

Tracking down all of this is of incredible value. Thank you a lot!

@hbrueckner
Copy link
Contributor

I have slightly changed the issue format with Generic event issues and Specific event issues in this way it should be more maintainable, thank you to @hbrueckner @Molter73 @FedeDP for all the help in finding new issues

You are welcome! Many thanks @Andreagit97 for the excellent summary and tracker of all the review follow-ups!

@gnosek
Copy link
Contributor

gnosek commented Dec 2, 2022

"if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events"
This hits even harder :)

Optimism is awesome but let me cool it down a little bit :P Every event has this header, which is somewhat larger than the zero bytes it would need for that claim to be true ;)

struct ppm_evt_hdr {
#ifdef PPM_ENABLE_SENTINEL
    uint32_t sentinel_begin;
#endif
    uint64_t ts; /* timestamp, in nanoseconds from epoch */
    uint64_t tid; /* the tid of the thread that generated this event */
    uint32_t len; /* the event len, including the header */
    uint16_t type; /* the event type */
    uint32_t nparams; /* the number of parameters of the event */
};

BTW, we could probably easily change nparams to 16 bits. If we do expect >64k parameters, we can hack something for the (hopefully rare) events that exceed this number.

We could also trim the tid to 32 bits, but then we use this struct all over userspace too (#sadpanda), and other environments may want large tids (e.g. gvisor), so we would have to decouple these two structs and copy data field by field between them.

Don't let me distract you from tracking down the inconsistencies though, that's an awesome job!

These two changes would cut down 6 bytes from every event, equivalent to one and a half fds with no schema changes, just a major api version bump.

@FedeDP
Copy link
Contributor

FedeDP commented Dec 2, 2022

You are right, of course! Nonetheless, we are wasting lots of bytes that, all together, would:

  • speed up pushing events (because we need to write, and later read, a little bit less on the ring buffer)
  • clear up space for new events

I was a bit over-reacting though, agree ahah

BTW, we could probably easily change nparams to 16 bits. If we do expect >64k parameters, we can hack something for the (hopefully rare) events that exceed this number.
We could also trim the tid to 32 bits, but then we use this struct all over userspace too (#sadpanda), and other environments may want large tids (e.g. gvisor), so we would have to decouple these two structs and copy data field by field between them.

Yep!

Don't let me distract you from tracking down the inconsistencies though, that's an awesome job!

You gave thorough ideas! And data :D

I think @Andreagit97 is working on porting current modern-bpf tests to work with kmod and old bpf; this should help us spot many more inconsistencies!

@poiana
Copy link
Contributor

poiana commented May 2, 2023

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@poiana
Copy link
Contributor

poiana commented Jun 1, 2023

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@Andreagit97
Copy link
Member Author

/remove-lifecycle rotten

@Andreagit97 Andreagit97 modified the milestone: 0.12.0 Jun 7, 2023
oheifetz added a commit to oheifetz/libs that referenced this issue Aug 18, 2023
- change the flags (param 1) from u32 to s32
- add a userspace to scap flag conversion helper routine

Reported by: github issue falcosecurity#515

Signed-off-by: Ofer Heifetz <[email protected]>
poiana pushed a commit that referenced this issue Aug 18, 2023
bump schema version minor
Reported by: github issue #515

Signed-off-by: Ofer Heifetz <[email protected]>
poiana pushed a commit that referenced this issue Aug 18, 2023
- change the flags (param 1) from u32 to s32
- add a userspace to scap flag conversion helper routine

Reported by: github issue #515

Signed-off-by: Ofer Heifetz <[email protected]>
@Andreagit97 Andreagit97 modified the milestones: driver-backlog, TBD Sep 4, 2023
@Andreagit97
Copy link
Member Author

I've added the [NOT ADDRESSABLE] marker to all issues that we cannot address now due to the scap-file management #1381 (comment)

@LucaGuerra
Copy link
Contributor

Thank you Andrea, I think this makes it much clearer 🙏

@ecbadeaux
Copy link
Contributor

@Andreagit97 Please mark dup3 as completed with strikethrough.

@ecbadeaux
Copy link
Contributor

@incertum @Andreagit97 Can you guys also mark setns, flock, and unshare as complete?

@poiana
Copy link
Contributor

poiana commented Mar 17, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@FedeDP
Copy link
Contributor

FedeDP commented Mar 18, 2024

/remove-lifecycle stale

@poiana
Copy link
Contributor

poiana commented Jun 16, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Copy link
Member Author

/remove-lifecycle stale

@poiana
Copy link
Contributor

poiana commented Sep 15, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@FedeDP
Copy link
Contributor

FedeDP commented Sep 16, 2024

/remove-lifecycle stale

@poiana
Copy link
Contributor

poiana commented Dec 15, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@FedeDP
Copy link
Contributor

FedeDP commented Dec 16, 2024

/remove-lifecycle stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

9 participants