Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update(modern_bpf): reduce the execve instrumentation time with new APIs #648

Merged
merged 13 commits into from
Oct 25, 2022

Conversation

Andreagit97
Copy link
Member

@Andreagit97 Andreagit97 commented Oct 2, 2022

What type of PR is this?

/kind cleanup

/kind feature

Any specific area of the project related to this PR?

/area driver-modern-bpf

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

Playing with https://github.com/Andreagit97/BPF-perftool we noticed that the execveat instrumentation time was strangely high compared to the old probe one:

old probe     new probe

12140 ns      16347 ns

This overhead seems related to the use of extract__charbuf_pointer_from_array inline function, for some reason clang increases the complexity of the programs causing overhead in the prog execution. With the new APIs auxmap__store_execve_exe and auxmap__store_execve_args I've recorded these new data:

old probe     new probe

12140 ns      10518 ns

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

FedeDP
FedeDP previously approved these changes Oct 3, 2022
Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!
/approve

@poiana
Copy link
Contributor

poiana commented Oct 3, 2022

LGTM label has been added.

Git tree hash: 6210ba75c3a3c29c0dbbf33c60a4e419f49a50ee

@hbrueckner
Copy link
Contributor

@Andreagit97 , very nice tooling for measurements. I will have a look at it. I left a comment on #633 regarding the args/env max discussion. So far, you change here looks nice.

@Andreagit97 Andreagit97 force-pushed the execve_instrumentation_time branch from 3bcbd53 to c9388df Compare October 15, 2022 13:58
@poiana poiana removed the lgtm label Oct 15, 2022
@poiana poiana requested a review from FedeDP October 15, 2022 13:58
@poiana poiana added size/XL and removed size/L labels Oct 15, 2022
@Andreagit97
Copy link
Member Author

Andreagit97 commented Oct 15, 2022

Thanks to @incertum fixes I've added also 2 tests for execve and execveat success cases to avoid future issues like the one reported here #633

Copy link
Contributor

@incertum incertum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Andreagit97 LGTM

Copy link
Contributor

@hbrueckner hbrueckner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Andreagit97 , I have spot some places for reviewing the right constant. I will give this PR a try later this week. Thanks.

@Andreagit97
Copy link
Member Author

hey @hbrueckner the problem is even worst, the kmod is not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same 🤔

@hbrueckner
Copy link
Contributor

Hi @Andreagit97

hey @hbrueckner the problem is even worst, the kmod is not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same thinking

This indeed looks very strange. Also had a look add kernel driver and kernel sources, and agree they basically use the same tracepoint_probe_register() functions. Regarding the modern bpf and related EFAULT, one option would be to try a sleepable BPF program for clone family and use bpf_copy_from_user() which might sleep (not sure if that will work or perform at all). However, I did not catch the right place to add the BPF_F_SLEEPABLE program attribute.

@hbrueckner
Copy link
Contributor

Hi @Andreagit97,

thanks for the update and, yeah, also agree on sticking with sched_exec_fork for now until we have more clarity on the strange behavior. Perhaps, worth to open a separate issue to track it? wdyt?

@Andreagit97
Copy link
Member Author

Andreagit97 commented Oct 25, 2022

Hi @Andreagit97

hey @hbrueckner the problem is even worst, the kmod is not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same thinking

This indeed looks very strange. Also had a look add kernel driver and kernel sources, and agree they basically use the same tracepoint_probe_register() functions. Regarding the modern bpf and related EFAULT, one option would be to try a sleepable BPF program for clone family and use bpf_copy_from_user() which might sleep (not sure if that will work or perform at all). However, I did not catch the right place to add the BPF_F_SLEEPABLE program attribute.

I've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner

Sorry for the PR dimension this is quite huge but there are a lot of tests inside... I still need to address this comment #648 (comment)

Hi @Andreagit97,

thanks for the update and, yeah, also agree on sticking with sched_exec_fork for now until we have more clarity on the strange behavior. Perhaps, worth to open a separate issue to track it? wdyt?

Agree on opening the issue, the best solution would be to find another way to take the args from the kernel but i would address it in another PR probably as you suggested :)

@FedeDP
Copy link
Contributor

FedeDP commented Oct 25, 2022

've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner

Fully agree with the solution; simple and effective!

@Andreagit97
Copy link
Member Author

the good news is that this PR should be the last one to have a full working implementation of the modern bpf probe 🥳

@hbrueckner
Copy link
Contributor

've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner

Fully agree with the solution; simple and effective!

Agreed too! Thanks a lot @Andreagit97 !

@hbrueckner
Copy link
Contributor

Hi @Andreagit97

Sorry for the PR dimension this is quite huge but there are a lot of tests inside... I still need to address this comment #648 (comment)

No problem ... I will need some time for a deeper review on the recent changes ... and, awesome that we now have a full working modern bpf probe!

@@ -39,6 +39,9 @@
/* Proc name */
#define MAX_PROC_EXE 4096

/* Task comm */
#define MAX_TASK_COMM 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Andreagit97 ,
for reference, here is the link to the kernel source as background: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/sched.h#n286

Reading it a bit more carefully, the len has been redefined as enum to make it available to BPF programs with commit:

commit 3087c61ed2c48548b74dd343a5209b87082c682d
Author: Yafang Shao <[email protected]>
Date:   Wed Jan 19 18:08:40 2022 -0800

    tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN
    
    As the sched:sched_switch tracepoint args are derived from the kernel,
    we'd better make it same with the kernel.  So the macro TASK_COMM_LEN is
    converted to type enum, then all the BPF programs can get it through
    BTF.
    
    The BPF program which wants to use TASK_COMM_LEN should include the
    header vmlinux.h.  Regarding the test_stacktrace_map and
    test_tracepoint, as the type defined in linux/bpf.h are also defined in
    vmlinux.h, so we don't need to include linux/bpf.h again.

For s390x, it is already included in the generated driver/modern_bpf/definitions/vmlinux.h definitions. So question here is whether we keep it that way or update the definitions and use directly the enum? Wdyt?

P.S. the remaining changes looks good to me!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think we can add it manually to x86 and aarch64 without updating all the vmlinux files and use this already defined enum, btw great catch this saves us an additional macro :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

@Andreagit97 Andreagit97 force-pushed the execve_instrumentation_time branch from 542524d to ecd21b0 Compare October 25, 2022 14:37
@Andreagit97
Copy link
Member Author

Tested on all three architectures (x86, aarch64, s390x) and it should work!

@hbrueckner
Copy link
Contributor

Hi @Andreagit97 ,

Tested on all three architectures (x86, aarch64, s390x) and it should work!

Confirmed for s390x:

[----------] Global test environment tear-down
[==========] 205 tests from 3 test suites ran. (131 ms total)
[  PASSED  ] 205 tests.

Further, I went through your commits and they are fine!

@Andreagit97 Incredible and awesome work as usual!

/lgtm

@poiana
Copy link
Contributor

poiana commented Oct 25, 2022

LGTM label has been added.

Git tree hash: 37fc8e72aaa5b99bbcef12263541fdaf6a073d6b

Copy link
Member

@leogr leogr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana
Copy link
Contributor

poiana commented Oct 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, hbrueckner, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,FedeDP,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit fbc44ec into master Oct 25, 2022
@poiana poiana deleted the execve_instrumentation_time branch October 25, 2022 15:42
@Andreagit97
Copy link
Member Author

This PR completes #513

@FedeDP
Copy link
Contributor

FedeDP commented Dec 2, 2022

/milestone 4.0.0+driver

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants