fix(modern): try to address a page fault caused by bpf_probe_read_kernel #1858

Merged
4 commits merged on May 13, 2024

Andreagit97
Member

What type of PR is this?

/kind bug

Any specific area of the project related to this PR?

/area driver-modern-bpf

Does this PR require a change in the driver versions?

What this PR does / why we need it:

We recently saw some issues with the modern_ebpf driver: falcosecurity/falco#3181.
Analyzing the call trace, the issue seems to be related to the accept4_x filler.

[17285553.068504] Call Trace:
[17285553.074274]  <TASK>
[17285553.079218]  ? show_regs.cold.14+0x1a/0x1f
[17285553.084320]  ? __die_body+0x1f/0x70
[17285553.089309]  ? __die+0x2a/0x35
[17285553.094284]  ? _end+0x7b5da0c7/0x0
[17285553.099340]  ? page_fault_oops+0xaf/0x270
[17285553.104379]  ? bpf_probe_read_kernel+0x1d/0x50
[17285553.109575]  ? bpf_ringbuf_submit+0x10/0x20
[17285553.115044]  ? bpf_prog_182d4293644cc965_pf_kernel+0x549/0x558
[17285553.121418]  ? _end+0x7b5da0c7/0x0
[17285553.127468]  ? do_user_addr_fault+0x30b/0x590
[17285553.132943]  ? _end+0x7b5da0c7/0x0
[17285553.138381]  ? exc_page_fault+0x6f/0x160
[17285553.143782]  ? asm_exc_page_fault+0x27/0x30
[17285553.149265]  ? _end+0x7b5da0c7/0x0
[17285553.154742]  ? copy_from_kernel_nofault+0x6d/0x120
[17285553.160220]  bpf_probe_read_kernel+0x1d/0x50
[17285553.166254]  bpf_prog_3a9838b3cf5001f5_accept4_x+0x2e6/0x1589
[17285553.172566]  ? bpf_probe_read_kernel+0x1d/0x50
[17285553.178263]  ? bpf_prog_c5b1b737d5cb01c5_sys_exit+0x28f/0x50c
[17285553.184115]  bpf_trace_run2+0x54/0xd0
[17285553.189977]  __bpf_trace_sys_exit+0x9/0x10
[17285553.195917]  syscall_exit_to_user_mode_prepare+0x171/0x1d0
[17285553.202015]  syscall_exit_to_user_mode+0xd/0x40
[17285553.207926]  do_syscall_64+0x46/0x90
[17285553.214281]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

We can see that the culprit is the bpf_probe_read_kernel helper, but we are in eBPF, so a page fault should never happen.
It turns out there is a bug on x86 machines when a BPF program uses copy_from_kernel_nofault() to read the vsyscall page:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=32019c659ec

Looking at the address that causes the page fault, we can see that it is a vsyscall address:

[17285552.983963] BUG: unable to handle page fault for address: ffffffffff6000c7
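
To make the address check concrete, here is a minimal user-space sketch, not code from this PR. It assumes the fixed x86_64 vsyscall mapping (one 4 KiB page starting at 0xffffffffff600000); the helper name `is_vsyscall_addr` and the two macros are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: on x86_64 the legacy vsyscall page sits at a fixed address,
 * (-10UL << 20) = 0xffffffffff600000, and spans a single 4 KiB page. */
#define VSYSCALL_START 0xffffffffff600000ULL
#define VSYSCALL_SIZE  0x1000ULL

/* Hypothetical helper: returns true when addr falls inside the vsyscall page. */
static inline bool is_vsyscall_addr(uint64_t addr)
{
	return addr >= VSYSCALL_START && (addr - VSYSCALL_START) < VSYSCALL_SIZE;
}
```

With these assumptions, the faulting address from the log (ffffffffff6000c7) sits at offset 0xc7 inside that page, which is consistent with the analysis above.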

The question now is: why are we reading from a vsyscall address? We are inside an accept4, so this doesn't make much sense.
My guess is that, due to some race condition, extract__file_struct_from_fd can return a random address, which we then cast to a socket and read through bpf_probe_read_kernel.

All the code added in this PR is an extra safety check that we already have in the legacy eBPF probe. We initially avoided it in the modern probe as a performance optimization because, without this "vsyscall address" bug, the probe should never cause an unhandled page fault at runtime. Unfortunately, the bug is fixed only in kernel versions >= 6.8, so we need a workaround to handle this situation.

This check does introduce a little overhead, but it should also guarantee more reliable information, since we explicitly check that the file is a socket before performing further bpf_probe_read_kernel calls. In any case, these changes can be seen as an improvement in consistency.
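
As a rough illustration of the idea (a user-space sketch, not the probe code), the check boils down to comparing the file's f_op pointer with the file_operations pointer known to belong to sockets before treating private_data as a socket. All `mock_*` types and the `socket_from_file` helper are hypothetical stand-ins for the kernel structures.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; only the fields used here. */
struct mock_file_operations {
	int unused;
};

struct mock_socket {
	int type;
};

struct mock_file {
	const struct mock_file_operations *f_op;
	void *private_data;
};

/* Returns the socket behind f, or NULL when f is not a socket.
 * A file is treated as a socket only if its f_op matches the
 * file_operations pointer previously recorded for sockets. */
static inline struct mock_socket *socket_from_file(const struct mock_file *f,
                                                   const struct mock_file_operations *socket_file_ops)
{
	if(f == NULL || socket_file_ops == NULL || f->f_op != socket_file_ops)
	{
		return NULL;
	}
	return (struct mock_socket *)f->private_data;
}
```

In the real probe every field access would go through a BPF read helper; the point here is only that a mismatching f_op stops us before any socket field is dereferenced.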

Please note that this is just an assumption: the call trace seems compatible with this analysis, but the page fault could still be there. In that case, we should apply stricter checks in our bpf_probe_read_kernel calls, verifying that the address is not a vsyscall address.

Which issue(s) this PR fixes:

ref falcosecurity/falco#3181

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

github-actions bot commented May 9, 2024

Please double check driver/SCHEMA_VERSION file. See versioning.

/hold

Andreagit97 added this to the 0.17.0 milestone May 9, 2024
@@ -16,31 +16,6 @@
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>

/*=============================== LIBBPF MISSING TRACING DEFINITION ===========================*/
Member Author

Now we use libbpf 1.3, so this macro shouldn't be necessary.

Andreagit97 changed the title from "Fix kernel fault vsyscall" to "fix(modern): try to address a page fault caused by bpf_probe_read_kernel" May 9, 2024
Now we use libbpf 1.3.0 and this definition is already included

Signed-off-by: Andrea Terzolo <[email protected]>
Andreagit97 force-pushed the fix_kernel_fault_vsyscall branch from 55be08d to 81068b8 May 9, 2024 10:30
@Andreagit97
Member Author

just rebased

FedeDP previously approved these changes May 9, 2024
Contributor

FedeDP left a comment

/approve

Contributor

poiana commented May 9, 2024

LGTM label has been added.

Git tree hash: 92d2a6090d2f315190d633a33e9a9b2a7ee754ec

incertum previously approved these changes May 9, 2024
Contributor

incertum left a comment

/approve

Andreagit97 dismissed stale reviews from incertum and FedeDP via 2adc6a0 May 10, 2024 07:11
poiana removed the lgtm label May 10, 2024
poiana requested review from FedeDP and incertum May 10, 2024 07:11
Andreagit97 force-pushed the fix_kernel_fault_vsyscall branch from 2adc6a0 to 5a013f8 May 10, 2024 07:13
@Andreagit97
Member Author

ARM64

KERNEL | CMAKE-CONFIGURE | KMOD BUILD | KMOD SCAP-OPEN | BPF-PROBE BUILD | BPF-PROBE SCAP-OPEN | MODERN-BPF SCAP-OPEN
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-4.14 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

X86

KERNEL | CMAKE-CONFIGURE | KMOD BUILD | KMOD SCAP-OPEN | BPF-PROBE BUILD | BPF-PROBE SCAP-OPEN | MODERN-BPF SCAP-OPEN
amazonlinux2-4.19 🟢 🟢 🟢 🟢 🟡
amazonlinux2-5.10 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2023-6.1 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.0 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.7 🟢 🟢 🟢 🟢 🟢 🟢
centos-3.10 🟢 🟢 🟢 🟡 🟡 🟡
centos-4.18 🟢 🟢 🟢 🟢 🟢
centos-5.14 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.17 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.8 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-3.10 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-4.14 🟢 🟢 🟢 🟢 🟢 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-5.4 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-4.15 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-5.8 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

There is an issue with the modern probe:

centos-4.18 modern-bpf_scap-open

Msg:

non-zero return code

Err:

unable to calibrate the socket in ebpf. (1)

My guess is that when we read the value in userspace, we get a cached value rather than the updated one. I will try to understand if there is a better way to do that.

@Andreagit97
Member Author

/hold

Andreagit97 force-pushed the fix_kernel_fault_vsyscall branch from 5a013f8 to a5d1950 May 10, 2024 11:02
poiana added size/XL and removed size/L labels May 10, 2024
@@ -76,6 +81,16 @@ static __always_inline void maps__set_is_dropping(bool value)
is_dropping = value;
}

static __always_inline void* maps__get_socket_file_ops()
Member Author

Now this info lives only in kernel space.

struct file_operations *f_op = (struct file_operations *)BPF_CORE_READ(f, f_op);
maps__set_socket_file_ops((void*)f_op);
/* we need to rewrite the event header */
ringbuf__rewrite_header_for_calibration(&ringbuf, vpid);
Member Author

Instead of using additional variables, we can simply use our event to communicate something to userspace.

/* BPF side we send this special event with nparams = 0 */
if(pevent->nparams == 0)
{
/* We don't want to stop here because we want to clean all the buffers. */
Member Author

We clean all the buffers so that we restart the capture in a clean state.

return SCAP_FAILURE;
}

/* Store interesting sc codes */
Member Author

Moved after the calibration, so we will clear the curr_sc_set.
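
The userspace side of the calibration discussed in the comments above could look roughly like the following sketch. It is an assumption-laden illustration, not the PR's implementation: `mock_event` and `drain_and_find_calibration` are hypothetical names, and the only detail taken from the diff is that the special calibration event carries nparams == 0 and that we keep consuming events after finding it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified event: only the field the calibration relies on. */
struct mock_event {
	uint32_t nparams;
};

/* Walks every pending event and reports whether the calibration marker
 * (an event with nparams == 0) was seen. The loop deliberately does not
 * stop at the first match, so all pending events are consumed and the
 * capture can restart from a clean state. */
static inline bool drain_and_find_calibration(const struct mock_event *events,
                                              size_t n_events,
                                              size_t *consumed)
{
	bool found = false;
	size_t i;

	for(i = 0; i < n_events; i++)
	{
		if(events[i].nparams == 0)
		{
			found = true; /* keep draining instead of returning early */
		}
	}
	if(consumed != NULL)
	{
		*consumed = n_events;
	}
	return found;
}
```

A caller would treat a false return as the "unable to find the socket event for the calibration" failure case.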

@Andreagit97
Member Author

Started again kernel tests: https://github.com/falcosecurity/libs/actions/runs/9031308494

FedeDP previously approved these changes May 10, 2024
Contributor

FedeDP left a comment

/approve

poiana added the lgtm label May 10, 2024

Contributor

poiana commented May 10, 2024

LGTM label has been added.

Git tree hash: dc490e221b332b1632f148fb0702b7b422432078

Contributor

FedeDP commented May 10, 2024

Uh still a failure case on centos:

unable to find the socket event for the calibration in the ringbuffers (1)

KERNEL | CMAKE-CONFIGURE | KMOD BUILD | KMOD SCAP-OPEN | BPF-PROBE BUILD | BPF-PROBE SCAP-OPEN | MODERN-BPF SCAP-OPEN
amazonlinux2-4.19 🟢 🟢 🟢 🟢 🟡
amazonlinux2-5.10 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2023-6.1 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.0 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.7 🟢 🟢 🟢 🟢 🟢 🟢
centos-3.10 🟢 🟢 🟢 🟡 🟡 🟡
centos-4.18 🟢 🟢 🟢 🟢 🟢
centos-5.14 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.17 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.8 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-3.10 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-4.14 🟢 🟢 🟢 🟢 🟢 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-5.4 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-4.15 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-5.8 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

incertum previously approved these changes May 10, 2024
Contributor

incertum left a comment

/approve

@Andreagit97
Member Author

> Uh still a failure case on centos:

I tried it locally, but I cannot reproduce the issue :/

Signed-off-by: Andrea Terzolo <[email protected]>
Andreagit97 dismissed stale reviews from incertum and FedeDP via 03e4ec2 May 11, 2024 16:12
poiana removed the lgtm label May 11, 2024
poiana requested review from FedeDP and incertum May 11, 2024 16:12
@Andreagit97
Member Author

Contributor

FedeDP left a comment

Great fix!
/approve
The matrix is now the same as master.

poiana added the lgtm label May 13, 2024

Contributor

poiana commented May 13, 2024

LGTM label has been added.

Git tree hash: 49ed1c193622e6b57f9f335ffc427a0d4cdc5c9d

Contributor

poiana commented May 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, incertum, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,FedeDP,incertum,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Andreagit97
Member Author

/unhold

poiana merged commit d3c804b into falcosecurity:master May 13, 2024
50 of 52 checks passed

5 participants