Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(test/libsinsp_e2e): fixed libsinsp_e2e tests for more stability #2085

Merged
merged 20 commits into from
Oct 7, 2024

Conversation

FedeDP
Copy link
Contributor

@FedeDP FedeDP commented Sep 27, 2024

What type of PR is this?

/kind bug

Any specific area of the project related to this PR?

/area driver-modern-bpf
/area libsinsp
/area tests

Does this PR require a change in the driver versions?

What this PR does / why we need it:

This also fixes a couple of small issues found while testing, namely:

  • a bug in how cgroups were pushed in the modern bpf driver
  • a new cgroup layout for libvirt-lxc container engine
  • a small cleanup in pman: properly close all fds upon leaving.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Will leave a self-review before removing wip.
Also, i will trigger the CI many times to see if we are really stable; see #2085 (comment) for the results.

Does this PR introduce a user-facing change?:

NONE

@poiana
Copy link
Contributor

poiana commented Sep 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* In this case, we won't push the new character, instead we will push the correct string.
*/
if(kn) {
push__new_character(auxmap->data, &auxmap->payload_pos, '/');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix a bug in how modern_bpf sent cgroups, when number of paths component was greater than MAX_CGROUP_PATH_POINTERS.
For example, for

/user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-2f17b2eb-994e-415d-bce0-44c1447d7cd2.scope

that is 7 paths long (considering first one, ie: root, is returned by the kernel as an empty string), we would have returned to userspace:

user.slice/user-1000.slice/[email protected]/app.slice/app-org.gnome.Terminal.slice/vte-spawn-2f17b2eb-994e-415d-bce0-44c1447d7cd2.scope

ie: the leading / was missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -30,7 +30,7 @@ constexpr const cgroup_layout DOCKER_CGROUP_LAYOUT[] = {{"/", ""}, // non-syste
{nullptr, nullptr}};
}

std::string docker_linux::m_docker_sock = "/var/run/docker.sock";
std::string docker_linux::s_docker_sock = "/var/run/docker.sock";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a static, rename it like we use to call static members, with s_ prefix.

// For cgroups like:
// /machine.slice/machine-lxc\x2d2293906\x2dlibvirt\x2dcontainer.scope/libvirt,
// account for /libvirt below.
if(cgroup.find(".scope/libvirt") != std::string::npos) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new cgroup layout for libvirt-lxc.

// Just a stupid fake FD value to signal to stop capturing events from driver and exit.
// Note: we don't use it through eventfd because we want to make sure
// that we received all events from the driver, until this very last close(FD_SIGNAL_STOP);
#define EVENTFD_SIGNAL_STOP 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid infinite loops when missing syscalls, after each test run() callback is called, we trigger the eventfd to signal that the capture must be stopped and all remaining events consumed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: i avoided sending a fake syscall event (like a close(555)) because if you are losing events, chances are high that you'll lose also this canary event, thus leading to infinitely looping.


static void do_nothing(sinsp* inspector) {}

static bool always_continue() { return true; }
static void run(const run_callback_t& run_function,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have 2 different fashioned run static functions exposed by the event_capture class: one run the test callback (run_function) synchronously and then gathers all generated events; this one can also access the inspector in the callback since it is synchronous.

The other one is async, thus the callback is prevented to access the inspector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: i kept the same name because i found it easier to integrate; so the only difference between a sync and an async invocation is just the first parameter type (run_callback_t vs run_callback_async_t).

Copy link

github-actions bot commented Sep 27, 2024

Perf diff from master - unit tests

Warning:
Processed 24029 events and lost 1 chunks!

Check IO/CPU overload!

     8.98%     -2.79%  [.] sinsp::next
     9.20%     +1.37%  [.] sinsp_parser::reset
     4.42%     +1.31%  [.] sinsp_parser::process_event
     2.00%     -0.85%  [.] libsinsp::sinsp_suppress::process_event
     1.38%     +0.82%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            +0.0369         +0.0369           147           153           147           153
BM_sinsp_split_median                                          +0.0361         +0.0361           147           152           147           152
BM_sinsp_split_stddev                                          +1.6997         +1.6992             0             1             0             1
BM_sinsp_split_cv                                              +1.6037         +1.6033             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  +0.0704         +0.0704            57            61            57            61
BM_sinsp_concatenate_paths_relative_path_median                +0.0709         +0.0709            57            61            57            61
BM_sinsp_concatenate_paths_relative_path_stddev                -0.2511         -0.2523             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_cv                    -0.3003         -0.3014             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     -0.0001         -0.0001            25            25            25            25
BM_sinsp_concatenate_paths_empty_path_median                   +0.0017         +0.0017            25            25            25            25
BM_sinsp_concatenate_paths_empty_path_stddev                   -0.2305         -0.2307             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       -0.2304         -0.2306             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.1125         +0.1125            57            63            57            63
BM_sinsp_concatenate_paths_absolute_path_median                +0.1156         +0.1156            56            63            56            63
BM_sinsp_concatenate_paths_absolute_path_stddev                -0.7791         -0.7793             1             0             1             0
BM_sinsp_concatenate_paths_absolute_path_cv                    -0.8014         -0.8017             0             0             0             0
BM_sinsp_split_container_image_mean                            +0.0225         +0.0225           402           411           402           411
BM_sinsp_split_container_image_median                          +0.0202         +0.0202           403           411           403           411
BM_sinsp_split_container_image_stddev                          -0.2991         -0.2990             3             2             3             2
BM_sinsp_split_container_image_cv                              -0.3145         -0.3145             0             0             0             0

if(m_mode != SINSP_MODE_NODRIVER && m_dump) {
dumper->dump(event);
}
handle_eventfd_request();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before doing anything else, check if any eventfd request was sent; this is non-blocking of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are wondering why i did not just use eg:
close(MY_FAKE_FD) to signal the capture to end, is because during my tests it happened that the signaling event got lost and then ♾️

Copy link
Contributor Author

@FedeDP FedeDP Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, having a read on each loop creates small back-pressure on the drivers.
But tests that generate lots of syscalls traffic are now enabling only the sinsp state sc set of syscalls, therefore in that case, no read will appear in our syscall event loop ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes put, by default, the whole sc set minus read and readv to avoid the backpressure (ie: we do not push eventfd read related events to userspace).
Of course some tests rely on reads, in that case they must explicitly enable the libsinsp::events::all_sc_set().

Also, the sc set is now a parameter of event_capture::run() APIs.

Copy link

codecov bot commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 14.28571% with 6 lines in your changes missing coverage. Please review.

Project coverage is 73.58%. Comparing base (aeb8793) to head (a904e2e).
Report is 44 commits behind head on master.

Files with missing lines Patch % Lines
...serspace/libsinsp/container_engine/libvirt_lxc.cpp 0.00% 5 Missing ⚠️
...ce/libsinsp/container_engine/docker/docker_linux.h 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2085      +/-   ##
==========================================
- Coverage   73.58%   73.58%   -0.01%     
==========================================
  Files         253      253              
  Lines       31869    31872       +3     
  Branches     5649     5635      -14     
==========================================
  Hits        23452    23452              
+ Misses       8416     8405      -11     
- Partials        1       15      +14     
Flag Coverage Δ
libsinsp 73.58% <14.28%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

FAIL() << "caught exception " << e.what();
}
auto capture_stats_str = capture_stats(m_inspector.get());
std::cout << capture_stats_str << "\n";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always print capture_stats upon leaving so that we know if any drop was present.

Copy link

github-actions bot commented Sep 27, 2024

X64 kernel testing matrix

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-4.19 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2-5.10 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2023-6.1 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.0 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.7 🟢 🟢 🟢 🟢 🟢 🟢
centos-3.10 🟢 🟢 🟢 🟡 🟡 🟡
centos-4.18 🟢 🟢 🟢 🟢 🟢 🟢
centos-5.14 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.17 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.8 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-3.10 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-4.14 🟢 🟢 🟢 🟢 🟢 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-5.4 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-4.15 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-5.8 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

ARM64 kernel testing matrix

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-4.14 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

@FedeDP
Copy link
Contributor Author

FedeDP commented Sep 27, 2024

/milestone 0.19.0

@poiana poiana added this to the 0.19.0 milestone Sep 27, 2024
@FedeDP
Copy link
Contributor Author

FedeDP commented Sep 30, 2024


std::string event_capture::s_engine_string = KMOD_ENGINE;
std::string event_capture::s_engine_path;
unsigned long event_capture::s_buffer_dim = DEFAULT_DRIVER_BUFFER_BYTES_DIM * 4;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use big buffers to avoid losing events.

@FedeDP
Copy link
Contributor Author

FedeDP commented Sep 30, 2024

Arm64 modern-ebpf tests failed again with latest code, even if there were no drops :/
Will give it a look tomorrow.

FedeDP added 16 commits October 1, 2024 10:58
…s time to leave.

The `close` syscall might get lost leading to an infinite loop; instead,
now we ask to the main thread to leave using thread safe eventfd,
and the main thread will dequeue all remaining events until an error
is returned by sinsp::next.

Signed-off-by: Federico Di Pierro <[email protected]>
Signed-off-by: Federico Di Pierro <[email protected]>
…er resolving on newer linux systemd systems.

This fixes the `sys_call_test.container_libvirt` running on my machine.
Also, let event_capture always print capture stats for us.

Signed-off-by: Federico Di Pierro <[email protected]>
… is a static.

Rename `m_docker_sock` to `s_docker_sock` to highlight that it is static.

Signed-off-by: Federico Di Pierro <[email protected]>
…ents > MAX_CGROUP_PATH_POINTERS

Signed-off-by: Federico Di Pierro <[email protected]>
…erver_with_connection_before_capturing_starts_ipv4m` test.

Signed-off-by: Federico Di Pierro <[email protected]>
…est to avoid drops.

Signed-off-by: Federico Di Pierro <[email protected]>
Default interesting syscalls set now avoids `read` and `pread` to avoid
back-pressure with `eventfd_read` being called at each loop iteration.
Moreover, `event_capture::run()` now accepts a ppm_sc_set parameter
to customize the sc set for the test.

Finally, in rlimit related tests, reset old limits upon leaving.

Signed-off-by: Federico Di Pierro <[email protected]>
@FedeDP
Copy link
Contributor Author

FedeDP commented Oct 1, 2024

I don't think we can do much better :)
immagine

removing wip!

Cc @LucaGuerra @therealbobo

@FedeDP FedeDP changed the title wip: fix(test/libsinsp_e2e): fixed libsinsp_e2e tests for more stability fix(test/libsinsp_e2e): fixed libsinsp_e2e tests for more stability Oct 1, 2024
@therealbobo
Copy link
Contributor

Love this! What an amazing job! LGTM! 🚀

@poiana poiana merged commit 2b1e402 into master Oct 7, 2024
56 of 58 checks passed
@poiana poiana deleted the fix/test_e2e branch October 7, 2024 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants