refactor!(libsinsp): coherent metrics interface `metrics_collector` class + text-based Prometheus exposition format support #1652

incertum · 2024-01-27T01:07:39Z

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Everyone hated the initial libsinsp get_* methods for stats and metrics, which were expanded as an intermediate step to support the sinsp state counters metrics. As promised, they have all been removed now.
It simplifies the consumption of metrics by combining libscap and libsinsp metrics into one vector.
The type of metrics is uniformly gated by pushing flags down, as done previously.
Memory management is simpler and safer because m_metrics is now a std::vector. This also reduces what the client needs to know about the metrics or how many of them exist: the client can simply loop over the metrics and gate actions based on the metrics' string names.
These changes do not affect the hot path, and performance is not of primary concern.
Previous metrics are preserved; only code refactoring has been done.
A significant renaming attempt has been made to create a more coherent metrics concept while retaining the concepts of scap stats and sinsp stats for what concerns the hot path. Unfortunately, creating a super consistent and transparent stats or metrics naming convention is challenging. In this regard, this PR is a small step forward at best. Due to the renaming, this PR has a large diff, but in reality, it's more about moving and reorganizing code.
Memory unit conversion is now handled natively by the class.
The complete metrics_v2 schema now features not just the unit but also a new metric type indicating whether the metric is monotonic or reflects the current snapshot state.
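
To make the consumption model above concrete, here is a minimal sketch of a client looping over a single metrics vector, gating on string names, and converting a memory unit. The `metric` struct, the enums, the helper functions, and the sample metric names are all hypothetical stand-ins chosen for illustration; the real definitions live in userspace/libscap/metrics_v2.h and userspace/libsinsp/metrics_collector.h and may well differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the metrics_v2 schema (assumption, not the real API).
enum class metric_type { monotonic, non_monotonic_current };
enum class metric_unit { count, memory_bytes, memory_megabytes, perc };

struct metric {
    std::string name;
    metric_unit unit;
    metric_type type;
    double value;
};

// Convert a byte-valued memory metric to megabytes in place (illustrative only).
inline void bytes_to_megabytes(metric& m) {
    assert(m.unit == metric_unit::memory_bytes);
    m.value /= 1024.0 * 1024.0;
    m.unit = metric_unit::memory_megabytes;
}

// The client does not need to know how many metrics exist: it loops over the
// vector and gates on the string name (here: a simple prefix match).
inline double sum_matching(const std::vector<metric>& metrics, const std::string& prefix) {
    double total = 0.0;
    for (const auto& m : metrics) {
        if (m.name.rfind(prefix, 0) == 0) {  // true when name starts with prefix
            total += m.value;
        }
    }
    return total;
}

// Sample snapshot mixing scap-style counters and a sinsp-style resource metric.
inline std::vector<metric> sample_snapshot() {
    return {
        {"n_evts", metric_unit::count, metric_type::monotonic, 100.0},
        {"n_drops", metric_unit::count, metric_type::monotonic, 5.0},
        {"memory_rss", metric_unit::memory_bytes, metric_type::non_monotonic_current,
         2.0 * 1024 * 1024},
    };
}
```

Carrying the unit and the monotonic/current distinction on each entry is what would let a consumer decide, per metric, whether to treat it as a counter or as a point-in-time reading without consulting a separate registry.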

There are currently no plans to introduce a metrics writer or to move the ticker and stats intervals from Falco into libs, but this could be discussed in the future.

Which issue(s) this PR fixes:

#1463

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

refactor!(libsinsp): coherent metrics interface, new light weight `metrics_collector` class + technical debt cleanup

incertum · 2024-01-27T01:12:31Z

CC @federico-sysdig to help make sure it's now a better design :) Thank you!

federico-sysdig

I placed a few random comments. The create function that feels like a half-baked singleton is probably the most important point to discuss and potentially change.
I know that I didn't cover the essence of the refactoring change. It most likely is a good one, but I don't have such a deep knowledge of the project to feel confident my word is of great value here.

userspace/libscap/metrics_v2.h

userspace/libsinsp/metrics_collector.cpp

userspace/libsinsp/metrics_collector.h

userspace/libsinsp/metrics_collector.cpp

userspace/libsinsp/test/sinsp_metrics.ut.cpp

userspace/libsinsp/metrics_collector.h

incertum · 2024-01-29T01:35:28Z

/milestone 0.15.0

federico-sysdig

Some more comments and suggestions. None of them are blockers, though a few would be improvements.
Again, this is a superficial review more on the "form" and less on the essence of the change as I'm not deep in the project. I'm sure other reviewers can be relied upon for this part.

userspace/libsinsp/metrics_collector.cpp

userspace/libsinsp/metrics_collector.h

userspace/libsinsp/test/sinsp_metrics.ut.cpp

userspace/libsinsp/metrics_collector.h

userspace/libscap/engine/gvisor/gvisor.cpp

poiana · 2024-01-29T17:45:01Z

@federico-sysdig: changing LGTM is restricted to collaborators

In response to this:

Some more comments and suggestions. None of them are blockers, though a few would be improvements.
Again, this is a superficial review more on the "form" and less on the essence of the change as I'm not deep in the project. I'm sure other reviewers can be relied upon for this part.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

poiana · 2024-03-11T09:38:11Z

LGTM label has been added.

Git tree hash: d159b325191cf03aa465ddc5991239d331ba6c98

incertum · 2024-03-11T15:53:15Z

/milestone 0.15.0

leogr · 2024-03-11T16:24:54Z

/milestone 0.15.0

I confirm this was intended for 0.15. We just need a second review/approval to merge it.

cc @incertum @Andreagit97 @FedeDP

Andreagit97 · 2024-03-11T17:08:37Z

I've put/changed the milestones just to organize the PRs for the upcoming release, of course, feel free to change them if you have different ideas/plans :)
Same on all the other PRs!

userspace/libpman/src/stats.c

FedeDP · 2024-03-12T07:47:08Z

userspace/libscap/engine/bpf/scap_bpf.c

 	{
 		/* KERNEL SIDE STATS COUNTERS */
 		for(int stat = 0; stat < BPF_MAX_KERNEL_COUNTERS_STATS; stat++)
 		{
-			stats[stat].type = STATS_VALUE_TYPE_U64;
-			stats[stat].flags = PPM_SCAP_STATS_KERNEL_COUNTERS;
+			stats[stat].type = METRIC_VALUE_TYPE_U64;


I'm starting to think that a helper method to fill stats would be a good addition (perhaps even in a follow-up PR), like:

fill_stats(&stats[stat], type, value, unit, metric_type, flags);

I see that there is the new_metric method, perhaps we could use it in other places too, outside of metrics_collector?

Tracking it here #1463 (comment)

The new_metric method is a C++ template. In addition, for the scap stats, for example, we have inner loops where we only update the value; e.g. see

libs/userspace/libscap/engine/bpf/scap_bpf.c

Line 1738 in 5dc692e

for(int cpu = 0; cpu < handle->m_ncpus; cpu++)

when we iterate over the CPUs. So I'm a bit on the fence: I see the value of it being nicer from a software organization point of view, but the current approach seems faster.

Let's think more about it? I believe the real dilemma of metrics is that one part is in C while the other part is in CPP, so it's difficult to find a shared approach that would follow best practices either way. I'll repeat this in my next response below. That's also why I use the arrays approach with the metrics names, as that's how we do it in scap / C. Frankly, I am still not sure what a really great metrics framework that spans scap and sinsp, plus the pending falco rules counts metrics category, should look like.

I believe the real dilemma of metrics is that one part is in C while the other part is in CPP

Yep, you are right, I thought about this later on. I am not sure how to proceed (or whether we can); I agree to "save the idea" for later. Let's see if anyone comes up with a good solution.
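
For readers following the trade-off in this thread, a minimal sketch of how a fill-style helper and the value-only inner loop could coexist: metadata is written once per stat, and the per-CPU loop only accumulates values. Everything here is hypothetical (the `stat_entry` struct, this `fill_stats` signature, and the `collect` function are simplified assumptions, not the real scap code, and the signature is shorter than the one proposed above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified C-style stat entry; not the real metrics_v2 struct.
struct stat_entry {
    char name[64];
    uint32_t flags;
    uint64_t value;
};

// Hypothetical helper in the spirit of the fill_stats() idea: set the
// metadata once and reset the value.
inline void fill_stats(stat_entry* s, const char* name, uint32_t flags) {
    assert(s != nullptr && name != nullptr);
    std::strncpy(s->name, name, sizeof(s->name) - 1);
    s->name[sizeof(s->name) - 1] = '\0';  // always NUL-terminate
    s->flags = flags;
    s->value = 0;
}

// Metadata is filled once per stat; the per-CPU inner loop only adds to the
// value, which keeps the cheap path described in the thread.
inline void collect(std::vector<stat_entry>& stats,
                    const char* const* names,
                    uint32_t flags,
                    const std::vector<std::vector<uint64_t>>& per_cpu_counters) {
    for (std::size_t stat = 0; stat < stats.size(); stat++) {
        fill_stats(&stats[stat], names[stat], flags);
    }
    for (const auto& cpu_counters : per_cpu_counters) {
        for (std::size_t stat = 0; stat < stats.size(); stat++) {
            stats[stat].value += cpu_counters[stat];
        }
    }
}
```

Splitting the work this way keeps the helper out of the innermost loop, so the organizational benefit does not have to cost anything per CPU iteration; whether that holds for the real engines would need measuring.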

userspace/libsinsp/metrics_collector.cpp

FedeDP · 2024-03-12T08:08:32Z

userspace/libsinsp/metrics_collector.cpp

+static re2::RE2 s_libs_metrics_banned_prometheus_naming_characters("(\\.)", re2::RE2::POSIX);
+
+static const char *const sinsp_stats_v2_resource_utilization_names[] = {
+	[SINSP_RESOURCE_UTILIZATION_CPU_PERC] = "cpu_usage_perc",


I would really love it if we bound a std::function (or some other callback) here instead of just the name; it would be much better and more future-proof and, using static_assert as requested below, would also enforce that we handle all the metrics.
E.g.:

[SINSP_RESOURCE_UTILIZATION_CPU_PERC] = [](double cpu_usage_perc) { return new_metric("cpu_usage_perc", METRICS_V2_RESOURCE_UTILIZATION, METRIC_VALUE_TYPE_D, METRIC_VALUE_UNIT_PERC, METRIC_VALUE_NON_MONOTONIC_CURRENT, cpu_usage_perc); }

Then, this array would become:

static std::function sinsp_stats_v2_generators[] = {

and the snapshot method could just iterate over all of the generators and emplace_back the generated stats.

I am not sure this is correct C++ but I think you get the idea :) Basically, I want to enforce as strictly as possible, at compile time, that every newly added metric has its generator added here, so that we then snapshot it automatically.

Yes, this actually sounds like a better balance: more enforceable without bringing back a cumbersome, hard-to-maintain metrics registry. I would propose tackling this in a follow-up PR (it doesn't have to be by me; @FedeDP feel free to get something up afterwards, happy to review it 😉). It's tracked here #1463 (comment).

As a continuation of my comment above #1652 (comment):
Curious how to find a great approach for all things metrics considered together (scap, sinsp, falco) as metrics are so oddly scattered across not just the entire code base, but more importantly across the entire code flow ...

That makes sense, no problem postponing it to a follow-up PR! @sgaist wdyt of this approach? Do you see it as feasible?
I can tackle it btw :)

Great @FedeDP you are officially signed up for further touching up of the metrics internally!
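
As a rough illustration of the generator-table idea discussed in this thread, the sketch below keeps one generator function per enum value and uses a static_assert so that adding an enum entry without a matching generator fails to compile. All names (`resource_metric`, `resource_sample`, `gen_cpu_perc`, `gen_memory_rss`, `generators`, `snapshot`) are hypothetical; plain function pointers are used instead of std::function to keep the table simple, and this is not the design that was eventually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical enum; RES_MAX must stay last so it counts the entries.
enum resource_metric { RES_CPU_PERC = 0, RES_MEMORY_RSS, RES_MAX };

// Hypothetical, simplified metric (the real one also carries type, unit, flags).
struct metric {
    std::string name;
    double value;
};

// Raw inputs a snapshot would read from the system (assumed shape).
struct resource_sample {
    double cpu_perc;
    double memory_rss;
};

using generator = metric (*)(const resource_sample&);

static metric gen_cpu_perc(const resource_sample& s) {
    return metric{"cpu_usage_perc", s.cpu_perc};
}

static metric gen_memory_rss(const resource_sample& s) {
    return metric{"memory_rss", s.memory_rss};
}

// One generator per enum value, in enum order. The array size is deduced from
// the initializer so the static_assert below can catch a missing entry.
static const generator generators[] = {
    gen_cpu_perc,
    gen_memory_rss,
};

static_assert(sizeof(generators) / sizeof(generators[0]) == RES_MAX,
              "every resource_metric needs a generator");

// The snapshot simply runs every generator and collects the results.
static std::vector<metric> snapshot(const resource_sample& s) {
    std::vector<metric> out;
    out.reserve(RES_MAX);
    for (const generator gen : generators) {
        assert(gen != nullptr);
        out.emplace_back(gen(s));
    }
    return out;
}
```

Note that the array size is deliberately left unspecified: declaring it with an explicit `RES_MAX` size would silently zero-fill any missing entries and defeat the compile-time check.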

FedeDP

PR is ok! Sorry for the long delay before doing a review; I left some ideas to make the design a little bit more future-proof, let me know wdyt!

Co-authored-by: Federico Di Pierro <[email protected]> Signed-off-by: Melissa Kilby <[email protected]>

userspace/libsinsp/metrics_collector.cpp

FedeDP

/approve

poiana · 2024-03-13T07:14:38Z

LGTM label has been added.

Git tree hash: 8135fee71eb30024775bdf2d0ad1c1a1ce1860fd

Andreagit97

/approve

poiana · 2024-03-13T09:25:06Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, federico-sysdig, incertum, sgaist

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Andreagit97,FedeDP,incertum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

poiana added release-note kind/cleanup dco-signoff: no kind/feature New feature or request area/libscap area/libpman area/libsinsp size/XXL approved labels Jan 27, 2024

poiana requested review from hbrueckner and leogr January 27, 2024 01:08

incertum force-pushed the refactor-stats-metrics branch from 15f5750 to 9042675 Compare January 27, 2024 01:10

poiana added dco-signoff: yes and removed dco-signoff: no labels Jan 27, 2024

incertum changed the title ~~refactor!(libsinsp): coherent metrics interface, new light weight metrics_collector class + technical debt cleanup~~ wip: refactor!(libsinsp): coherent metrics interface, new light weight metrics_collector class + technical debt cleanup Jan 27, 2024

poiana added the do-not-merge/work-in-progress label Jan 27, 2024

incertum force-pushed the refactor-stats-metrics branch 2 times, most recently from 82044f6 to 84c56dc Compare January 27, 2024 20:36

federico-sysdig reviewed Jan 27, 2024

incertum commented Jan 29, 2024

userspace/libsinsp/metrics_collector.h (outdated)

incertum changed the title ~~wip: refactor!(libsinsp): coherent metrics interface, new light weight metrics_collector class + technical debt cleanup~~ refactor!(libsinsp): coherent metrics interface, new light weight metrics_collector class + technical debt cleanup Jan 29, 2024

poiana removed the do-not-merge/work-in-progress label Jan 29, 2024

poiana added this to the 0.15.0 milestone Jan 29, 2024

incertum force-pushed the refactor-stats-metrics branch from 34795c5 to 1d4098a Compare January 29, 2024 01:36

federico-sysdig approved these changes Jan 29, 2024

incertum force-pushed the refactor-stats-metrics branch 2 times, most recently from 820c8c5 to d86a915 Compare January 31, 2024 18:27

Andreagit97 modified the milestones: 0.15.0, 0.16.0 Mar 11, 2024

poiana modified the milestones: 0.16.0, 0.15.0 Mar 11, 2024

FedeDP reviewed Mar 12, 2024

userspace/libpman/src/stats.c

FedeDP reviewed Mar 12, 2024

userspace/libsinsp/metrics_collector.cpp

FedeDP reviewed Mar 12, 2024

userspace/libsinsp/metrics_collector.cpp

FedeDP reviewed Mar 12, 2024

cleanup(metrics): apply reviewers suggestions

2c3092e

Co-authored-by: Federico Di Pierro <[email protected]> Signed-off-by: Melissa Kilby <[email protected]>

incertum dismissed leogr’s stale review via 2c3092e March 13, 2024 00:29

poiana removed the lgtm label Mar 13, 2024

poiana requested a review from leogr March 13, 2024 00:29

incertum mentioned this pull request Mar 13, 2024

[TRACKING] Create a more coherent stats model for libs and consumer #1463

Closed

FedeDP reviewed Mar 13, 2024

userspace/libsinsp/metrics_collector.cpp

FedeDP approved these changes Mar 13, 2024

poiana assigned FedeDP Mar 13, 2024

poiana added the lgtm label Mar 13, 2024

Andreagit97 approved these changes Mar 13, 2024

poiana assigned Andreagit97 Mar 13, 2024

poiana merged commit 29e7c7b into falcosecurity:master Mar 13, 2024
41 checks passed

incertum deleted the refactor-stats-metrics branch March 13, 2024 15:27

FedeDP mentioned this pull request Mar 15, 2024

CI asan jobs failures #1750

Closed
