Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!(libsinsp): coherent metrics interface metrics_collector class + text-based Prometheus exposition format support #1652

Merged
merged 22 commits into from
Mar 13, 2024

Conversation

incertum
Copy link
Contributor

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

  • Everyone hated the initial linsinsp get_* methods wrt stats or metrics, which were expanded as an intermediary step to support the sinsp state counters metrics. As promised, they have all been removed now.
  • It simplifies the consumption of metrics by combining libscap and libsinsp metrics into one vector.
  • The type of metrics is uniformly gated by pushing flags down, as done previously.
  • Easier memory management is achieved by using std::vector as the data structure for m_metrics for safer management. This also helps reduce the knowledge the client needs about the metrics or the number of possible metrics. The client can simply loop over metrics and gate actions based on the metrics' string names.
  • These changes do not affect the hot path, and performance is not of primary concern.
  • Previous metrics are preserved; only code refactoring has been done.
  • A significant renaming attempt has been made to create a more coherent metrics concept while retaining the concepts of scap stats and sinsp stats for what concerns the hot path. Unfortunately, creating a super consistent and transparent stats or metrics naming convention is challenging. In this regard, this PR is a small step forward at best. Due to the renaming, this PR has a large diff, but in reality, it's more about moving and reorganizing code.
  • Memory unit conversion has been moved to the class natively.
  • The complete metrics_v2 schema now features not just the unit but also a new metric type indicating whether the metric is monotonic or reflects the current snapshot state.

There are currently no plans to introduce a metrics writer or move the ticker and stats intervals from Falco into libs at the moment, but this could be discussed in the future.

Which issue(s) this PR fixes:

#1463

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

refactor!(libsinsp): coherent metrics interface, new light weight `metrics_collector` class + technical debt cleanup

@incertum
Copy link
Contributor Author

CC @federico-sysdig to help making sure it's now a better design :) Thank you!

@incertum incertum changed the title refactor!(libsinsp): coherent metrics interface, new light weight metrics_collector class + technical debt cleanup wip: refactor!(libsinsp): coherent metrics interface, new light weight metrics_collector class + technical debt cleanup Jan 27, 2024
@incertum incertum force-pushed the refactor-stats-metrics branch 2 times, most recently from 82044f6 to 84c56dc Compare January 27, 2024 20:36
Copy link
Contributor

@federico-sysdig federico-sysdig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I placed a few random comments. The create function that feels like a half-baked singleton is probably the most important point to discuss and potentially change.
I know that I didn't cover the essence of the refactoring change. It most likely is a good one, but I don't have such a deep knowledge of the project to feel confident my word is of great value here.

userspace/libscap/metrics_v2.h Show resolved Hide resolved
userspace/libsinsp/metrics_collector.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.h Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.h Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.h Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/test/sinsp_metrics.ut.cpp Outdated Show resolved Hide resolved
@incertum incertum changed the title wip: refactor!(libsinsp): coherent metrics interface, new light weight metrics_collector class + technical debt cleanup refactor!(libsinsp): coherent metrics interface, new light weight metrics_collector class + technical debt cleanup Jan 29, 2024
@incertum
Copy link
Contributor Author

/milestone 0.15.0

@poiana poiana added this to the 0.15.0 milestone Jan 29, 2024
@incertum incertum force-pushed the refactor-stats-metrics branch from 34795c5 to 1d4098a Compare January 29, 2024 01:36
Copy link
Contributor

@federico-sysdig federico-sysdig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments and suggestions. Non of them blockers, though a few would be improvements.
Again, this is a superficial review more on the "form" and less on the essence of the change as I'm not deep in the project. I'm sure other reviewers can be relied upon for this part.

userspace/libsinsp/metrics_collector.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.h Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.h Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.h Outdated Show resolved Hide resolved
userspace/libsinsp/test/sinsp_metrics.ut.cpp Outdated Show resolved Hide resolved
userspace/libsinsp/metrics_collector.h Outdated Show resolved Hide resolved
userspace/libscap/engine/gvisor/gvisor.cpp Outdated Show resolved Hide resolved
@poiana
Copy link
Contributor

poiana commented Jan 29, 2024

@federico-sysdig: changing LGTM is restricted to collaborators

In response to this:

Some more comments and suggestions. Non of them blockers, though a few would be improvements.
Again, this is a superficial review more on the "form" and less on the essence of the change as I'm not deep in the project. I'm sure other reviewers can be relied upon for this part.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@incertum incertum force-pushed the refactor-stats-metrics branch 2 times, most recently from 820c8c5 to d86a915 Compare January 31, 2024 18:27
@poiana
Copy link
Contributor

poiana commented Mar 11, 2024

LGTM label has been added.

Git tree hash: d159b325191cf03aa465ddc5991239d331ba6c98

@Andreagit97 Andreagit97 modified the milestones: 0.15.0, 0.16.0 Mar 11, 2024
@incertum
Copy link
Contributor Author

/milestone 0.15.0

@poiana poiana modified the milestones: 0.16.0, 0.15.0 Mar 11, 2024
@leogr
Copy link
Member

leogr commented Mar 11, 2024

/milestone 0.15.0

I confirm this was intended for 0.15. We just need a second review/approval to merge it.

cc @incertum @Andreagit97 @FedeDP

@Andreagit97
Copy link
Member

I've put/changed the milestones just to organize the PRs for the upcoming release, of course, feel free to change them if you have different ideas/plans :)
Same on all the other PRs!

{
/* KERNEL SIDE STATS COUNTERS */
for(int stat = 0; stat < BPF_MAX_KERNEL_COUNTERS_STATS; stat++)
{
stats[stat].type = STATS_VALUE_TYPE_U64;
stats[stat].flags = PPM_SCAP_STATS_KERNEL_COUNTERS;
stats[stat].type = METRIC_VALUE_TYPE_U64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I start to think that an helper method to fill stats would be a good addition (perhaps even in a followup PR), like:

fill_stats(&stats[stat], type, value, unit, metric_type, flags);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that there is the new_metric method, perhaps we could use it in other places too, outside of metrics_collector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking it here #1463 (comment)

The new_metric method is a template in CPP. In addition for the scap stats for example we have inner loops where we only update the value, e.g. see

for(int cpu = 0; cpu < handle->m_ncpus; cpu++)
when we iterate over the CPUs. Therefore a bit on the fence as I see the value of it being nicer from a software organization point of view, but the current approach seems faster.

Let's think more about it? I believe the real dilemma of metrics is that one part is in C while the other part is in CPP, so it's difficult to find a shared approach that would follow best practices either way. I'll repeat this in my next response below. That's also why I use the arrays approach with the metrics names as that's how we do it in scap / C. I am still not sure, how a really great metrics framework that spans scap and sinsp + pending falco rules counts metrics category should look like frankly.

Copy link
Contributor

@FedeDP FedeDP Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the real dilemma of metrics is that one part is in C while the other part is in CPP

Yep, you are right, i thought about this thing later on. I am not sure how (and if we can) to proceed, i agree to "save the idea" for later; let's see if anyone comes with a good solution.

static re2::RE2 s_libs_metrics_banned_prometheus_naming_characters("(\\.)", re2::RE2::POSIX);

static const char *const sinsp_stats_v2_resource_utilization_names[] = {
[SINSP_RESOURCE_UTILIZATION_CPU_PERC] = "cpu_usage_perc",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really love if we bound an std::function (or a callback anyway) here instead of just the name; it would be much better and future proof, and, using static_assert as requested below, would also enforce that we managed all the metrics.
Eg:

[SINSP_RESOURCE_UTILIZATION_CPU_PERC] = std::function<void(int cpu_usage_perc)>{
    return new_metric("cpu_usage_perc",
					METRICS_V2_RESOURCE_UTILIZATION, 
					METRIC_VALUE_TYPE_D, 
					METRIC_VALUE_UNIT_PERC, 
					METRIC_VALUE_NON_MONOTONIC_CURRENT, 
					cpu_usage_perc));
} 

Then, this array would become:

static std::function sinsp_stats_v2_generators[] = {

and snapshot method could just iterate over all of the generators and emplace_back generated stats.

I am not sure this is correct C++ but i think you got the idea :) Basically i want to enforce as strictly as possible that for every newly added metric, we enforce at compile time that its generator is added here, and then we automatically snapshot it.

Copy link
Contributor Author

@incertum incertum Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this actually sounds like a better balance of having it more enforceable without adding a cumbersome and hard to maintain metrics registry back in. I would propose to tackle this in a follow up PR (must not be by me, @FedeDP feel free to get something up afterwards, happy to review it 😉). It's tracked here #1463 (comment).

As a continuation of my comment above #1652 (comment):
Curious how to find a great approach for all things metrics considered together (scap, sinsp, falco) as metrics are so oddly scattered across not just the entire code base, but more importantly across the entire code flow ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, no problem to post-pone it to a follow up PR! @sgaist wdyt of this approach? Do you see it feasible?
I can tackle it btw :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @FedeDP you are officially signed up for further touching up of the metrics internally!

Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is ok! Sorry for the long delay before doing a review; i left some ideas to make the design a little bit more future proof, let me know wdyt!

Co-authored-by: Federico Di Pierro <[email protected]>
Signed-off-by: Melissa Kilby <[email protected]>
Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana
Copy link
Contributor

poiana commented Mar 13, 2024

LGTM label has been added.

Git tree hash: 8135fee71eb30024775bdf2d0ad1c1a1ce1860fd

Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana
Copy link
Contributor

poiana commented Mar 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, federico-sysdig, incertum, sgaist

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,FedeDP,incertum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 29e7c7b into falcosecurity:master Mar 13, 2024
41 checks passed
@incertum incertum deleted the refactor-stats-metrics branch March 13, 2024 15:27
@FedeDP FedeDP mentioned this pull request Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants