-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor!(libsinsp): coherent metrics interface metrics_collector
class + text-based Prometheus exposition format support
#1652
refactor!(libsinsp): coherent metrics interface metrics_collector
class + text-based Prometheus exposition format support
#1652
Conversation
15f5750
to
9042675
Compare
CC @federico-sysdig to help making sure it's now a better design :) Thank you! |
metrics_collector
class + technical debt cleanupmetrics_collector
class + technical debt cleanup
82044f6
to
84c56dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I placed a few random comments. The create
function that feels like a half-baked singleton is probably the most important point to discuss and potentially change.
I know that I didn't cover the essence of the refactoring change. It most likely is a good one, but I don't have such a deep knowledge of the project to feel confident my word is of great value here.
metrics_collector
class + technical debt cleanupmetrics_collector
class + technical debt cleanup
/milestone 0.15.0 |
34795c5
to
1d4098a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments and suggestions. Non of them blockers, though a few would be improvements.
Again, this is a superficial review more on the "form" and less on the essence of the change as I'm not deep in the project. I'm sure other reviewers can be relied upon for this part.
@federico-sysdig: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
820c8c5
to
d86a915
Compare
LGTM label has been added. Git tree hash: d159b325191cf03aa465ddc5991239d331ba6c98
|
/milestone 0.15.0 |
I confirm this was intended for 0.15. We just need a second review/approval to merge it. |
I've put/changed the milestones just to organize the PRs for the upcoming release, of course, feel free to change them if you have different ideas/plans :) |
{ | ||
/* KERNEL SIDE STATS COUNTERS */ | ||
for(int stat = 0; stat < BPF_MAX_KERNEL_COUNTERS_STATS; stat++) | ||
{ | ||
stats[stat].type = STATS_VALUE_TYPE_U64; | ||
stats[stat].flags = PPM_SCAP_STATS_KERNEL_COUNTERS; | ||
stats[stat].type = METRIC_VALUE_TYPE_U64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I start to think that an helper method to fill stats would be a good addition (perhaps even in a followup PR), like:
fill_stats(&stats[stat], type, value, unit, metric_type, flags);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that there is the new_metric
method, perhaps we could use it in other places too, outside of metrics_collector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking it here #1463 (comment)
The new_metric
method is a template in CPP. In addition for the scap stats for example we have inner loops where we only update the value, e.g. see
libs/userspace/libscap/engine/bpf/scap_bpf.c
Line 1738 in 5dc692e
for(int cpu = 0; cpu < handle->m_ncpus; cpu++) |
Let's think more about it? I believe the real dilemma of metrics
is that one part is in C while the other part is in CPP, so it's difficult to find a shared approach that would follow best practices either way. I'll repeat this in my next response below. That's also why I use the arrays approach with the metrics names as that's how we do it in scap / C. I am still not sure, how a really great metrics framework that spans scap and sinsp + pending falco rules counts metrics category should look like frankly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the real dilemma of metrics is that one part is in C while the other part is in CPP
Yep, you are right, i thought about this thing later on. I am not sure how (and if we can) to proceed, i agree to "save the idea" for later; let's see if anyone comes with a good solution.
static re2::RE2 s_libs_metrics_banned_prometheus_naming_characters("(\\.)", re2::RE2::POSIX); | ||
|
||
static const char *const sinsp_stats_v2_resource_utilization_names[] = { | ||
[SINSP_RESOURCE_UTILIZATION_CPU_PERC] = "cpu_usage_perc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really love if we bound an std::function
(or a callback anyway) here instead of just the name; it would be much better and future proof, and, using static_assert
as requested below, would also enforce that we managed all the metrics.
Eg:
[SINSP_RESOURCE_UTILIZATION_CPU_PERC] = std::function<void(int cpu_usage_perc)>{
return new_metric("cpu_usage_perc",
METRICS_V2_RESOURCE_UTILIZATION,
METRIC_VALUE_TYPE_D,
METRIC_VALUE_UNIT_PERC,
METRIC_VALUE_NON_MONOTONIC_CURRENT,
cpu_usage_perc));
}
Then, this array would become:
static std::function sinsp_stats_v2_generators[] = {
and snapshot
method could just iterate over all of the generators and emplace_back generated stats.
I am not sure this is correct C++ but i think you got the idea :) Basically i want to enforce as strictly as possible that for every newly added metric, we enforce at compile time that its generator is added here, and then we automatically snapshot it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this actually sounds like a better balance of having it more enforceable without adding a cumbersome and hard to maintain metrics registry back in. I would propose to tackle this in a follow up PR (must not be by me, @FedeDP feel free to get something up afterwards, happy to review it 😉). It's tracked here #1463 (comment).
As a continuation of my comment above #1652 (comment):
Curious how to find a great approach for all things metrics considered together (scap, sinsp, falco) as metrics are so oddly scattered across not just the entire code base, but more importantly across the entire code flow ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, no problem to post-pone it to a follow up PR! @sgaist wdyt of this approach? Do you see it feasible?
I can tackle it btw :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great @FedeDP you are officially signed up for further touching up of the metrics internally!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR is ok! Sorry for the long delay before doing a review; i left some ideas to make the design a little bit more future proof, let me know wdyt!
Co-authored-by: Federico Di Pierro <[email protected]> Signed-off-by: Melissa Kilby <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
LGTM label has been added. Git tree hash: 8135fee71eb30024775bdf2d0ad1c1a1ce1860fd
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Andreagit97, FedeDP, federico-sysdig, incertum, sgaist The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind cleanup
/kind feature
Any specific area of the project related to this PR?
/area libscap
/area libpman
/area libsinsp
Does this PR require a change in the driver versions?
What this PR does / why we need it:
get_*
methods wrt stats or metrics, which were expanded as an intermediary step to support the sinsp state counters metrics. As promised, they have all been removed now.std::vector
as the data structure form_metrics
for safer management. This also helps reduce the knowledge the client needs about the metrics or the number of possible metrics. The client can simply loop over metrics and gate actions based on the metrics' string names.metrics_v2
schema now features not just the unit but also a new metric type indicating whether the metric is monotonic or reflects the current snapshot state.There are currently no plans to introduce a metrics writer or move the ticker and stats intervals from Falco into libs at the moment, but this could be discussed in the future.
Which issue(s) this PR fixes:
#1463
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: