
Fix/metrics #42

Merged
merged 3 commits into kubernetes-sigs:main on Dec 1, 2024
Conversation

@mneverov mneverov commented Nov 30, 2024

The requested metrics already existed but were not exposed.

This patch:

  • adds a new metrics server running on :9091 by default (see the IPAM_METRICS_ADDR env var)
  • moves the probe and metrics servers to util
  • replaces k8s.io/component-base/metrics/legacyregistry with prometheus/client_golang
  • registers metrics in an init func instead of using sync.Once
curl -v localhost:9091/metrics
# HELP node_ipam_controller_multicidrset_allocation_tries_per_request Histogram measuring CIDR allocation tries per request.
# TYPE node_ipam_controller_multicidrset_allocation_tries_per_request histogram
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="10.244.0.0/16",le="1"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="10.244.0.0/16",le="5"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="10.244.0.0/16",le="25"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="10.244.0.0/16",le="125"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="10.244.0.0/16",le="625"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="10.244.0.0/16",le="+Inf"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_sum{clusterCIDR="10.244.0.0/16"} 0
node_ipam_controller_multicidrset_allocation_tries_per_request_count{clusterCIDR="10.244.0.0/16"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="2001:db8::/110",le="1"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="2001:db8::/110",le="5"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="2001:db8::/110",le="25"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="2001:db8::/110",le="125"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="2001:db8::/110",le="625"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_bucket{clusterCIDR="2001:db8::/110",le="+Inf"} 3
node_ipam_controller_multicidrset_allocation_tries_per_request_sum{clusterCIDR="2001:db8::/110"} 0
node_ipam_controller_multicidrset_allocation_tries_per_request_count{clusterCIDR="2001:db8::/110"} 3
# HELP node_ipam_controller_multicidrset_cidrs_allocations_total Counter measuring total number of CIDR allocations.
# TYPE node_ipam_controller_multicidrset_cidrs_allocations_total counter
node_ipam_controller_multicidrset_cidrs_allocations_total{clusterCIDR="10.244.0.0/16"} 3
node_ipam_controller_multicidrset_cidrs_allocations_total{clusterCIDR="2001:db8::/110"} 3
# HELP node_ipam_controller_multicidrset_usage_cidrs Gauge measuring percentage of allocated CIDRs.
# TYPE node_ipam_controller_multicidrset_usage_cidrs gauge
node_ipam_controller_multicidrset_usage_cidrs{clusterCIDR="10.244.0.0/16"} 0.01171875
node_ipam_controller_multicidrset_usage_cidrs{clusterCIDR="2001:db8::/110"} 0.0029296875
# HELP node_ipam_controller_multicirdset_max_cidrs Maximum number of CIDRs that can be allocated.
# TYPE node_ipam_controller_multicirdset_max_cidrs gauge
node_ipam_controller_multicirdset_max_cidrs{clusterCIDR="10.244.0.0/16"} 256
node_ipam_controller_multicirdset_max_cidrs{clusterCIDR="2001:db8::/110"} 1024

Fixes #40

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mneverov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 30, 2024
@mneverov mneverov marked this pull request as draft November 30, 2024 17:54
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 30, 2024
@mneverov mneverov marked this pull request as ready for review December 1, 2024 07:19
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 1, 2024
mneverov commented Dec 1, 2024

@aojea ptal

aojea commented Dec 1, 2024

  • adds a new metrics server running on :9091 by default (see IPAM_METRICS_ADDR env. var)

We found that opening multiple ports to expose different endpoints ends up causing problems, especially in host-network pods. What do you think about adding a new flag, --binding-address or similar, to indicate the endpoint where healthz, livez, and metrics are exposed, and marking the existing health-probe-address flag as deprecated?

This way we maintain just one server on one port, using the URL path to discriminate between endpoints.

mneverov commented Dec 1, 2024

what do you think if ...

Good idea. I wanted to combine those two initially, but then checked controller-runtime, which runs three web servers: webhook, probe, and metrics.

@@ -166,3 +166,11 @@ func runControllers(ctx context.Context, kubeClient kubernetes.Interface, cfg *r

nodeIpamController.Run(ctx)
}

func bindingAddress(cfg config) string {
	if cfg.HealthProbeAddr != "" {
		return cfg.HealthProbeAddr
	}
	return cfg.WebserverBindAddr
}

mneverov (Member Author) commented on bindingAddress:

By default HealthProbeAddr is empty now. This check is added for existing clusters.
The default addr value :8081 from HealthProbeAddr is now specified for WebserverBindAddr.
aojea commented Dec 1, 2024

/lgtm

Thanks

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2024
@k8s-ci-robot k8s-ci-robot merged commit bbf0e52 into kubernetes-sigs:main Dec 1, 2024
8 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

metrics for ipam-controller
3 participants