Skip to content

[SLI-Metrics] kuberay_service_ready #3577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

owenowenisme
Copy link
Contributor

@owenowenisme owenowenisme commented May 10, 2025

Why are these changes needed?

Added kuberay_service_ready as written in KubeRay v1.4.0 SLI proposal

❯ curl -s localhost:8080/metrics | grep  kuberay_
# HELP kuberay_service_ready RayServiceReady means users can send requests to the underlying cluster and the number of serve endpoints is greater than 0.
# TYPE kuberay_service_ready gauge
kuberay_service_ready{condition="false",name="rayservice-sample",namespace="default"} 1
❯ curl -s localhost:8080/metrics | grep  kuberay_
# HELP kuberay_service_ready RayServiceReady means users can send requests to the underlying cluster and the number of serve endpoints is greater than 0.
# TYPE kuberay_service_ready gauge
kuberay_service_ready{condition="true",name="rayservice-sample",namespace="default"} 1

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@owenowenisme
Copy link
Contributor Author

@win5923 @troychiu

@owenowenisme owenowenisme force-pushed the SLI-Metrics/kuberay-service-ready branch from 7b1dcdb to a0ef666 Compare May 12, 2025 17:53
@owenowenisme owenowenisme requested a review from win5923 May 13, 2025 17:09
@owenowenisme owenowenisme marked this pull request as ready for review May 13, 2025 17:09
Copy link
Contributor

@win5923 win5923 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@owenowenisme owenowenisme marked this pull request as draft May 15, 2025 02:11
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme owenowenisme force-pushed the SLI-Metrics/kuberay-service-ready branch from 891aded to b633a37 Compare May 18, 2025 10:02
@owenowenisme owenowenisme marked this pull request as ready for review May 18, 2025 10:09
@owenowenisme
Copy link
Contributor Author

owenowenisme commented May 18, 2025

@troychiu @kevin85421 Would you mind taking a look?

@troychiu
Copy link
Contributor

troychiu commented May 19, 2025

Can't we simply reuse collectRayServiceInfo?

func (c *RayServiceMetricsManager) collectRayServiceInfo(service *rayv1.RayService, ch chan<- prometheus.Metric) {

@owenowenisme
Copy link
Contributor Author

owenowenisme commented May 19, 2025

RayServiceInfo is Desc while RayServiceReady is GaugeVec I don't think we should mix them together.

@troychiu
Copy link
Contributor

RayServiceInfo is a Gauge type of metric which is the same as RayServiceReady. Desc is not a type of metric but a prometheus descriptor. What I meant is we can use the same way to collect RayServiceReady as we did for RayServiceInfo

@owenowenisme
Copy link
Contributor Author

owenowenisme commented May 19, 2025

@troychiu I get what you meant now.

func (c *RayServiceMetricsManager) Collect(ch chan<- prometheus.Metric) {
	var rayServiceList rayv1.RayServiceList
	if err := c.client.List(context.Background(), &rayServiceList); err != nil {
		c.log.Error(err, "Failed to list RayServices")
		return
	}
	for _, rayService := range rayServiceList.Items {
		c.collectRayServiceInfo(&rayService, ch)
	}
	c.RayServiceReady.Collect(ch)
}

func (c *RayServiceMetricsManager) collectRayServiceInfo(service *rayv1.RayService, ch chan<- prometheus.Metric) {
	ch <- prometheus.MustNewConstMetric(
		c.rayServiceInfo,
		prometheus.GaugeValue,
		1,
		service.Name,
		service.Namespace,
	)
	ready := meta.IsStatusConditionTrue(service.Status.Conditions, string(rayv1.RayServiceReady))
	c.RayServiceReady.WithLabelValues(service.Name, service.Namespace, strconv.FormatBool(!ready)).Set(0)
	c.RayServiceReady.WithLabelValues(service.Name, service.Namespace, strconv.FormatBool(ready)).Set(1)
}

But just to make sure, is this what you're thinking in mind?
BTW, I think this is a nice approach! Because we don't have to modify controller with this method. And can apply to rayCluster and rayJob.

@troychiu
Copy link
Contributor

troychiu commented May 19, 2025

Yeah that's it. Actually we can make it simpler.
you can simply do

ch <- prometheus.MustNewConstMetric(
        c.rayServiceReady,
        prometheus.GaugeValue,
        1,
        service.Name,
        service.Namespace,
        condition, // strconv.FormatBool(...) in this case
)

and

for _, rayService := range rayServiceList.Items {
        c.collectRayServiceInfo(&rayService, ch)
        c.collectRayServiceReady(&rayService, ch)
}

Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
Signed-off-by: You-Cheng Lin <[email protected]>
@@ -27,6 +30,12 @@ func NewRayServiceMetricsManager(ctx context.Context, client client.Client) *Ray
[]string{"name", "namespace"},
nil,
),
RayServiceReady: prometheus.NewDesc(
"kuberay_service_ready",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use kuberay_service_condition_ready?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RayServiceReady RayServiceConditionType = "Ready"

kuberay_service_ready should be enough?
But, I think it's just a matter of choice. Both are fine to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more clear. In kube-state-metrics, they use kube_pod_info and kube_pod_status_ready

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Signed-off-by: You-Cheng Lin <[email protected]>
@owenowenisme
Copy link
Contributor Author

Cc @kevin85421 PTAL🙏🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants