Skip to content

[SLI Metrics] Add metric kuberay_cluster_condition_provisioned #3635

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

win5923
Copy link
Contributor

@win5923 win5923 commented May 19, 2025

Why are these changes needed?

Implement metric kuberay_cluster_provisioned_ready proposed in 1.4.0.

End-to-end test

❯ k apply -f config/samples/ray-cluster.sample.yaml
raycluster.ray.io/raycluster-kuberay-bug created
❯ k apply -f config/samples/ray-service.sample.yaml
rayservice.ray.io/rayservice-sample created
❯ k apply -f config/samples/ray-job.sample.yaml
rayjob.ray.io/rayjob-sample created
configmap/ray-job-code-sample created


# HELP kuberay_cluster_condition_provisioned Indicates whether the RayCluster is provisioned and ready
# TYPE kuberay_cluster_condition_provisioned gauge
kuberay_cluster_condition_provisioned{condition="false",name="raycluster-kuberay-bug",namespace="default"} 1
kuberay_cluster_condition_provisioned{condition="true",name="rayjob-sample-vdr5x",namespace="default"} 1
kuberay_cluster_condition_provisioned{condition="true",name="rayservice-sample-n28cg",namespace="default"} 1

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@win5923 win5923 marked this pull request as ready for review May 19, 2025 15:06
@win5923
Copy link
Contributor Author

win5923 commented May 19, 2025

@troychiu @owenowenisme PTAL, thanks!

@@ -47,6 +49,12 @@ func NewRayClusterMetricsManager(ctx context.Context, client client.Client) *Ray
[]string{"name", "namespace", "owner_kind"},
nil,
),
rayClusterProvisionedReady: prometheus.NewDesc(
"kuberay_cluster_provisioned_ready",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition is RayClusterProvisioned. Would kuberay_cluster_provisioned be more consistent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe kuberay_cluster_condition_provisioned would be more clear. In kube-state-metrics, they use kube_pod_info and kube_pod_status_ready

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in d993eba


func (r *RayClusterMetricsManager) collectRayClusterProvisionedReady(cluster *rayv1.RayCluster, ch chan<- prometheus.Metric) {
condition := "false"
if meta.IsStatusConditionTrue(cluster.Status.Conditions, string(rayv1.RayClusterProvisioned)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simply use strconv.FormatBool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in d993eba

Signed-off-by: win5923 <[email protected]>
@win5923 win5923 changed the title [SLI Metrics] Add metric kuberay_cluster_provisioned_ready [SLI Metrics] Add metric kuberay_cluster_condition_provisioned May 20, 2025
Copy link
Contributor

@owenowenisme owenowenisme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -21,6 +23,7 @@ type RayClusterMetricsObserver interface {
type RayClusterMetricsManager struct {
rayClusterProvisionedDurationSeconds *prometheus.GaugeVec
rayClusterInfo *prometheus.Desc
rayClusterProvisionedReady *prometheus.Desc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rayClusterProvisionedReady *prometheus.Desc
rayClusterConditionProvisioned *prometheus.Desc

@@ -47,6 +50,12 @@ func NewRayClusterMetricsManager(ctx context.Context, client client.Client) *Ray
[]string{"name", "namespace", "owner_kind"},
nil,
),
rayClusterProvisionedReady: prometheus.NewDesc(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rayClusterProvisionedReady: prometheus.NewDesc(
rayClusterConditionProvisioned: prometheus.NewDesc(

@@ -47,6 +50,12 @@ func NewRayClusterMetricsManager(ctx context.Context, client client.Client) *Ray
[]string{"name", "namespace", "owner_kind"},
nil,
),
rayClusterProvisionedReady: prometheus.NewDesc(
"kuberay_cluster_condition_provisioned",
"Indicates whether the RayCluster is provisioned and ready",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Indicates whether the RayCluster is provisioned and ready",
"Indicates whether the RayCluster is provisioned",

defer to @kevin85421 for clearer description

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants