TRT-2254: extract Operator Progressing / Degraded Counts and Timing #30449

jianlinliu · 2025-11-03T02:54:01Z

Extract operator Progressing / Degraded counts and timing from intervals, collect them, and save them to an auto data loader (autodl) JSON file for historical analysis.

openshift-ci-robot · 2025-11-03T02:54:05Z

@jianlinliu: This pull request references TRT-2254 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-11-03T02:54:42Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jianlinliu
Once this PR has been reviewed and has the lgtm label, please assign smg247 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-trt · 2025-11-03T06:06:41Z

Job Failure Risk Analysis for sha: f01fda0

Job Name	Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-csi	IncompleteTests Tests for this run (106) are below the historical average (1798): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn	IncompleteTests Tests for this run (105) are below the historical average (3244): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade	IncompleteTests Tests for this run (106) are below the historical average (1801): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6	IncompleteTests Tests for this run (101) are below the historical average (3006): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn	IncompleteTests Tests for this run (103) are below the historical average (3313): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi	IncompleteTests Tests for this run (103) are below the historical average (3351): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

openshift-trt · 2025-11-03T11:11:54Z

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 11e2b1c

"[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer cleanup" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer collection" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer interval construction" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer preparation" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer setup" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer test evaluation" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer writing to storage" [Total: 12, Pass: 12, Fail: 0, Flake: 0]

jianlinliu · 2025-11-04T06:53:38Z

/payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

openshift-ci · 2025-11-04T06:53:41Z

@jianlinliu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fe5001a0-b94a-11f0-82fa-e1d6c7ee712e-0

openshift-ci-robot · 2025-11-05T01:21:40Z

@jianlinliu: This pull request references TRT-2254 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Extract operator Progressing / Degraded counts and timing from intervals, collect them, and save them to an auto data loader (autodl) JSON file for historical analysis.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jianlinliu · 2025-11-05T01:24:33Z

/test unit

jianlinliu · 2025-11-05T01:53:14Z

/test e2e-aws-ovn-microshift-serial

jianlinliu · 2025-11-05T01:53:28Z

/payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

openshift-ci · 2025-11-05T01:53:32Z

@jianlinliu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3a563760-b9ea-11f0-8835-258702823538-0

jianlinliu · 2025-11-05T04:07:28Z

/test e2e-gcp-ovn

openshift-ci · 2025-11-05T07:17:48Z

@jianlinliu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-gcp-ovn	`c5b8028`	link	true	`/test e2e-gcp-ovn`
ci/prow/e2e-aws-ovn-serial-2of2	`c5b8028`	link	true	`/test e2e-aws-ovn-serial-2of2`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jianlinliu · 2025-11-05T07:43:42Z

The metrics autodl JSON file was generated as expected.

neisw · 2025-11-05T11:49:51Z

pkg/monitortests/clusterversionoperator/operatorstateanalyzer/monitortest.go

+	if len(metrics) > 0 {
+		rows := generateRowsFromMetrics(metrics)
+		dataFile := dataloader.DataFile{
+			TableName: "operator_state_metrics",


I'm wondering if instead of the generic "Metric" we should have defined "Count", "TotalSeconds" and maybe "MinSeconds" and "MaxSeconds" instead of "IndividualDurationSeconds". Will see if others have thoughts on this.

I think the consensus was to make this a single row per operator/condition, tracking "Count", "TotalSeconds" and "MaxIndividualDurationSeconds".

dgoodwin · 2025-11-05T12:02:06Z

pkg/monitortests/clusterversionoperator/operatorstateanalyzer/monitortest.go

+		if err := dataloader.WriteDataFile(fileName, dataFile); err != nil {
+			return fmt.Errorf("failed to write operator state metrics: %w", err)
+		}
+		fmt.Printf("--->Write operator state metrics to %s successfully.\n", fileName)


I like to encourage the use of logrus.Infof for this: the syntax is clean, and it ensures we get timestamps for debugging purposes.

Ah, that line was actually added for debugging. Sure, I will update it to use logrus.Infof.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 3, 2025

openshift-ci bot requested review from p0lyn0mial and sjenning November 3, 2025 02:54

jianlinliu force-pushed the TRT-2254 branch from a737ae9 to f01fda0 Compare November 3, 2025 03:41

jianlinliu force-pushed the TRT-2254 branch from f01fda0 to 11e2b1c Compare November 3, 2025 06:14

extract Operator Progressing / Degraded Counts and Timing

c5b8028

jianlinliu force-pushed the TRT-2254 branch from 97c4277 to c5b8028 Compare November 4, 2025 11:48

neisw reviewed Nov 5, 2025

View reviewed changes

dgoodwin reviewed Nov 5, 2025

View reviewed changes


4 participants
