OCPBUGS-45071: fix: adding an exclude list for pathological events occurring on SNO #29347

Open: jeff-roche wants to merge 1 commit into master

jeff-roche
Contributor

We are adding a list of exceptions for the pathological events that are failing the SNO blocking jobs. This is meant as a temporary band-aid until we have time to look into the underlying DNS issues causing these events.

openshift-ci-robot added the jira/severity-moderate, jira/valid-reference, and jira/invalid-bug labels on Dec 5, 2024
@openshift-ci-robot

@jeff-roche: This pull request references Jira Issue OCPBUGS-45071, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

We are adding a list of exceptions for the pathological events that are failing the SNO blocking jobs. This is meant as a temporary band-aid until we have time to look into the underlying DNS issues causing these events.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jeff-roche
Contributor Author

/cc @eggfoobar @jaypoulz @neisw

openshift-ci bot requested review from deads2k and sjenning on December 5, 2024 16:02
Contributor

openshift-ci bot commented Dec 5, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jeff-roche
Once this PR has been reviewed and has the lgtm label, please assign stbenjam for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jeff-roche
Contributor Author

/jira refresh

openshift-ci-robot added the jira/valid-bug label on Dec 5, 2024
@openshift-ci-robot

@jeff-roche: This pull request references Jira Issue OCPBUGS-45071, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

/jira refresh

openshift-ci-robot removed the jira/invalid-bug label on Dec 5, 2024

openshift-trt bot commented Dec 5, 2024

Job Failure Risk Analysis for sha: 3528906

Job name: pull-ci-openshift-origin-master-okd-scos-e2e-aws-ovn
Failure risk: IncompleteTests
Tests for this run (12) are below the historical average (2371): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

jeff-roche force-pushed the sno-pathological-events branch from 3528906 to 545ab25 on December 10, 2024 19:43
jeff-roche force-pushed the sno-pathological-events branch from 545ab25 to 678dab7 on December 10, 2024 20:18
		monitorapi.LocatorPodKey: regexp.MustCompile("catalogd-controller-manager"),
	},
	jira: "https://issues.redhat.com/browse/OCPBUGS-45071",
},
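
For readers without the full diff, the fragment above appears to be one entry in an exclude list: a set of locator-key regexes plus a Jira link that justifies the exception. The following is only a minimal, self-contained sketch of that idea. The `exclusion` type, `firstMatch`, `snoExclusions`, the plain string key "pod", the entry name, and the sample pod and namespace values are all hypothetical and are not the types or helpers used in openshift/origin.

```go
package main

import (
	"fmt"
	"regexp"
)

// exclusion is a hypothetical stand-in for one exclude-list entry.
// Every locator key listed must match for the event to be excused.
type exclusion struct {
	name        string
	locatorKeys map[string]*regexp.Regexp // keyed by locator field, e.g. "pod"
	jira        string                    // bug that justifies the exception
}

// matches reports whether all configured regexes match the event's locator.
// A locator that lacks a configured key does not match.
func (e exclusion) matches(locator map[string]string) bool {
	for key, re := range e.locatorKeys {
		val, ok := locator[key]
		if !ok || !re.MatchString(val) {
			return false
		}
	}
	return true
}

// firstMatch returns the first exclusion that covers the event, if any.
func firstMatch(list []exclusion, locator map[string]string) (exclusion, bool) {
	for _, e := range list {
		if e.matches(locator) {
			return e, true
		}
	}
	return exclusion{}, false
}

// snoExclusions builds an illustrative list with a single entry that
// mirrors the regex and Jira link from the fragment above.
func snoExclusions() []exclusion {
	return []exclusion{{
		name: "catalogd-back-off-on-sno", // hypothetical name
		locatorKeys: map[string]*regexp.Regexp{
			"pod": regexp.MustCompile("catalogd-controller-manager"),
		},
		jira: "https://issues.redhat.com/browse/OCPBUGS-45071",
	}}
}

func main() {
	// Hypothetical event locator, for illustration only.
	event := map[string]string{
		"namespace": "openshift-catalogd",
		"pod":       "catalogd-controller-manager-abc12",
	}
	if e, ok := firstMatch(snoExclusions(), event); ok {
		fmt.Printf("excused by %s (%s)\n", e.name, e.jira)
	}
}
```

Keeping the Jira link on each entry ties every exception to a tracked bug, which makes a temporary exclusion easier to remove once the underlying issue is fixed.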
Contributor

I think you may have overcorrected 😄. Originally you were filtering events for the test "[sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers". I don't think this change does that anymore.

Contributor

The path to testBackoffStartingFailedContainer is part of the monitortests/node/legacynodemonitortests chain, which starts with monitor Stop.

But I believe you just moved the matchers under testframework/legacytestframeworkmonitortests, so I don't think the filter gets applied to the code above (I could be wrong).

Contributor

openshift-ci bot commented Dec 11, 2024

@jeff-roche: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/okd-scos-e2e-aws-ovn | 678dab7 | link | false | /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aws-ovn-microshift | 678dab7 | link | true | /test e2e-aws-ovn-microshift
ci/prow/e2e-aws-ovn-microshift-serial | 678dab7 | link | true | /test e2e-aws-ovn-microshift-serial
ci/prow/e2e-agnostic-ovn-cmd | 678dab7 | link | false | /test e2e-agnostic-ovn-cmd
ci/prow/e2e-aws-ovn-kube-apiserver-rollout | 678dab7 | link | false | /test e2e-aws-ovn-kube-apiserver-rollout

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


openshift-trt bot commented Dec 11, 2024

Job Failure Risk Analysis for sha: 678dab7

Job name: pull-ci-openshift-origin-master-okd-scos-e2e-aws-ovn
Failure risk: IncompleteTests
Tests for this run (20) are below the historical average (1329): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Labels
jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
3 participants