[RFE] Tolerate failures of HC installs in HyperShift rosa-cli wrapper #126

Open
smalleni opened this issue Mar 14, 2023 · 2 comments

@smalleni

Currently, the workload is kicked off only once all hosted clusters (HCs) have been created. However, we have seen that one or more HC installs are highly likely to fail, which causes the entire test to fail and forces a cleanup and rerun.

It would be good to have a flag that sets a failure tolerance (for example, 10%). As long as the share of hosted clusters that have not been created after a timeout is below that tolerance, the script should proceed to create the workload and clean up afterwards.
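The tolerance check described above could look roughly like the sketch below. The function name `should_start_workload` and its parameters are hypothetical and not part of the wrapper. The sketch also assumes that failing exactly the tolerated share (e.g. 8 of 80 at 10%) still counts as acceptable, which would need to be decided.

```python
def should_start_workload(requested: int, ready: int, tolerance_pct: float) -> bool:
    """Decide whether the workload may start after the install timeout.

    requested     -- number of hosted clusters the test asked for
    ready         -- number of hosted clusters that are ready at timeout
    tolerance_pct -- hypothetical flag value: tolerated failures, in percent
    """
    if requested <= 0:
        raise ValueError("requested must be positive")
    if not 0 <= ready <= requested:
        raise ValueError("ready must be between 0 and requested")
    if not 0 <= tolerance_pct <= 100:
        raise ValueError("tolerance_pct must be between 0 and 100")

    not_ready = requested - ready
    # Compare without dividing to avoid rounding surprises:
    # not_ready / requested <= tolerance_pct / 100
    return not_ready * 100 <= tolerance_pct * requested
```

With 80 requested clusters and a 10% tolerance, the workload would start with 72 or more ready clusters; a tolerance of 0 keeps today's all-or-nothing behaviour.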

@smalleni
Author

@morenod

@morenod morenod self-assigned this Mar 14, 2023
@venkataanil
Contributor

We can enhance the wrapper to support two ways of starting e2e:

  1. The current way: start once all clusters have moved to the ready state.
  2. Additionally, make the wrapper listen on a socket (or watch a file) and wait for a user command such as "start_e2e", which starts e2e before the above condition is met (i.e. before all clusters are ready). This way the user can tell the wrapper to start e2e whenever they want, for example once 50 clusters are ready during an 80-HC test.
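A minimal sketch of the file-based variant of this idea follows. Everything here is an assumption for illustration: the name `wait_for_start`, the `count_ready` callback (standing in for however the wrapper counts ready clusters), and the convention that the mere existence of a trigger file means "start_e2e". A socket listener would replace the `os.path.exists` check.

```python
import os
import time


def wait_for_start(count_ready, total, trigger_file, timeout_s,
                   poll_s=5.0, clock=time.monotonic, sleep=time.sleep):
    """Block until e2e should start and return the reason as a string.

    count_ready  -- hypothetical callable returning the number of ready clusters
    total        -- number of hosted clusters requested
    trigger_file -- path the user creates to request an early start
    timeout_s    -- give up waiting after this many seconds
    """
    deadline = clock() + timeout_s
    while True:
        if count_ready() >= total:
            return "all_ready"        # option 1: every cluster is ready
        if os.path.exists(trigger_file):
            return "user_trigger"     # option 2: user asked to start early
        if clock() >= deadline:
            return "timeout"          # caller decides what to do next
        sleep(poll_s)
```

With this shape, a user watching an 80-HC run could create the trigger file (for instance with `touch`) once 50 clusters are ready, and the loop would return `"user_trigger"` on its next poll.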

