
enable additional tests #568

Open
miabbott opened this issue Jun 22, 2021 · 5 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@miabbott
Member

We hit a situation where the downstream tests behind the RHT firewall caught an issue with the RHCOS compose because we aren't running the same suite of tests upstream; notably, the kola testiso tests.

+ kola testiso -S --qemu-native-4k --qemu-multipath --scenarios iso-install --output-dir tmp/kola-metal4k
Testing scenarios: [iso-install]
Successfully tested scenario pxe-install for 49.84.202106211326-0 on bios (metal)
Successfully tested scenario iso-install for 49.84.202106211326-0 on bios (metal)
Successfully tested scenario iso-offline-install for 49.84.202106211326-0 on bios (metal)
[Pipeline] }
Error: scenario iso-install: timed out after 10m0s
2021-06-21T14:25:48Z cli: scenario iso-install: timed out after 10m0s

In this case, the root cause was a missing patch to Ignition in the 4.9 builds.

In build-test-qemu.sh, there is a TODO about turning on additional tests, but there is a desire for multiple test tiers and for splitting them across multiple pods.

os/ci/build-test-qemu.sh

Lines 26 to 28 in 38dd888

# TODO: all tests in the future, but there are a lot
# and we want multiple tiers, and we need to split them
# into multiple pods and stuff.
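As a rough sketch of what that tier splitting could look like: each CI pod could be handed a tier name that maps to one kola invocation. The tier names and test globs below are hypothetical (nothing in this repo defines them yet); `kola run` does accept test-name patterns, and `--basic-qemu-scenarios` is the subset the script runs today.

```shell
#!/bin/bash
# Hypothetical sketch: map a tier name to a kola invocation so each
# CI pod runs one tier. Tier names and globs are illustrative only.
set -euo pipefail

tier_cmd() {
    case "$1" in
        tier1)   echo "kola run --basic-qemu-scenarios" ;;
        tier2)   echo "kola run 'ext.*'" ;;
        testiso) echo "kola testiso -S --scenarios iso-install,iso-offline-install" ;;
        *)       echo "unknown tier: $1" >&2; return 1 ;;
    esac
}

# Print the command for the tier requested by this pod (dry run).
tier_cmd "${1:-tier1}"
```

A pod template would then just pass `tier1`, `tier2`, or `testiso` as the argument and `eval` (or directly run) the resulting command.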

Are we in a position to try turning on more tests now? Do we need to design how multiple tiers would work? Or are we waiting for a gangplank future?

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 20, 2021
@miabbott
Member Author

/lifecycle frozen

I think we want to eventually expand our test coverage here, so keeping this open.

@openshift-ci openshift-ci bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 22, 2021
@cgwalters
Member

cgwalters commented Sep 24, 2021

Let me try to sketch out something here:

First, I think we should support Prow jobs executed from this repo like:

  • /test e2e-aws-os: builds an updated ostree, applies it on top of the existing bootimages, and does a cluster install
  • /test e2e-aws-boot: Builds an updated AMI and ostree, and applies both of those as overrides to openshift-install
  • /test e2e-aws-os-upgrade: Updated ostree, but also does an OpenShift-level upgrade (same as e2e-upgrade jobs)

Once we have that, we should match FCOS and ship lockfiles in this repo that are updated via a bot doing CI and pushing. If the CI jobs the bot runs are via Prow, that immediately unlocks a whole lot of power. I think to start, we can then drop the current Prow periodic os promotion job because (like other OpenShift components) the ART builds should be reproducing exactly the same thing tested in Prow CI.
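A minimal sketch of what that lockfile-bump bot step might do, following the FCOS pattern: `cosa fetch --update-lockfile` is a real coreos-assembler command, but the file names, commit message, and push target here are illustrative, and the script defaults to a dry run that only prints what it would do.

```shell
#!/bin/bash
# Hypothetical sketch of a lockfile-update bot iteration.
# Defaults to DRY_RUN=1, printing commands instead of running them.
set -euo pipefail

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run cosa fetch --update-lockfile
run git add manifest-lock.x86_64.json
run git commit -m "lockfiles: bump to latest"
# On main the bot pushes directly; on release-* branches it
# would instead open a PR, per the discussion above.
run git push origin HEAD
```

With the Prow jobs from the previous comment in place, the bot's push (or PR) would get the same gating as any other change to the repo.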

But then, for example, what I think would work really well is for release branches (e.g. release-4.7) to switch over to having the bot submit PRs instead of auto-pushing. Then we could do more sophisticated things, like saying "OK, this is a kernel update, let's /test all-the-clouds", etc.

(And actually, if we did #498 first, then I think we could probably uniformly move to a PR workflow, because the rate of churn in RHEL is much smaller than in Fedora; it's mostly just the kubelet that constantly churns for the main/master branch.)

@cgwalters
Member

> the ART builds should be reproducing exactly the same thing tested in Prow CI.

And then to emphasize this, we'd only be running at most quick "sanity tests" behind the firewall, everything else would be visible and executed via Prow. One thing I don't quite know here is the state of Prow + s390x/ppc64le though. We may still need the kola tests run on an internal pipeline for those?

@travier
Member

travier commented Sep 30, 2021

Would be great to also have /test-kola-<aws|gcp|azure>.
