
enable additional tests #568

Open
miabbott opened this issue Jun 22, 2021 · 5 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@miabbott
Member

We hit a situation where the downstream tests behind the RHT firewall caught an issue with the RHCOS compose because we aren't running the same suite of tests upstream; notably, the kola testiso tests.

+ kola testiso -S --qemu-native-4k --qemu-multipath --scenarios iso-install --output-dir tmp/kola-metal4k
Testing scenarios: [iso-install]
Successfully tested scenario pxe-install for 49.84.202106211326-0 on bios (metal)
Successfully tested scenario iso-install for 49.84.202106211326-0 on bios (metal)
Successfully tested scenario iso-offline-install for 49.84.202106211326-0 on bios (metal)
[Pipeline] }
Error: scenario iso-install: timed out after 10m0s
2021-06-21T14:25:48Z cli: scenario iso-install: timed out after 10m0s

In this case, the root cause was a missing patch to Ignition in the 4.9 builds.

In build-test-qemu.sh, there is a TODO about turning on additional tests, but there is a desire for multiple test tiers and for splitting them across multiple pods.

os/ci/build-test-qemu.sh

Lines 26 to 28 in 38dd888

# TODO: all tests in the future, but there are a lot
# and we want multiple tiers, and we need to split them
# into multiple pods and stuff.
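As a rough sketch of what that tier splitting could look like: each CI pod could be handed a tier name that maps to one kola invocation. The tier names and test globs below are hypothetical (nothing in this repo defines them yet); `kola run` does accept test-name patterns, and `--basic-qemu-scenarios` is the subset the script runs today.

```shell
#!/bin/bash
# Hypothetical sketch: map a tier name to a kola invocation so each
# CI pod runs one tier. Tier names and globs are illustrative only.
set -euo pipefail

tier_cmd() {
    case "$1" in
        tier1)   echo "kola run --basic-qemu-scenarios" ;;
        tier2)   echo "kola run 'ext.*'" ;;
        testiso) echo "kola testiso -S --scenarios iso-install,iso-offline-install" ;;
        *)       echo "unknown tier: $1" >&2; return 1 ;;
    esac
}

# Print the command for the tier requested by this pod (dry run).
tier_cmd "${1:-tier1}"
```

A pod template would then just pass `tier1`, `tier2`, or `testiso` as the argument and `eval` (or directly run) the resulting command.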

Are we in a position to try turning on more tests now? Do we need to design how multiple tiers would work? Or are we waiting for a gangplank future?

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 20, 2021
@miabbott
Member Author

/lifecycle frozen

I think we want to eventually expand our test coverage here, so keeping this open.

@openshift-ci openshift-ci bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 22, 2021
@cgwalters
Member

cgwalters commented Sep 24, 2021

Let me try to sketch out something here:

First, I think we should support Prow jobs executed from this repo like:

  • /test e2e-aws-os: builds an updated ostree, applies it on top of the existing bootimages, and does a cluster install
  • /test e2e-aws-boot: Builds an updated AMI and ostree, and applies both of those as overrides to openshift-install
  • /test e2e-aws-os-upgrade: Updated ostree, but also does an OpenShift-level upgrade (same as e2e-upgrade jobs)

Once we have that, we should match FCOS and ship lockfiles in this repo that are updated via a bot doing CI and pushing. If the CI jobs the bot runs are via Prow, that immediately unlocks a whole lot of power. I think to start, we can then drop the current Prow periodic os promotion job because (like other OpenShift components) the ART builds should be reproducing exactly the same thing tested in Prow CI.
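A minimal sketch of what that lockfile-bump bot step might do, following the FCOS pattern: `cosa fetch --update-lockfile` is a real coreos-assembler command, but the file names, commit message, and push target here are illustrative, and the script defaults to a dry run that only prints what it would do.

```shell
#!/bin/bash
# Hypothetical sketch of a lockfile-update bot iteration.
# Defaults to DRY_RUN=1, printing commands instead of running them.
set -euo pipefail

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run cosa fetch --update-lockfile
run git add manifest-lock.x86_64.json
run git commit -m "lockfiles: bump to latest"
# On main the bot pushes directly; on release-* branches it
# would instead open a PR, per the discussion above.
run git push origin HEAD
```

With the Prow jobs from the previous comment in place, the bot's push (or PR) would get the same gating as any other change to the repo.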

But then, for example, what I think would work really well is for release branches (e.g. release-4.7) to switch over to having the bot submit PRs instead of auto-pushing. Then we could do more sophisticated things, like saying "OK, this is a kernel update, let's /test all-the-clouds", etc.

(And actually, if we did #498 first, then I think we could probably uniformly move to a PR workflow, because the rate of churn in RHEL is much smaller than in Fedora; it's mostly just the kubelet that constantly churns for the main/master branch.)

@cgwalters
Member

> the ART builds should be reproducing exactly the same thing tested in Prow CI.

And then to emphasize this, we'd only be running at most quick "sanity tests" behind the firewall, everything else would be visible and executed via Prow. One thing I don't quite know here is the state of Prow + s390x/ppc64le though. We may still need the kola tests run on an internal pipeline for those?

@travier
Member

travier commented Sep 30, 2021

Would be great to also have /test-kola-<aws|gcp|azure>.
