proposal: generate "base rhel" container image, build OCP on top #498

Closed
cgwalters opened this issue Feb 8, 2021 · 14 comments
Labels
jira, lifecycle/frozen

Comments

@cgwalters
Member

Now that rpm-ostree is close to supporting "live updates", one thing we could do is move crio/kubelet into a separate machine-os-kubelet container or so, and also move openvswitch as part of e.g. the SDN container.

But these would still be treated as "first class" bits because they'd still be underneath the readonly bind mount in /usr etc. The MCO would learn to pull down this machine-os-kubelet container and apply updates from it too; and we can generalize that to N container images with M RPMs inside (or...perhaps not RPMs at all).

Advantages:

- The RHCOS bootimage is basically just RHEL, and this would greatly increase alignment with OKD since we'd use the same approach in both places.

- On the bootstrap node, the crio/kubelet in use becomes exactly the same as the one shipped in the cluster.

- There would no longer be a "CI -> shipping" gap for kubelet: when a PR merges to that repo, it'd get rebuilt and shipped the same way all other containers do, and not be versioned with RHCOS at all.

Note that in this we wouldn't be breaking at all the concept that the cluster owns and manages OS updates; we'd still be testing the OS, kubelet, and cluster components all together as a unit in the end. The goal here is just to split things up more internally so we can improve the process for CI and building; for example, the RHCOS version number would (mostly) just be a RHEL version number, which would greatly increase clarity about how things work. We can also be more agile with kubelet/crio etc.

@ashcrow
Member

ashcrow commented Feb 8, 2021

I like the idea! It sounds like decoupling into a few classes of streams to make bootstrapping and CI testing easier to manage.

  1. bootstrap and cluster (requirements to bootstrap and to run a cluster)
  2. general cluster (more general OS level)

One question is how we would tie these container images together. For example, what if machine-os-kubelet needed a bump and machine-os-content didn't change? Would they be combined by container tags, or through the payload manifest at the higher level?

@cgwalters
Member Author

The machine-os-kubelet would just be another payload image, it would be built with a Dockerfile the same way as everything else in the cluster. The MCO would know how to pull it down and apply it. During cluster bootstrap, bootkube.sh already downloads the MCO container, so we'd extend that to call the MCO to also extract crio/kubelet. For upgrades the MCO would also extract and apply it the same way it does machine-os-content.

IOW the end goal here is that the lifecycle of this container is logically separate; either container can change independently without caring about the other, we just merge the result.
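
To make "we just merge the result" concrete, here is a minimal sketch, not how the MCO actually works. It assumes each image can be described as a map from file path to content digest; `merge_layers` and all paths and digests below are hypothetical and only serve the illustration.

```python
from typing import Dict, Iterable


def merge_layers(layers: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge path -> digest maps from independently built images.

    Hypothetical helper: raises if two images ship different content
    for the same path, since the images are meant to own separate files.
    """
    merged: Dict[str, str] = {}
    for layer in layers:
        for path, digest in layer.items():
            if path in merged and merged[path] != digest:
                raise ValueError(f"conflicting content for {path}")
            merged[path] = digest
    return merged


# Made-up contents standing in for machine-os-content and machine-os-kubelet.
base_os = {"/usr/bin/bash": "sha256:aaa", "/usr/lib/os-release": "sha256:bbb"}
kubelet_v1 = {"/usr/bin/kubelet": "sha256:k1", "/usr/bin/crio": "sha256:c1"}
kubelet_v2 = {"/usr/bin/kubelet": "sha256:k2", "/usr/bin/crio": "sha256:c1"}

before = merge_layers([base_os, kubelet_v1])
after = merge_layers([base_os, kubelet_v2])

# Only the kubelet image changed, so only its files differ on the node.
changed = sorted(path for path in after if before.get(path) != after[path])
print(changed)
```

In this toy model, bumping the kubelet image leaves every file from the base OS image untouched, which is the independence property described above.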

@cgwalters
Member Author

cgwalters commented Feb 8, 2021

I think the best way to view this proposal is our workflow for "test RHCOS with new RHEL minor version". With this flow, we produce one machine-os-content:rhel-8X where X is e.g. 4 or 5 - and then we can test and ship that oscontainer across multiple OpenShift versions.
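
As a rough sketch of that flow: several OpenShift releases point at the same base image tag, so the image is built and tested once and reused. The `machine-os-content:rhel-8X` tag shape comes from the comment above; the release names and the mapping itself are placeholders, not real release data.

```python
from collections import defaultdict
from typing import Dict, List


def base_image_tag(rhel_minor: int) -> str:
    """Build the tag in the machine-os-content:rhel-8X shape."""
    return f"machine-os-content:rhel-8{rhel_minor}"


# Placeholder release names mapped to the RHEL 8 minor version they ship.
release_rhel_minor: Dict[str, int] = {
    "ocp-release-a": 4,
    "ocp-release-b": 4,
    "ocp-release-c": 5,
}


def releases_by_base_image(mapping: Dict[str, int]) -> Dict[str, List[str]]:
    """Group releases by the single base image they would all consume."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for release, minor in sorted(mapping.items()):
        grouped[base_image_tag(minor)].append(release)
    return dict(grouped)


shared = releases_by_base_image(release_rhel_minor)
for image, releases in shared.items():
    print(image, "->", releases)
```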

@ashcrow
Member

ashcrow commented Feb 8, 2021

> IOW the end goal here is that the lifecycle of this container is logically separate; either container can change independently without caring about the other, we just merge the result.

This is what I was looking for 👍

@vrutkovs
Member

That would be extremely useful for OKD, as we currently have to build a full-blown image just to ship a few RPMs.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label Jun 14, 2021
@LorbusChris
Member

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale label Jun 14, 2021
@travier
Member

travier commented Jul 6, 2021

/label jira

@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2021

@travier: The label(s) /label jira cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, downstream-change-needed

In response to this:

> /label jira

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label Oct 11, 2021
@travier
Member

travier commented Oct 18, 2021

/remove-lifecycle stale
/lifecycle frozen

We're working toward that goal. We're not there yet, but the ostree-ext work might get us there.

@openshift-ci openshift-ci bot added the lifecycle/frozen label and removed the lifecycle/stale label Oct 18, 2021
@cgwalters
Member Author

In the end this also kind of requires that we structure the inputs to the base image so that they only come from RHEL, for example, so that there's only one version number that matters.

@cgwalters
Member Author

And a core problem with this is that in some cases, specifically e.g. the live ISO, use cases that we have rely on kubelet existing there by default.

That said, it may be the case that we could try to do this at the core - i.e. generate one RHCOS 8.5 build, and then further specialize/derive that build for multiple OCP releases, and generate disk images out of those. If we could get away with only having redhat-release and dropping redhat-release-coreos that would be a huge help for sure. I think we'd just end up injecting the derived OCP version into the disk images or so?
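
A loose sketch of that "generate one build, then specialize it" idea follows. `OsBuild`, `derive_for_ocp`, and the `X.Y`/`X.Z` version strings are all made up for illustration; the only detail taken from the comment is a single RHCOS 8.5 base build with the derived OCP version injected afterwards.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class OsBuild:
    """Hypothetical record for a build; the base build carries no OCP version."""
    rhel_version: str
    ocp_version: Optional[str] = None

    def describe(self) -> str:
        if self.ocp_version is None:
            return f"RHCOS {self.rhel_version}"
        return f"RHCOS {self.rhel_version} (OCP {self.ocp_version})"


def derive_for_ocp(base: OsBuild, ocp_version: str) -> OsBuild:
    """Specialize the base build by injecting the derived OCP version."""
    return replace(base, ocp_version=ocp_version)


base = OsBuild(rhel_version="8.5")
# Placeholder OCP versions; each derived build would get its own disk images.
derived: List[OsBuild] = [derive_for_ocp(base, v) for v in ("X.Y", "X.Z")]

for build in derived:
    print(build.describe())
```

The base build stays unchanged and carries only the RHEL version, which matches the goal of having one version number that matters for the base.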

@cgwalters cgwalters changed the title from "proposal: split crio/kubelet to separate container image" to "proposal: generate "base rhel" container image, build OCP on top" May 11, 2022
@cgwalters
Member Author

I have a variant of this in #799 that differs in important technical ways.

7 participants