MGMT-17771: Adds enhancement for FIPS with multiple RHEL installer versions #6290
---
title: fips-with-multiple-rhel-versions
authors:
- "@carbonin"
creation-date: 2024-05-07
last-updated: 2024-05-20
---

# Support FIPS for installers built for different RHEL releases

## Summary

In order for an OpenShift cluster to be considered FIPS compliant, the
installer must be run on a system with FIPS mode enabled and with
FIPS-compliant OpenSSL libraries installed. This means using an
`openshift-install` binary that is dynamically linked against the OpenSSL
libraries present on our container image. Today this is not a problem because
all `openshift-install` binaries in use expect to link against RHEL 8 based
OpenSSL libraries, but OpenShift 4.16 will ship an installer that requires
RHEL 9 libraries.

This will require assisted-service to maintain a way to run the
`openshift-install` binary in an environment compatible with multiple OpenSSL
versions. Specifically, FIPS-enabled installs for pre-4.16 releases will need
to run on an el8 image, and 4.16 and later releases will need to run on an el9
image (regardless of FIPS).

## Motivation

FIPS compliance is important for our customers, and assisted-service should be
able to install FIPS-compliant clusters.

### Goals

- Allow a single installation of assisted-service to install FIPS-compliant
  clusters using installer binaries built against either RHEL 8 or RHEL 9

- Allow FIPS-compliant clusters to be installed from either the SaaS offering
  or the on-prem offering

### Non-Goals

- Changing cluster install interfaces to accommodate new FIPS requirements
  should be avoided

- Dynamically determining a given release's RHEL version. Assisted-service
  will track the minimum version that requires el9, and if a version can't be
  determined for some reason (FCOS may not have the same versioning scheme),
  el9 will be the default; a sketch of this rule follows the list.
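
The following is a minimal sketch of that selection rule, assuming a
hypothetical `runnerForRelease` helper and the `hashicorp/go-version` library;
it is illustrative only, not actual assisted-service code.

```go
package main

import (
	"fmt"

	"github.com/hashicorp/go-version"
)

// minimumEL9Version is the first OpenShift release whose installer links
// against RHEL 9 libraries (4.16 per this enhancement).
var minimumEL9Version = version.Must(version.NewVersion("4.16.0"))

// runnerForRelease picks the installer-runner for a release. Anything that
// cannot be parsed falls through to el9, the stated default.
func runnerForRelease(openshiftVersion string) string {
	v, err := version.NewVersion(openshiftVersion)
	if err != nil || v.GreaterThanOrEqual(minimumEL9Version) {
		return "el9"
	}
	return "el8"
}

func main() {
	fmt.Println(runnerForRelease("4.14.3")) // el8
	fmt.Println(runnerForRelease("4.16.0")) // el9
	fmt.Println(runnerForRelease("weird"))  // el9 (default)
}
```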

## Proposal

Two additional containers will run alongside assisted-service in the same pod.
These "installer-runner" containers will each expose an HTTP API local to the
pod over a unix socket. assisted-service can then choose which API to contact
to run an installer binary for a specified release and generate the manifests
required for a particular install. These manifests will then be uploaded to
whatever storage is in use for the deployment (local for on-prem, or s3 for
SaaS), and assisted-service will take over as usual from there.
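
A minimal sketch of how assisted-service might call a runner over its unix
socket; the socket path and `/manifests` endpoint are assumptions made for
illustration, and only the unix-socket dialing technique is the point here.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"strings"
)

// newRunnerClient returns an HTTP client whose connections are dialed over
// the given unix socket instead of TCP.
func newRunnerClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

func main() {
	// Pick the runner matching the release's RHEL version (see Non-Goals).
	client := newRunnerClient("/sockets/installer-runner-el9.sock")

	// The host in the URL is ignored; the transport always dials the socket.
	resp, err := client.Post("http://runner/manifests", "application/json",
		strings.NewReader(`{"release_image": "quay.io/..."}`)) // placeholder body
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```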

### User Stories

#### Story 1

As a Cluster Creator, I want to install FIPS-compliant clusters for any
supported OpenShift version.

### Implementation Details/Notes/Constraints

#### New Images

Two new container images will need to be built and published for every release
footprint we support. These images will initially be created from existing
assisted-service code, but could be split into their own independent projects
later.

### Risks and Mitigations

Shipping a new image is a non-trivial process and may take more time than we
have. We could likely get away with reusing the existing assisted-service
image with a different entrypoint for one of the runner images, but that still
requires publishing a new image for whichever RHEL base assisted-service
itself is not using.

## Design Details [optional]

> Review discussion: an existing service whose purpose is to receive events
> from ansible-runner on a local unix socket shared between containers closely
> matches what we need to implement; it could almost be borrowed as-is, mostly
> modifying what action it takes when receiving a request.

- A new `installer-runner` service will be created, written in Go.
- The installer-runner will be compiled twice: once in a RHEL 8 builder image
  and once in a RHEL 9 builder image, with each resulting binary placed into a
  RHEL base image of the corresponding version.

The new runner containers will expose an HTTP server over a unix socket.
assisted-service will POST to one of these servers when it needs manifests
generated. The runner container will respond either with any error that
occurred while generating the manifests, or with success, in which case
assisted-service will assume the manifests were created and uploaded
successfully.
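
A minimal sketch of the runner side, assuming the same hypothetical
`/manifests` endpoint and socket path as above; the handler body stands in for
the real manifest-generation and upload logic.

```go
package main

import (
	"log"
	"net"
	"net/http"
	"os"
)

func main() {
	const socketPath = "/sockets/installer-runner-el9.sock" // assumed path
	_ = os.Remove(socketPath)                               // clear a stale socket

	listener, err := net.Listen("unix", socketPath)
	if err != nil {
		log.Fatal(err)
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/manifests", func(w http.ResponseWriter, r *http.Request) {
		// Run openshift-install here, upload the results, and report back.
		// A failure would be returned via http.Error with a message body.
		w.WriteHeader(http.StatusCreated)
	})

	// Serve accepts connections from the unix socket instead of a TCP port.
	log.Fatal(http.Serve(listener, mux))
}
```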

### API

The new services would effectively wrap the existing `InstallConfigGenerator`
interface.

API call input:
- The `common.Cluster` JSON
- The install config
- The release image

API call output:
- An appropriate HTTP response
- An error message if the call was not successful
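
A minimal sketch of hypothetical request and error payload types for this API;
the field names and JSON tags are illustrative assumptions, not a settled
schema.

```go
package api

import "encoding/json"

// GenerateManifestsRequest carries everything a runner needs for one install.
type GenerateManifestsRequest struct {
	Cluster       json.RawMessage `json:"cluster"`        // serialized common.Cluster
	InstallConfig string          `json:"install_config"` // rendered install config
	ReleaseImage  string          `json:"release_image"`
}

// GenerateManifestsError is the body returned alongside a non-2xx status.
type GenerateManifestsError struct {
	Message string `json:"message"`
}
```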

### Installer Cache

The installer cache directory will be shared (as it's currently on the PV),
but the installers used by the two runners will never overlap.

### Packages

The installer runners will be built with the required packages to run the
installer in FIPS compliance mode.

### Open Questions

> Review discussion: should the new microservice have its own repo? That may
> be a good end state for the sake of isolated testing, but for this pass the
> service will be created inside assisted-service.

1. What does the API look like for the runner containers? What data should be
   passed in an API call, and what should be configured in the container
   environment?
2. What specific packages are required for these new images?

> Review discussion on question 1: a simple start would be a volume shared
> among all three containers; it could be ephemeral since it only needs to
> persist for the life of the pod. Running `openshift-install` should be quick
> enough to keep a single HTTP request open and write the response when it's
> done (perhaps with tweaked timeouts) rather than building an asynchronous
> task API. The installer cache will be managed by the runner container, which
> will also upload the resulting artifacts to the required location (the
> shared data volume or s3). The cache and the generated-file storage are
> already on a PV, so shared storage is a non-issue; the open question is
> exactly what must be passed in the API call.

### UI Impact

No impact

### Test Plan

FIPS and regular installs should be tested for all supported OpenShift
versions. Since this should be mostly transparent to the user, regression
testing in addition to testing 4.16 with and without FIPS should be
sufficient.

## Drawbacks

- This is a complicated change in architecture; something simpler might be
  more desirable.

- Creating two additional containers in a pod makes assisted-service more
  expensive to scale.

- Creating, maintaining, and releasing additional images is a non-trivial
  amount of additional work.

## Alternatives

> Review discussion: could we use the statically linked installer for releases
> before 4.16 and the el9 dynamically linked installer for 4.16 and later,
> keeping a single container? No; OpenShift can already be installed in FIPS
> mode today, in general and in ACM (see the 4.12 docs for FIPS installation
> and the ROSA FedRAMP docs), so FIPS support for pre-4.16 releases cannot be
> dropped.

> Review discussion: installing the RHEL 8 OpenSSL 1.1 libraries into the
> RHEL 9 container alongside OpenSSL 3.0 was also considered, for example via
> a multi-stage build that copies the el8 OpenSSL files into a separate
> directory and points the el8 installers at them with `LD_PRELOAD` or
> `LD_LIBRARY_PATH`, or via a chroot into an extracted el8 userspace. The same
> RPM cannot be installed at two versions simultaneously, the approach carries
> unknown pitfalls and schedule risk, and verifying it requires a fully
> FIPS-enabled environment; it is written up below under "Install multiple
> libraries on the same image".

### Use Jobs

Hive is investigating using the container image from the release payload to
run the installer as a Job.

- This wouldn't work for the podman deployment, which isn't directly
  productized or supported but is still a way we encourage people to try out
  the service. This could be overcome by retaining a way to run the installer
  on the service container, but then both methods would need to be tested and
  maintained.

- This wouldn't work for the Agent Based Installer (ABI), as ABI runs the
  services using podman. This could also be overcome by retaining a way to
  run the installer local to the service, as the image version run by ABI
  will always match the target cluster, but again both methods of running the
  installer would need to be maintained indefinitely.

- It's unclear how many Jobs we would end up running concurrently. It would
  be difficult to find out from the SaaS how many installer processes run
  concurrently (maybe we should create a metric for this), but the telco
  scale team regularly runs several hundred concurrently, maxing out at over
  three thousand in a few hours. Unless we clean up the Jobs rather
  aggressively, it would not be good to create this many.

- Multiple Jobs would need to be run on a single assets directory. This seems
  prohibitively complex compared to the proposed solution. During a single
  install the following installer commands are used:
  - `openshift-baremetal-install create manifests`
  - `openshift-baremetal-install create single-node-ignition-config` or
    `openshift-baremetal-install create ignition-configs` (depending on HA
    mode)

### Run the matching installer on the assisted-service container

Clusters whose installers already match the assisted-service container's RHEL
version could be handled by the assisted-service container as they are today.
This would require one less image and container per pod, but having the same
process for every cluster install would be easier to understand and maintain.

### Use RPC instead of HTTP

[Go's RPC](https://pkg.go.dev/net/rpc) could be used instead of a direct HTTP
server (RPC can be hosted over HTTP, but that's not what is being addressed
here). RPC would make this a simpler change, as the code for generating the
manifests is already contained in a single package, but RPC would be a strange
choice if we were to move the handling into a truly separate service in the
future.

### Install multiple libraries on the same image

It may be possible to install both versions of the shared libraries required
by the installers for FIPS compliance (libcrypto and libssl?) on a single
image. This would require much less change and should be significantly quicker
to implement, but it's not clear whether this would be possible or
supportable. It could be achieved by any of the following methods:

1. Create a separate userspace for the el8 libraries and chroot when those
   libraries are required.
   - This seems a bit complicated, and it will likely make our image quite a
     bit larger than it already is (~1.3G).
2. Install both versions of the required packages simultaneously.
   - This is likely not possible given that the packages share a name and
     differ only in version.
3. Use a multi-stage container build to copy the libraries from an el8 image
   into a directory on the el9 image, and use `LD_PRELOAD` or manipulate
   `LD_LIBRARY_PATH` to point the el8 installer binaries at the correct
   libraries.

> Review discussion: the chroot approach was tried, and both installers ran in
> a container on a FIPS-enabled RHEL 9 host. A reviewer also asked whether
> copying libraries around like this would cause problems for image scanning
> and CVE tracking. In the end, the FIPS-related internal channels confirmed
> that nothing like this would be considered FIPS-compliant.

The approach using chroot worked, but FIPS SMEs said that the container base
image *must* match the installer's expected environment for the resulting
cluster to be considered FIPS-compliant, so none of these multi-library
options are valid.

### Publish multiple assisted-service images

It's likely that a user installing in FIPS mode will only be interested in
installing a single OCP version at a time. This means that a given version of
assisted would still need to support both el8 and el9 FIPS installs, but a
single deployment of assisted would not.

> Review discussion: reviewers questioned this assumption, noting that users
> accumulate a mix of OpenShift versions over time, must be able to
> re-provision existing clusters at any time, and upgrade their fleets
> gradually, so restricting a deployment to a subset of versions would be a
> real disadvantage. The author's experience is that conservative customers
> often vet a single OCP version, and this option is seen as a temporary
> measure that buys time for the more general solution; one reviewer preferred
> implementing the main proposal as described and optimizing later, while
> another argued against restructuring the whole service two weeks before the
> release.

To handle this, the assisted-service image would be built twice: once based on
el8 and again based on el9. Both images would be released, and the operator
would choose which to deploy based on configuration (likely an annotation,
since a more robust solution would be preferred in the future).

For example, if a user knew they wanted to deploy OCP 4.14 from a FIPS-enabled
hub cluster, they would need to indicate to the operator that the el8-based
assisted-installer should be deployed. Assisted-service could also check that
the OCP version, current base OS, and FIPS mode were all aligned before
attempting to run the installer; a sketch of such a check appears after the
review note below.

> Review discussion: this assumes that the user who owns the configuration of
> the management cluster is the same user who is provisioning clusters, but
> those roles are often separate; the "Cluster Creator" would need to work
> with the cluster admin to switch assisted-installer between el8 and el9
> modes, which could become a real hassle even when one person holds both
> roles. The author noted that the application admin already makes a similar
> choice when setting the available OS images; the reviewer granted the point
> but argued that forcing customers to negotiate fleet-wide upgrade plans
> whenever a new version requires removing others is a burden worth avoiding.
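
A minimal sketch of such an alignment check, assuming a hypothetical
`validateInstallerEnvironment` helper, a build-time `baseOS` constant, and the
`hashicorp/go-version` library; it is simplified to the FIPS cases described
above.

```go
package validation

import (
	"fmt"

	"github.com/hashicorp/go-version"
)

// baseOS would be fixed at build time: "el8" for the el8-based image,
// "el9" for the el9-based image.
const baseOS = "el8"

var minimumEL9Version = version.Must(version.NewVersion("4.16.0"))

// validateInstallerEnvironment rejects FIPS installs that this particular
// image build cannot serve in a compliant way.
func validateInstallerEnvironment(ocpVersion string, fipsEnabled bool) error {
	if !fipsEnabled {
		return nil // non-FIPS installs could use the statically linked binary
	}
	v, err := version.NewVersion(ocpVersion)
	if err != nil {
		return fmt.Errorf("cannot parse OCP version %q: %w", ocpVersion, err)
	}
	if v.GreaterThanOrEqual(minimumEL9Version) && baseOS != "el9" {
		return fmt.Errorf("FIPS install of %s requires the el9-based image", ocpVersion)
	}
	if v.LessThan(minimumEL9Version) && baseOS != "el8" {
		return fmt.Errorf("FIPS install of %s requires the el8-based image", ocpVersion)
	}
	return nil
}
```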

To avoid issues when installing in a non-FIPS environment, assisted-service
could also default to the statically linked installer binary for OCP 4.16 and
above, but this doesn't change anything for earlier releases.

This option could be implemented more quickly and with less risk, while also
leaving open the possibility of a more complex solution to the general problem
in a future release.

> Review discussion: does the SaaS also require app SRE involvement to install
> FIPS-compliant clusters? Yes; there is an open task for app-sre at
> https://issues.redhat.com/browse/SDE-3692. The comment was mainly about the
> solution allowing for this possibility, not a commitment that FIPS installs
> will be available from the SaaS as soon as this is implemented; the main
> goal is to make FIPS work for both rhel8 and rhel9 installer releases.