Add more details to developer preview and add the example bash script
kannon92 committed Aug 1, 2024
1 parent 957070f commit 5ce0e41
Showing 1 changed file with 79 additions and 79 deletions.
158 changes: 79 additions & 79 deletions enhancements/kubelet/split-filesystem.md
@@ -18,23 +18,6 @@ see-also:

# Split Filesystem

## Open Questions

- The installer does not support creating OpenShift clusters with multiple disks.
  Does this feature have value without users being able to configure their cluster to have a separate filesystem?

- Do we need a drop-in configuration for container storage?
  - https://github.com/containers/storage/pull/1885

- How does one delete all images and containers once the container runtime config is changed?
  - crictl on all images and containers on each node?

- What is the best form of telemetry to show that a customer is using this feature?

- How would this feature work with layering?

- Day 2 operations for adding disks to OpenShift

## Summary

Upstream Kubernetes has released [KEP-4191](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4191-split-image-filesystem/README.md).
@@ -51,9 +34,15 @@ See [KEP Motivation](https://github.com/kubernetes/enhancements/tree/master/keps

### User Stories

#### Story 1

As an OpenShift admin, I want to store images in a separate filesystem from ephemeral storage and the writable layer.
The images can be on a read-only filesystem while ephemeral storage and the writable layers can live on a writable filesystem.

#### Story 2

As an OpenShift admin, I want to store images on a dedicated disk that multiple nodes can share.

### Goals

- Enable the ability to split the filesystem in OpenShift
@@ -77,11 +66,24 @@ to a lack of interest from customer requests.

In the developer preview, a user can run the following steps to enable this feature.

We will automate these steps for tech preview.
We will further refine the user APIs and interfaces for tech preview.

#### Feature gate

The user needs to enable the `KubeletSeparateDiskGC` feature gate for the kubelet.
This can be done by listing `KubeletSeparateDiskGC` in the `CustomNoUpgrade` feature set:

```yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: "CustomNoUpgrade"
  customNoUpgrade:
    enabled:
      - KubeletSeparateDiskGC
```
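The same change can also be made with a merge patch instead of editing the YAML. The sketch below is illustrative only and assumes a logged-in `oc` client with cluster-admin rights; the `featuregate_patch` helper and the `--apply` flag are hypothetical names. Two caveats: a JSON merge patch replaces the whole `enabled` list, so any gates already listed there must be passed as well, and OpenShift documents `CustomNoUpgrade` as irreversible (it cannot be undone and it blocks minor version updates).

```bash
#!/usr/bin/env bash
# Sketch: enable KubeletSeparateDiskGC through the CustomNoUpgrade feature set.
# Assumes `oc` is logged in as cluster-admin. A JSON merge patch replaces the
# entire customNoUpgrade.enabled list, so pass every gate that should stay on.
set -euo pipefail

# Build the merge-patch body for the given feature gate names (hypothetical helper).
featuregate_patch() {
  local gates="" gate
  for gate in "$@"; do
    gates+="${gates:+,}\"${gate}\""
  done
  printf '{"spec":{"featureSet":"CustomNoUpgrade","customNoUpgrade":{"enabled":[%s]}}}' "$gates"
}

# Only touch the cluster when explicitly asked to.
if [[ "${1:-}" == "--apply" ]]; then
  oc patch featuregate cluster --type merge -p "$(featuregate_patch KubeletSeparateDiskGC)"
fi
```

Running the script without arguments only defines the helper; `bash enable-gate.sh --apply` performs the patch.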
#### Storage Configuration
@@ -115,36 +117,32 @@ And then run `butane storage.bu -o storage.yaml

Applying `storage.yaml` will apply this MachineConfig to your worker nodes.
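Rendering `storage.bu` with `butane`, applying the result, and waiting for the rollout can be scripted. This is a minimal sketch, assuming the file names used in this section, the default `worker` MachineConfigPool, and an arbitrary 30-minute timeout; the `run` helper and the `DRY_RUN` variable are hypothetical and make the script print commands instead of running them unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Sketch: render storage.bu, apply the MachineConfig, and wait for the
# worker pool to finish updating. Prints commands unless DRY_RUN=0.
set -euo pipefail

# Run a command, or just print it when DRY_RUN is not 0 (the default).
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

apply_storage_config() {
  local src="${1:-storage.bu}" out="${2:-storage.yaml}" pool="${3:-worker}"
  run butane "$src" -o "$out"
  run oc apply -f "$out"
  # The pool reports Updated=True once every node runs the new config.
  run oc wait "mcp/${pool}" --for=condition=Updated --timeout=30m
}

apply_storage_config "$@"
```

Right after `oc apply`, the pool can still report `Updated=True` for a moment before the controller picks up the new MachineConfig, so a more careful script would first wait for the pool to start updating.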

#### Prerequisites for Filesystem

The user must have a disk partition labeled `IMAGE_STORE`.

#### Labeling Filesystem

One could use the following systemd unit to relabel the imagestore location (`/var/lib/images`).

```
[Unit]
Description=Label ImageStore
After=crio-install.service
[Service]
Type=oneshot
ExecStart=rpm-ostree install \
-y \
--apply-live \
--allow-inactive \
policycoreutils-python-utils
ExecStart=semanage fcontext -a -e /var/lib/containers/storage /var/lib/images
ExecStart=restorecon -R -v /var/lib/images
[Install]
WantedBy=multi-user.target
```
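To confirm that the unit's `semanage` step took effect, one can look for the equivalence rule among the local SELinux customizations. This sketch assumes `semanage fcontext -l -C` prints local equivalence rules as `<target> = <source>` lines, which should be checked on a real node first; the `has_equivalence` helper, the `--check` flag, and the `TARGET`/`SOURCE_DIR` variables are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that the SELinux equivalence rule added by the unit exists.
# Assumes `semanage fcontext -l -C` lists local equivalence rules as
# "<target> = <source>" lines; verify the format on your node first.
set -euo pipefail

TARGET="${TARGET:-/var/lib/images}"
SOURCE_DIR="${SOURCE_DIR:-/var/lib/containers/storage}"

# Succeed if any line on stdin is exactly "<target> = <source>".
has_equivalence() {
  local target="$1" src="$2" line found=1
  while IFS= read -r line; do
    if [[ "$line" == "$target = $src" ]]; then
      found=0
    fi
  done
  return "$found"
}

if [[ "${1:-}" == "--check" ]]; then
  # Capture the listing, then search it.
  listing="$(semanage fcontext -l -C)"
  if has_equivalence "$TARGET" "$SOURCE_DIR" <<< "$listing"; then
    echo "equivalence rule present"
  else
    echo "equivalence rule missing; rerun the labeling unit" >&2
    exit 1
  fi
fi
```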
#### Script

We will have a script called `split-filesystem.sh`.

This bash script will mount the filesystem using the `IMAGE_STORE` label.
The script will then check whether the SELinux labels are correct for `/var/lib/images`.
If the labels are correct, the script will create a file called `.relabel_complete`.
If the labels are incorrect, the folder will be relabeled.
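The steps above could look roughly like the following. This is a hypothetical sketch, not the actual `split-filesystem.sh`: it assumes the partition is mounted with `mount -L`, that "correct labels" means the top-level `/var/lib/images` directory has the same SELinux type as `/var/lib/containers/storage` (a heuristic, since subdirectories can carry different types), and that the relabel uses the same `semanage`/`restorecon` commands as the systemd unit in this section. The `LABEL`, `IMAGE_DIR`, and `REFERENCE_DIR` variables and the `--run` flag are illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of split-filesystem.sh following the steps above.
# Assumptions: mount by label with `mount -L`, compare only the SELinux type of
# the top-level directories, relabel with semanage + restorecon.
set -euo pipefail

LABEL="${LABEL:-IMAGE_STORE}"
IMAGE_DIR="${IMAGE_DIR:-/var/lib/images}"
REFERENCE_DIR="${REFERENCE_DIR:-/var/lib/containers/storage}"
MARKER="${IMAGE_DIR}/.relabel_complete"

# Print the type field of a context such as system_u:object_r:container_var_lib_t:s0.
selinux_type() {
  local se_type
  IFS=: read -r _ _ se_type _ <<< "$1"
  printf '%s\n' "$se_type"
}

# Succeed when two contexts share the same SELinux type.
same_selinux_type() {
  [[ "$(selinux_type "$1")" == "$(selinux_type "$2")" ]]
}

main() {
  # Mount the partition labeled IMAGE_STORE if it is not mounted yet.
  if ! mountpoint -q "$IMAGE_DIR"; then
    mkdir -p "$IMAGE_DIR"
    mount -L "$LABEL" "$IMAGE_DIR"
  fi

  # Nothing to do once a previous run confirmed the labels.
  [[ -f "$MARKER" ]] && return 0

  if same_selinux_type "$(stat -c %C "$REFERENCE_DIR")" "$(stat -c %C "$IMAGE_DIR")"; then
    touch "$MARKER"
  else
    # The rule may already exist from an earlier run; ignore that error.
    semanage fcontext -a -e "$REFERENCE_DIR" "$IMAGE_DIR" || true
    restorecon -R "$IMAGE_DIR"
  fi
}

# Only act on the node when invoked with --run (keeps the functions testable).
if [[ "${1:-}" == "--run" ]]; then
  main
fi
```

As described above, the marker is only written when the labels are already correct, so after a relabel the next run confirms the result and creates `.relabel_complete`.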

The script will be written to each node as a file and run by a systemd unit, similar to how the kubelet auto-sizing feature works.

See the [kubelet-auto-sizing script](https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/files/kubelet-auto-sizing.yaml) for an example of the script and the [kubelet-auto-sizing unit](https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/units/kubelet-auto-node-size.service.yaml) for an example of the unit file.

We will run this script after `crio.service`.

#### Remove all old images

Since the image cache has changed locations, all the old images left over should be removed.

The simplest option is to remove the images on each node where this feature was enabled.
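One way to carry this out is to run `crictl` on each node through `oc debug`. This is a sketch under assumptions: it targets nodes labeled `node-role.kubernetes.io/worker`, and it uses `crictl rmi --prune`, which removes only images not used by any container, so images backing running containers stay until those containers are gone. It does not settle the open question of whether images left in the old location remain visible to CRI-O. The `run` helper, `DRY_RUN` variable, and `--all-workers` flag are hypothetical; commands are printed unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Sketch: prune unused images on worker nodes via `oc debug`.
# Assumptions: workers carry node-role.kubernetes.io/worker, and pruning
# unused images (rather than force-removing everything) is acceptable.
set -euo pipefail

# Run a command, or just print it when DRY_RUN is not 0 (the default).
run() {
  if [[ "${DRY_RUN:-1}" == 0 ]]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Prune images on each node passed as an argument (e.g. node/worker-0).
prune_images_on_nodes() {
  local node
  for node in "$@"; do
    run oc debug "$node" -- chroot /host crictl rmi --prune
  done
}

if [[ "${1:-}" == "--all-workers" ]]; then
  mapfile -t nodes < <(oc get nodes -l node-role.kubernetes.io/worker -o name)
  prune_images_on_nodes "${nodes[@]}"
fi
```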

#### Checking if the feature is enabled on a node

One can use `crictl imagefsinfo` to see if the filesystem is split. The output shows both `imageFilesystems` and `containerFilesystems`.
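Because CRI-O reports different paths for images and containers even on a single filesystem (for example separate `-images` and `-containers` directories under the same root), comparing the path strings is not enough; comparing the device numbers of the two paths is more reliable. A sketch, assuming `jq` and GNU `stat` are installed and that the JSON has the shape `{"status": {"imageFilesystems": [...], "containerFilesystems": [...]}}` (adjust the jq paths to your crictl version). The function names and the `--run` flag are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: decide whether the image and container filesystems reported by
# `crictl imagefsinfo` sit on different devices.
# Assumptions: jq and GNU stat are installed; JSON shape as described above.
set -euo pipefail

# Print "<image path>\t<container path>" from imagefsinfo JSON on stdin.
# A path is empty when the corresponding field is missing.
fs_paths() {
  jq -r '[(.status.imageFilesystems[0].fsId.mountpoint // ""),
          (.status.containerFilesystems[0].fsId.mountpoint // "")] | @tsv'
}

# Succeed when the two paths have different device numbers (st_dev).
on_different_devices() {
  [[ "$(stat -c %d "$1")" != "$(stat -c %d "$2")" ]]
}

# Print "split" or "shared" for an image path and a container path.
classify() {
  if [[ -n "$1" && -n "$2" ]] && on_different_devices "$1" "$2"; then
    echo split
  else
    echo shared
  fi
}

if [[ "${1:-}" == "--run" ]]; then
  img="" ctr=""
  IFS=$'\t' read -r img ctr < <(crictl imagefsinfo | fs_paths) || true
  echo "$(classify "$img" "$ctr"): images=$img containers=${ctr:-none}"
fi
```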

@@ -166,24 +164,18 @@ Container storage is configured in openshift by adding this [file](https://githu
Container Runtime Config allows one to change the overlay size of storage. Other fields of this file are kept the same as the template.

### Feature Gate
Since we are enabling this feature only for Dev Preview, everything will be configured via MachineConfigs or other existing OpenShift APIs.
In tech preview, we will add a feature gate to openshift/api, registered along these lines:
```golang
FeatureGateKubeletSeparateDiskGC = newFeatureGate("KubeletSeparateDiskGC").
	reportProblemsToJiraComponent("node").
	contactPerson("kannon92").
	productScope(kubernetes).
	enableIn(configv1.DevPreviewNoUpgrade).
	mustRegister()
```
### API Changes

These API changes to the configuration of container storage will be done in the tech preview stage.
```golang
type ContainerRuntimeConfiguration struct {
	...
```

@@ -199,66 +191,74 @@ This feature will write the updated container storage file.

This will also trigger labeling of `/var/lib/images`.

## Open Questions

- The installer does not support creating OpenShift clusters with multiple disks.

- Do we need a drop-in configuration for container storage?
  - https://github.com/containers/storage/pull/1885

- How does one delete all images and containers once the container runtime config is changed?
  - crictl on all images and containers on each node?

- Day 2 operations for adding disks to OpenShift

### Risks and Mitigations

Kubernetes and OpenShift do not really advertise support for separate filesystems.
We also do not allow most configuration of the container runtime, and changing configuration in this area can break your system.

To derisk this scenario, in tech preview, we will propose an API to configure the imagestore.
In tech preview, we will also automate many of the steps to mitigate problems.

### Drawbacks

## Test Plan

Upstream, we run the node-e2e conformance tests with this feature enabled.

We could follow a similar approach.

## Graduation Criteria

### Dev Preview

- Ability to view if a user has configured this feature.
- Feature gates to enable this feature
- API change to streamline configuration of this feature.
- Bash script and systemd unit file added
- Manual steps for enabling via MachineConfigs
- Blog post walking through this feature.

### Dev Preview -> Tech Preview

Will update once we are ready to promote.
- Tech Preview will include the API changes.
- Telemetry will be included to verify whether a user is using this feature
- Installer support

### Tech Preview -> GA

Will update once we are ready to promote.

### Removing a deprecated feature

- Announce deprecation and support policy of the existing feature
- Deprecate the feature

NA

## Upgrade / Downgrade Strategy

There are a few major items to call out.

### Upgrade from feature off to feature on

Let scenario A be a node without this feature enabled, where CRI-O is not configured with a separate imagestore.

Let scenario B be a node with this feature enabled.

When upgrading from scenario A to scenario B, CRI-O will repull all images.
This is because the image cache moves to a new location, and images left in the old location could cause problems.

It will be important to remove all the images and containers before using this feature.
On a reboot of the node, all existing services will repull their images.

### Upgrade from feature on to feature on

Upgrades where the feature enablement stays the same should have no impact.

### Downgrading from feature on to feature off

On downgrade, images will be pulled to the `graphroot` location. Images in the imagestore are no longer tracked and are effectively orphaned.
The disk can then be safely pruned or unmounted as a manual step.

## Version Skew Strategy

The support for this feature was merged into CRI in 4.15. However, this feature is only supported for 4.18 and above.
@@ -267,16 +267,16 @@ This is due to an issue in container/storage around the imagestore implementatio
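Because the feature is only supported in 4.18 and above, an enablement script could refuse to run on older clusters. A minimal sketch, assuming `oc` access and that `.status.desired.version` on the `version` ClusterVersion object holds the cluster version; `version_at_least` and the `--check` flag are hypothetical names, and the comparison relies on GNU `sort -V` ordering.

```bash
#!/usr/bin/env bash
# Sketch: refuse to enable the feature on clusters older than 4.18.
# Assumes `oc` access and GNU sort with version ordering (-V).
set -euo pipefail

# Succeed when version $1 is at least version $2.
version_at_least() {
  local sorted
  sorted="$(printf '%s\n%s\n' "$2" "$1" | sort -V)"
  # The smaller version sorts first; $1 >= $2 when $2 comes first.
  [[ "${sorted%%$'\n'*}" == "$2" ]]
}

if [[ "${1:-}" == "--check" ]]; then
  current="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"
  if version_at_least "$current" 4.18; then
    echo "cluster version $current supports the split filesystem feature"
  else
    echo "cluster version $current is older than 4.18" >&2
    exit 1
  fi
fi
```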

## Support Procedures

Document failure modes, as they will be interesting to explain.
A common problem with changing the container storage location is SELinux "permission denied" errors.

If a container is failing with the following error:

```bash
error while loading shared libraries: /lib64/libc.so.6: cannot apply additional memory protection after relocation: Permission denied
```

One needs to check the SELinux labels on the imagestore and verify that relabeling worked. If it did not, the imagestore must be relabeled.
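A read-only way to check the labels is a dry run of `restorecon`, which reports what it would change without changing anything. This sketch assumes `restorecon -n -v` prints one `Would relabel ...` line per mislabeled path, and that the equivalence rule for `/var/lib/images` is already in place (without it, the policy default for that path is not a container label, so a clean result would be misleading). `count_mislabeled` and the `--check` flag are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: report paths under the imagestore whose SELinux label differs from
# the policy default, without changing anything (restorecon -n).
# Assumes restorecon prints "Would relabel ..." lines in -n -v mode.
set -euo pipefail

# Count "Would relabel" lines in restorecon output read from stdin.
count_mislabeled() {
  local line count=0
  while IFS= read -r line; do
    if [[ "$line" == "Would relabel "* ]]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

if [[ "${1:-}" == "--check" ]]; then
  dir="${2:-/var/lib/images}"
  n="$(restorecon -R -n -v "$dir" | count_mislabeled)"
  if (( n > 0 )); then
    echo "$n paths under $dir need relabeling; run: restorecon -R $dir" >&2
    exit 1
  fi
  echo "labels under $dir match policy"
fi
```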
## Alternatives

NA

## Infrastructure Needed [optional]

Use this section if you need things from the project. Examples include a new
subproject, repos requested, github details, and/or testing infrastructure.
