handle nested whiteouts #172
Comments
So... half-baked thoughts. We can clearly detect this situation in ostree when we're generating the composefs image. I think what we probably need to do here is fall back to creating that sub-portion of the filesystem tree as hardlinks outside of a composefs mount (which we actually already do today). Then we set up a dynamic bind mount from there to that hardlinked subtree. I think that'd work...
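The detection step could look roughly like the following minimal Python sketch. It assumes the checkout is an ordinary directory tree and that whiteouts use the standard overlayfs encoding (a character device numbered 0/0). `is_whiteout`, `collapse_nested` and `subtrees_needing_fallback` are hypothetical helpers, not ostree APIs. The result is the outermost directories that contain whiteouts, i.e. the candidates to keep as hardlinked checkouts and bind-mount into place.

```python
import os
import stat


def is_whiteout(st_mode: int, st_rdev: int) -> bool:
    """overlayfs encodes a whiteout as a character device numbered 0/0."""
    return stat.S_ISCHR(st_mode) and os.major(st_rdev) == 0 and os.minor(st_rdev) == 0


def collapse_nested(dirs) -> list:
    """Keep only the outermost directories; nested ones are covered by an ancestor."""
    if "." in dirs:
        return ["."]  # the root itself contains a whiteout
    result = []
    for d in sorted(dirs):  # an ancestor always sorts before its descendants
        if not any(d.startswith(r + "/") for r in result):
            result.append(d)
    return result


def subtrees_needing_fallback(root: str) -> list:
    """Return directories (relative to root) that contain overlayfs whiteouts."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:  # device nodes are listed alongside regular files
            st = os.lstat(os.path.join(dirpath, name))
            if is_whiteout(st.st_mode, st.st_rdev):
                found.add(os.path.relpath(dirpath, root))
    return collapse_nested(found)
```

Collapsing to the outermost directories keeps the number of bind mounts small, at the cost of pulling some whiteout-free files into the hardlinked fallback.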
I don't think this is the only issue we're going to have, is it? I mean, you also can't store overlayfs xattrs in the composefs image. @amir73il Is there a way to "recurse" overlayfs dirs for this? I.e., have an overlayfs mount that looks like it contains files such as whiteouts and regular files with overlayfs xattrs set on them?
I actually wonder whether we should have libcomposefs error out if the source data contains such files. Currently they produce pretty broken images.
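libcomposefs is C, so the following Python sketch only illustrates what such a pre-flight check might reject; `check_entry` and `UnsupportedOverlayData` are hypothetical names. It assumes the problematic inputs are 0/0 character devices and xattrs under the `trusted.overlay.` or `user.overlay.` prefixes, which the composefs overlayfs mount could interpret itself instead of passing through.

```python
import errno
import os
import stat

# Prefixes overlayfs uses for its own metadata: trusted.overlay. by default,
# user.overlay. when a mount uses the userxattr option.
OVERLAY_XATTR_PREFIXES = ("trusted.overlay.", "user.overlay.")


class UnsupportedOverlayData(Exception):
    """Source data that the composefs overlayfs mount would misinterpret."""


def overlay_xattrs(names) -> list:
    """Return the xattr names an overlayfs mount could consume itself."""
    return [n for n in names if n.startswith(OVERLAY_XATTR_PREFIXES)]


def check_entry(path: str) -> None:
    """Hypothetical pre-flight check: refuse whiteouts and overlay xattrs."""
    st = os.lstat(path)
    if stat.S_ISCHR(st.st_mode) and st.st_rdev == 0:
        raise UnsupportedOverlayData(f"{path}: overlayfs whiteout (char device 0/0)")
    try:
        names = os.listxattr(path, follow_symlinks=False)
    except OSError as e:
        if e.errno != errno.ENOTSUP:
            raise
        names = []  # the filesystem has no xattr support, so nothing to check
    bad = overlay_xattrs(names)
    if bad:
        raise UnsupportedOverlayData(f"{path}: overlayfs xattrs {bad}")
```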
Generally, an overlayfs mount doesn't accept another overlayfs as its upper/work dir.
@amir73il These would be lower dirs, though. Basically, suppose … In this case, if a file in the container storage has, e.g., a metacopy xattr set, we want the container overlayfs mount to "interpret" it, not the composefs overlayfs mount. Basically, we would want something like mkcomposefs converting the …
Same with whiteouts: we would like a way to store a file in a lower layer that overlayfs doesn't consider a whiteout, but that looks like a whiteout in the resulting overlayfs mountpoint.
Since ovl already has a mount option, userxattr, that replaces the trusted.overlay prefix with user.overlay, and all the code uses helpers to build the actual xattr names, the easiest approach would be to support an option xattr_prefix="user.cfs" or something like that (it could also be trusted.cfs).
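A hypothetical Python model of the idea: every overlay xattr name is derived from a single per-mount prefix, so a configurable prefix changes which names a given mount consumes. `trusted.overlay.` and `user.overlay.` are the existing prefixes; `user.cfs` as an `xattr_prefix` value is the proposal, and the helper names are made up for illustration.

```python
from typing import Optional

TRUSTED_PREFIX = "trusted.overlay."  # default
USER_PREFIX = "user.overlay."        # selected by the userxattr mount option


def mount_prefix(userxattr: bool = False, xattr_prefix: Optional[str] = None) -> str:
    """Pick the single prefix a mount derives all its xattr names from.

    xattr_prefix models the proposed option (e.g. "user.cfs"); it is not an
    existing overlayfs mount option.
    """
    if xattr_prefix is not None:
        return xattr_prefix.rstrip(".") + "."
    return USER_PREFIX if userxattr else TRUSTED_PREFIX


def xattr_name(prefix: str, suffix: str) -> str:
    """Build a full name such as trusted.overlay.metacopy from a suffix."""
    return prefix + suffix


def consumed_by_mount(name: str, prefix: str) -> bool:
    """True if this mount would interpret the xattr instead of passing it through."""
    return name.startswith(prefix)
```

With the custom prefix, a `trusted.overlay.metacopy` xattr stored in the composefs image is ordinary data to the composefs mount, so it stays visible for the container's overlayfs mount to interpret. The flip side is that every file in that lower then has to use the new prefix.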
That sounds like it's easy to implement, but it would mean we would have to use the new prefix for all the files in the lower, and that would add a hard requirement on a new kernel with this support to be able to mount such a file. I'll give it some thought.
Ideally I would like to avoid checking for a new xattr on all files, as that would slow everything down. I'm thinking we can use an xattr on regular whiteout files which, if set, means that we don't actually treat them as whiteouts.
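A minimal sketch of the opt-out idea, assuming a hypothetical marker xattr (`trusted.overlay.escaped` here; the thread does not name one) and a `getxattr` callback standing in for the kernel's lookup path. The point is the cost model: only entries that already look like whiteouts pay for an extra xattr lookup.

```python
import stat
from typing import Callable, Optional

# Hypothetical opt-out marker; the thread does not name one.
ESCAPE_XATTR = "trusted.overlay.escaped"


def acts_as_whiteout(st_mode: int, st_rdev: int,
                     getxattr: Callable[[str], Optional[bytes]]) -> bool:
    """Decide whether a lower-layer entry should hide what is below it.

    Only 0/0 character devices are candidates, so the extra xattr lookup
    happens for whiteouts alone rather than for every file.
    """
    if not (stat.S_ISCHR(st_mode) and st_rdev == 0):
        return False
    # A marked entry is passed through as a plain 0/0 device node, so it still
    # reads as a whiteout to an overlayfs that uses this mount as a lower layer.
    return getxattr(ESCAPE_XATTR) is None
```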
On Mon, Aug 7, 2023, 5:42 PM Alexander Larsson ***@***.***> wrote:
> That sounds like it's easy to implement, but it would mean we would have to use the new prefix for all the files in the lower, and that would add a hard requirement on a new kernel with this support to be able to mount such a file. I'll give it some thought.
Regarding whiteouts, the simplest approach would be to support an overlay.whiteout xattr. There's no need to support it for creating whiteouts, only for lookup in lower layers, so it should be pretty simple (I think).
> Ideally I would like to avoid checking for a new xattr on all files, as that would slow everything down. I'm thinking we can use an xattr on regular whiteout files which, if set, means that we don't actually treat them as whiteouts.
Maybe. I'm not sure how you plan to do this escaping in a nested way. When you have an idea that is extendable beyond a single nesting, you can share the design.
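For comparison, a sketch of the lookup-only `overlay.whiteout` suggestion, under the assumption that the xattr name is formed from the mount's usual prefix (so `trusted.overlay.whiteout` by default). The function name and the `getxattr` callback are illustrative stand-ins for the kernel's lookup path.

```python
import stat
from typing import Callable, Optional


def lower_entry_is_whiteout(st_mode: int, st_rdev: int,
                            getxattr: Callable[[str], Optional[bytes]],
                            prefix: str = "trusted.overlay.") -> bool:
    """Lookup-time whiteout test for an entry found in a lower layer (sketch).

    Besides the classic 0/0 character device, any entry carrying the
    <prefix>whiteout xattr counts as a whiteout. Nothing here creates such
    entries: whiteouts written to the upper layer stay 0/0 devices.
    """
    if stat.S_ISCHR(st_mode) and st_rdev == 0:
        return True
    # Every other lower entry pays for one xattr lookup.
    return getxattr(prefix + "whiteout") is not None
```

Unlike the opt-out idea, this check consults the xattr for every non-device entry in a lower layer, which is exactly the per-file cost raised earlier in the thread.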
This issue also relates a bit to containers/storage#1608 in that ideally we support fully general transformations between what is interpreted by the Linux kernel (e.g. the
This is short for
I'm uncertain whether it makes sense to leap from not supporting this to fully general nesting. I think this problem is a bit like nested KVM: the driving use cases for one level of nesting became big enough that it really made sense to productize it. But three (or more) levels are far more obscure, enough that I don't know whether they're used for anything real. That said, wouldn't general nesting be implementable by having overlayfs detect the case where it's reading from an overlayfs lower and process "one level" of …
Anyway, all this aside... hmm... so I think my original half-baked thought of "un-nesting" would likely work, except that we'd need to synthesize a local cfs image. In theory we could pre-compute the digests of these nested cfs images and include a signature covering them in the main cfs image. Or we could just rely on a chain of trust from the main rootfs to the code synthesizing the nested cfs images.
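A minimal sketch of the "pre-computed digests" option, assuming nested cfs images can be synthesized reproducibly (bit-for-bit identical on every machine). Plain SHA-256 stands in for the fs-verity digests composefs actually uses, and the manifest format and function names are hypothetical.

```python
import hashlib
import json


def image_digest(image: bytes) -> str:
    # Stand-in only: plain SHA-256 of the image bytes. composefs identifies
    # images by fs-verity digest, which is computed differently.
    return hashlib.sha256(image).hexdigest()


def build_manifest(nested_images: dict) -> bytes:
    """Build time: canonical {path: digest} manifest for the nested images.

    Shipped inside the main cfs image, it is covered by whatever signature
    already covers that image.
    """
    digests = {path: image_digest(img) for path, img in nested_images.items()}
    return json.dumps(digests, sort_keys=True).encode()


def verify_synthesized(manifest: bytes, path: str, synthesized: bytes) -> bool:
    """Boot time: accept a locally synthesized image only if its digest matches."""
    expected = json.loads(manifest).get(path)
    return expected is not None and expected == image_digest(synthesized)
```

Because the manifest lives inside the main image, it inherits that image's signature; the chain-of-trust alternative would skip the manifest and trust the synthesizing code instead.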
Some initial work on nesting is here: https://github.com/alexlarsson/linux/tree/ovl-nesting
I'm not sure exactly how this applies to composefs. There are two files, the file on the erofs image (
Nesting escapes is easy. The first layer will convert …
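A hypothetical sketch of nestable xattr escaping, assuming each extra `overlay.` component after the prefix adds one level of escape and each overlayfs mount strips exactly one level. `escape` and `present` are illustrative names, not kernel functions, and only the default `trusted.overlay.` prefix is modelled.

```python
PREFIX = "trusted.overlay."
ESCAPED = PREFIX + "overlay."  # hypothetical: each extra "overlay." is one level


def escape(name: str, levels: int = 1) -> str:
    """Escape an overlay xattr so `levels` overlayfs mounts pass it through."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not an overlay xattr: {name}")
    return PREFIX + "overlay." * levels + name[len(PREFIX):]


def present(name: str):
    """How one overlayfs mount treats an xattr name found in a lower layer.

    Returns (interpreted, visible_name). An unescaped overlay xattr is consumed
    by this mount and hidden; an escaped one is shown with one level removed.
    """
    if name.startswith(ESCAPED):
        return False, PREFIX + name[len(ESCAPED):]
    if name.startswith(PREFIX):
        return True, None
    return False, name
```

For example, `escape("trusted.overlay.metacopy")` gives `trusted.overlay.overlay.metacopy`: the composefs mount shows it as `trusted.overlay.metacopy`, which the container's overlayfs mount then interprets. Whiteouts would need an analogous encoding, which this sketch does not cover.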
See ostreedev/ostree#2712
Basically, we're now shipping support in ostree for embedding whiteouts. This allows c/image (podman etc.) to be pointed directly at this alternative root in a read-only fashion. Container images shipped this way are "lifecycle-bound" with the host (and actually gain the benefits of dedup and of ostree's efficient on-the-wire deltas).
But this only works because ostree itself doesn't use overlayfs (ostree actually predates overlayfs).
In a composefs future, because overlayfs doesn't nest, we're going to need to figure out how to handle this fun special case.
In a unified storage world things are inherently better here, but hard-requiring that would actually be an "API break".
I guess another way to say this is that everyone turning on the ostree composefs support is just going to break if they have nested containers today.