Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

layer: clarify attributes for implied directories #970

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

neersighted
Copy link

The image specification currently does not describe how conformant implementations should handle the case of a layer that contains "implied directories" -- entries that imply parent directories exist through their path, without those parent directories having their own entires in the archive.

As such, this behavior is currently implementation-defined and may not be consistent, even in the same implementation (e.g. moby/moby#44106).

To resolve this, we explicitly define what behavior is expected in this situation, selecting 'neutral' attributes (e.g. using the container USER's UID/GID, and using 0755 for mode, as derived from the default umask(2) of 0022).

It has been observed that it may be desirable to allow (MAY) conformant implementations to instead elect to not handle this case, and to fail with a useful error message instead. I lack the insight into the wider ecosystem to consider if this is a useful behavior, but it provides another aspect to consider when discussing this change.

@neersighted
Copy link
Author

cc @vasiliy-ul @thaJeztah

@sudo-bmitch
Copy link
Contributor

Does this mean / would be chowned to the user, with 755 permissions? Are there implementations that do this already?

@neersighted
Copy link
Author

/ cannot be present as all paths in a tar archive are relative. Implementations should handle that already, though I would not be opposed to an explicit Note: that calls that out.

@neersighted
Copy link
Author

Hmm, I suppose I misspoke, as . will serve the same purpose even if it's not a literal /. Moby has always ignored the root directory when creating parent directories, and the new implementation that is driving this PR makes it a bit more explicit:

https://github.com/moby/moby/blob/b9921a5560e22a2744bd7f51cca0475bbe20162e/pkg/archive/archive.go#L1154-L1173

@sudo-bmitch
Copy link
Contributor

When I first saw this, I was thinking of the COPY --link feature in buildkit, where you don't necessarily know the parent folder state, but inferring it is probably a worse option (e.g. creating a user folder in /home shouldn't change the permission of /home itself).

Since we are saying layer creators SHOULD include the parent folders, and this is just an exception case, then this LGTM. I've added it to tomorrow's meeting agenda to get some more eyes on it.

@sajayantony
Copy link
Member

Capturing from the call -

  • Create an issue on runtime spec to get inputs from runtime authors
  • Consider how layers shared across behaves with this proposal of implicit directories.
    @cpuguy83 -

Possibly need to look at the parent dir to get perms
Maybe with consideration for SOURCE_DATE_EPOCH or other such configurability.

@sudo-bmitch
Copy link
Contributor

To rephrase the shared layer concern: layers aren't attached to a single image, they can be included in multiple images. So when runtimes share layers, the filesystem permissions and ownership can vary based on the order the images are pulled.

The fix is to pick settings not based on the image, either static values (root), or imply values based on the first entry in the tar file that creates the implied directories.

@cyphar
Copy link
Member

cyphar commented Nov 3, 2022

umoci uses 0755 and the root UIDs, btw. We also set the ctime and mtime to the Unix epoch (to have at least some hope of producing a reproducible container) but I think no-one else does that. I don't think we should say that implementations shouldn't modify / -- a lot of container images don't have a / (or . if you prefer) entry and so you can end up with irreproducible container images if you don't allow container image tools to set / (or any other implied directory) to some fixed value.

The image specification currently does not describe how conformant
implementations should handle the case of a layer that contains "implied
directories" -- entries that imply parent directories exist through
their path, without those parent directories having their own entires in
the archive.

As such, this behavior is currently implementation-defined and may not
be consistent, even in the same implementation (e.g. moby/moby#44106).

To resolve this, we explicitly define what behavior is expected in this
situation, selecting 'neutral' attributes (e.g. using the container
`USER`'s UID/GID, and using `0755` for mode, as derived from the default
`umask(2)` of 0022).

Signed-off-by: Bjorn Neergaard <[email protected]>
@neersighted
Copy link
Author

neersighted commented Nov 8, 2022

I've spent a bit of time experimenting over the last few days, and come to some conclusions:

  • As nice as supporting SOURCE_DATE_EPOCH sounds, there's not really a way to interpret it that makes sense given the process model of existing implementations; it also has the potential to not be deterministic across runtime instance restarts.
  • Containerd currently recurses looking to copy metadata back down the tree when it is missing; this is rather complex, and hard to specify, and doesn't seem to offer many advantages when we are trying to define an edge case image authors should avoid.
  • Picking the most neutral metadata possible (e.g. root:root, with the Unix epoch time) seems to be the most reasonable across the board.

I've pushed up a new version that adjusts things to root:root and the Unix epoch; however I still have not considered atime and ctime as they are not mentioned elsewhere in the existing spec. Does anyone think we should consider those as well?

sudo-bmitch
sudo-bmitch previously approved these changes Nov 23, 2022
Copy link
Contributor

@sudo-bmitch sudo-bmitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We've had an open issue with runtime-spec for their review for 2 weeks, and we don't define atime or ctime anywhere else, so I'm comfortable merging as is and iterating if needed.


As the tar format describes directory hierarchies using a flat datastructure, it is possible to have so-called "implied directories" where not all parent directories implied by an entries' path in the archive have their own entry.

When applying a layer, implementations MUST create any parent directories implied by an entries' path, even if it is otherwise absent from the archive. Attributes of the created parent directories MUST be set as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this first MUST is uncontroversial, but the second MUST is potentially sticky and perhaps needs to be SHOULD?

To state that another way, are we confident that all existing implementations are currently complying with this second MUST? (I'm reasonably confident they are complying with the first one, because it's kind of unavoidable.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly ping @neersighted 🙇

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the attribute list needs to be a SHOULD, if only because requiring empty xattrs can run into issues (for instance, SELinux labels and NFSv4 xattrs both have weird behaviours in this respect).

@sparr
Copy link
Contributor

sparr commented Jun 26, 2023

I'd like to hear more about the motivation behind these specific recommendations/requirements. A user might naively expect parent directories to be created with the same ownership and permissions as the file being created, and I can't immediately think of an obvious reason to not do that. A different expectation might be that the ownership percolates down from the parent of the first implied directory.

i.e. If you're creating A/B/C/D and A/B already exists then the ownership and permissions of D will be specified but C is the focus of this PR. This proposal dictates specific attributes for C. I might also suggest that C's attributes could be copied from B or from D.

* `uid` is set to the `0`
* `gid` is set to the `0`
* `mode` is set to `0755`
* `xattrs` are empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot be MUST because some xattrs are auto-set by the kernel or need to be set in order for the system to work properly (SELinux labels for instance). I agree with @tianon that this should be a SHOULD.

@cyphar
Copy link
Member

cyphar commented Aug 2, 2023

however I still have not considered atime and ctime as they are not mentioned elsewhere in the existing spec. Does anyone think we should consider those as well?

atime is not practical to set on Linux (there's no interface to change it directly, you can only change the system time -- possible in a time namespace -- and hope when you touch the file it matches the time you wanted) so we should ignore it. I think ctime should be set to the same as mtime.

(I wrote this draft message last year and it got lost in my tabs. I only just got around to posting it. Sorry...)

@cyphar
Copy link
Member

cyphar commented Aug 2, 2023

@sparr At the moment the behaviour is completely unspecified so different image-spec implementations can produce different results. If you feel that your suggested algorithm makes more sense, then you should still want some text in the spec to describe the required behaviour.

As for your suggestion, there are a few issues that would need to be considered:

  • What about xattrs and other file metadata? Should only the owner and mode be copied? What about file attributes? ACLs?
  • What if A/B/C/D is a file, with the parent being an implied directory? Using the file mode doesn't make sense, so we have to pick a default for this case anyway. The trickle-down suggestion doesn't have this problem, but it could cause security issues if the top directory had the sticky bit set and you ended up setting the sticky bit for every implied directory.
  • Looking at precedent, GNU tar creates implied directories as the current user and sets the mode of directories to 0755 (though I suspect this is affected by umask). In other words, for root users the behaviour matches this proposal (and the way umoci does it, btw).

My personal view is that there needs to just be a simple fallback behaviour for this edge-case -- the suggestions of copying attributes up or down just adds more complexity for something that basically no image is going to run into. No image generation tool that I'm aware of generates tar archives with implied directories (arguably it would be perfectly correct for an image-spec tool today to error out if you had an archive with implied directories).

In addition, a new MUST change like this should take into account what existing tools do -- AFAIK most tools either handle this in the "naive" way (without noticing this might be an issue, they just make implied directories) or they use similar defaults to the ones described in this PR.

@tianon
Copy link
Member

tianon commented Jun 4, 2024

Friendly ping @neersighted -- I think changing that one MUST to SHOULD is the only feedback preventing this from moving 👀 ❤️

@sudo-bmitch sudo-bmitch dismissed their stale review July 18, 2024 14:11

Dismissing review pending xattrs resolution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clarification if parent directories can be omitted in layer tar
6 participants