-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure files and links have missing parent dirs created. Support hard links. #1179
Ensure files and links have missing parent dirs created. Support hard links. #1179
Conversation
} | ||
|
||
linkDir := filepath.Dir(newname) | ||
return filepath.Join(linkDir, oldname), newname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is incorrect for hard links: their targets are relative to the root directory of the layer archive. For confirmation, an example of an image with such hard links is docker.io/library/busybox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why busybox would be relevant here, since all our images must be based off of UBI...but if you look at busybox...it doesn't have an os-release
file, hence it's failure when ran against preflight.
~/busybox/etc
❯ ll
total 20
-rw-r--r--. 1 acornett acornett 306 May 18 2023 group
-rw-r--r--. 1 acornett acornett 114 May 18 2023 localtime
drwxr-xr-x. 1 acornett acornett 82 May 18 2023 network
-rw-r--r--. 1 acornett acornett 494 May 18 2023 nsswitch.conf
-rw-r--r--. 1 acornett acornett 340 May 18 2023 passwd
-rw-------. 1 acornett acornett 136 May 18 2023 shadow
~/busybox
❯ find . -name os-release
(no-output)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The busybox image is relevant in that it provides an example of how hard links are stored in tar archives and container image layers, and it was an image that I remembered included hard links in its layers. If it has to be UBI-based to matter, feel free to build this and examine the added layers:
FROM registry.access.redhat.com/ubi8
RUN echo hello > /usr/local/hello.txt
RUN ln /usr/local/hello.txt /usr/local/hello2.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 @nalind, thanks. I misread my test tar archives - I only treated link origins with ..
as relative, and everything else as absolute (or, rather, relative to /). The inverse seems to be what I want (everything not explicitly absolute should be treated as a relative path).
Latest push should implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what was previously done (passing the value from the Linkname
field to os.Symlink()
) was correct for symbolic links. I've only seen "Linkname values are always relative to the root of the archive" with hard links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalind I believe this resolves correctly now for both symlinks and hardlinks. From what I see, an image that looks like this:
FROM registry.access.redhat.com/ubi8
RUN ln -v /etc/os-release /os-release
RUN ln -sf /etc/os-release /os-release-soft
Has no issue linking the symlinks and hard links. E.g.
[root@2656a86bb595 fs]# ls -l
total 24
lrwxrwxrwx. 1 root root 35 Jun 27 15:52 bin -> /tmp/preflight-472685243/fs/usr/bin
drwxr-xr-x. 1 root root 0 Jun 27 15:52 boot
drwxr-xr-x. 1 root root 0 Jun 27 15:52 dev
drwxr-xr-x. 1 root root 1842 Jun 27 15:52 etc
-rw-r--r--. 1 root root 0 Jun 27 15:52 foo
drwxr-xr-x. 1 root root 0 Jun 27 15:52 home
lrwxrwxrwx. 1 root root 35 Jun 27 15:52 lib -> /tmp/preflight-472685243/fs/usr/lib
lrwxrwxrwx. 1 root root 37 Jun 27 15:52 lib64 -> /tmp/preflight-472685243/fs/usr/lib64
drwxr-xr-x. 1 root root 0 Jun 27 15:52 lost+found
drwxr-xr-x. 1 root root 0 Jun 27 15:52 media
drwxr-xr-x. 1 root root 0 Jun 27 15:52 mnt
drwxr-xr-x. 1 root root 0 Jun 27 15:52 opt
lrwxrwxrwx. 1 root root 46 Jun 27 15:52 os-release -> /tmp/preflight-472685243/fs/usr/lib/os-release
lrwxrwxrwx. 1 root root 42 Jun 27 15:52 os-release-soft -> /tmp/preflight-472685243/fs/etc/os-release
drwxr-xr-x. 1 root root 0 Jun 27 15:52 proc
drwxr-xr-x. 1 root root 254 Jun 27 15:52 root
drwxr-xr-x. 1 root root 8 Jun 27 15:52 run
lrwxrwxrwx. 1 root root 36 Jun 27 15:52 sbin -> /tmp/preflight-472685243/fs/usr/sbin
drwxr-xr-x. 1 root root 0 Jun 27 15:52 srv
drwxr-xr-x. 1 root root 0 Jun 27 15:52 sys
drwxr-xr-x. 1 root root 72 Jun 27 15:52 tmp
drwxr-xr-x. 1 root root 100 Jun 27 15:52 usr
drwxr-xr-x. 1 root root 166 Jun 27 15:52 var
You might notice /os-release is hardlinked to /etc/os-release, which itself is a link. I see some issues with links to links to links (spanning link types), but GNU tar seems to exhibit the same issue, and the running container based on the image also sees the broken link. To that end, I'm thinking this is close enough to what we expect to be written to disk for most tasks Preflight needs to execute.
Let me know if you see something different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think that trying to mix the handling of symbolic links with the handling of hard links is adding more complexity than it's removing. For symbolic links, the oldname
to pass to os.Symlink()
is just the Linkname
. For hard links and os.Link()
, it's filepath.Join(dst, header.Linkname)
.
If it's possible that any of the images being analyzed will attempt tricks by having the Linkname
refer to something outside of dst
, it can get more complicated than that, but I don't know if that's a concern here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I've split the two, and as discussed, added a few safeguards around the writing of symlinks that may be broken as present in the archive, but may point to things outside of our extraction base directory.
I was unable to get rid of the link resolution code even with splitting it out, but overall I think the hardlink logic functions a bit better (I saw a hardlink testcase resolve that was previously broken).
PTAL.
from change #1179: |
69a1942
to
8ed7767
Compare
from change #1179: |
8ed7767
to
d4dbf03
Compare
from change #1179: |
Hi, fix is coming for the issue that appeared in the last DCI job, if launching a new job and failing again here, you can add the following to your PR description to launch the fix (saying this because I don't have permissions to edit the PR description in this repo):
Thanks! |
internal/engine/engine.go
Outdated
original := filepath.Join(dst, header.Linkname) | ||
// Create the new link's directory if it doesn't exist. | ||
if !strings.HasPrefix(original, dst) { | ||
logger.V(log.DBG).Info("Error processing symlink. Symlink would reach outside of the image archive. Skipping this link", "link", header.Name, "linkedTo", header.Linkname, "resolvedTo", original) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log line is incorrect right? We want to say hard link
like on L476? Or do we not need that clarity here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. If we operate with the assumption that all hard links are archived using the root of the tarball as the base path, then we actually don't need this logic here at all for hard links. I've removed it in the latest push.
… links Signed-off-by: Jose R. Gonzalez <[email protected]>
d4dbf03
to
6a47790
Compare
from change #1179: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit. Just a suggestion.
@@ -412,6 +412,13 @@ func untar(ctx context.Context, dst string, r io.Reader) error { | |||
|
|||
// if it's a file create it | |||
case tar.TypeReg: | |||
// If the file's parent dir doesn't exist, create it. | |||
dirname := filepath.Dir(target) | |||
if _, err := os.Stat(dirname); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now happening in every branch. Perhaps it should be outside the switch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, but I think it has to happen in the switch because it conditionally happens in some of the cases (e.g. Symlinks), and must happen (though we could probably make it work) in others (e.g. Directories)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't conditionally happen though. It happens in every case. The only thing that changes is that target
is either the real thing in the case of tar.TypeDir, or just the Dir portion of the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced up with @bcrochet but to document it here: Symlinks are skipped if they refer to a path outside of the tarball root once fully resolved (e.g. by way of ../../../../
). We skip them without writing anything. For this reason, we cannot write before the case statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: acornett21, bcrochet, komish The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR addresses two issues:
When extracting the image tarball to the filesystem, we would fail if a file (e.g.
etc/foo.conf
) was created before its parent directory (e.g.etc/
). This has been fairly uncommon, but for cases where this happens, we'll now create the directory if it's missing.We did not handle hard links before, and now we do. We also would fail to handle all links properly, in that links pointing to a path relative to the link would not be resolved properly. This PR properly resolves the link path and the original file's path relative to the link path if it's a relative reference (e.g.
../
). All links are still executed using full paths.