-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A safe MkdirAll implementation #13
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
I'm adding a couple of comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, I focused on the idea first and suggested alternatives for the !openat2 case.
I haven't checked the rest in much detail, let's see if those ideas make sense to you first.
mkdir_linux.go
Outdated
// can modify the filesystem tree between SecureJoin and MkdirAll, it is | ||
// possible for MkdirAll to resolve unsafe symlink components and create | ||
// directories outside of the root. | ||
func MkdirAll(root, unsafePath string, mode os.FileMode) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of doing something simpler, and maybe slower, for the non openat2 case?
Openat2 is supported since 5.6, so lot of distros already have it.
Something like:
createdPath := ""
for path in unsafePath { // unsafePath here can be something like strings.Split(unsafePath, string(filepath.Separator))
//NOTE: not exactly as utils.WithProcFd, as that passes a string, we want the fd. So this modified verson should use fh.Fd() instead.
utils.WithProcFd(root, createdPath), func (basePath int) {
// TODO: handle the case path exists
unix.MkdirAt(basePath, path)
}
createdPath = createdPath + path
}
In my mind, it seems semantically simpler (and simpler to see it is correct) if we handle the non-openat2() case using securejoin.Join.
I guess such a closure in go should work (or can be adapted to take path as param too).
This might be slower due to opening the proc fd, and maybe you can do symlink attacks that live inside the rootfs (I don't think it is a problem, it is as if that change was there from the beginning).
Too bad that SecureJoin doesn't return the fd. Although that might be another option to explore: a SecureJoin() variant that returns the fd too. In essence it would be like this utils.WithProcFd()
we use in runc, and we just use securejoinFd()
and mkdirat()
in a loop.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from being considerably less efficient (doing MkdirAll
this way would result in O(n^2) lookups at least because we have to look up each component each time we add a component), SecureJoin
is not a safe API to use even if you verify through /proc/self/fd
for the reasons I outlined in #12.
Also, the current implementation has the same logic after we get the path we want from either partialOpen
or partialOpenat2
. You would need to have two entirely different implementations if you do it that way.
Too bad that SecureJoin doesn't return the fd. Although that might be another option to explore: a SecureJoin() variant that returns the fd too.
That is exactly what partialOpen
does, it's just that rather than erroring out if the path doesn't exist it returns the last path it got to and the remaining components. If you don't return that, you get O(n^2) lookups because you need that information for MkdirAll
.
partialOpen
is basically a better SecureJoin
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you won't like the new symlink stack I needed to add to partialOpen
to make the behaviour match openat2
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh. The more the complexity grows, the more I think O(n^2) for old kernels (4+ years) might not be so bad :-D
I wish there was an mkdirat2()
that does the mkdir and returns an fd to the created entry on success (O_PATH, etc.). That would really simplify things and remove races here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p.
Clearly secureJoin can be implemented by using partialOpen()
and in the future we should probably do that. This might be ok for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh. The more the complexity grows, the more I think O(n^2) for old kernels (4+ years) might not be so bad :-D
After adding the symlink stack, I don't think you could reasonably emulate the symlink behaviour with SecureJoin
.
I wish there was an mkdirat2() that does the mkdir and returns an fd to the created entry on success (O_PATH, etc.). That would really simplify things and remove races here.
I suggested it, Al rejected it.
Now that you mention it, we should probably check the owner and permission bits (and whether the directory is empty, I guess) though. If any of those properties are wrong, then we're in a directory the attacker swapped. If the owner and permission bits are the same (and the directory is empty), then even if the attacker did swap the directory, there is no difference in what they can do (if they have the same permissions as us, they can mess with our directory after we created it anyway). Done!
And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p. Clearly secureJoin can be implemented by using partialOpen() and in the future we should probably do that. This might be ok for now.
In principle, you could delete most of SecureJoin
and just use partialOpen
. For paths that exist, the semantics would be identical. I would love to remove as much as of the duplication between SecureJoin
and partialOpen
as possible, don't get me wrong!
However, I didn't do it in this PR because the semantics are noticeably different when dealing with dangling symlinks and non-existent paths, and I am not entirely sure you can get all of the semantics to be completely identical. SecureJoin
's behavior for dangling symlinks in particular is something that we can't emulate with openat2
AFAICS.
I would argue one of the main API problems with SecureJoin
is precisely the fact that you can use it with non-existent paths. There really isn't a reasonable result to give when you ask for a non-existent path.
EDIT: One option might be to add a flag to partialOpen
which will cause it treat the remainingPath
as a lexical path and try to continue resolving it after doing a path.Clean
. This would let us emulate SecureJoin(root, "a/b/nonexistent/../foo")
(we hit nonexistent/../foo
as the remainingPath
and this gets cleaned to foo
and we continue to try to do the lookup). However, dangling symlinks would still need to result in an error, which would be an API change. I'll make a separate PR with a patch showing how that would work.
(I was thinking of looking into this for a possible v2 of securejoin
, depending on whether it's actually necessary. Tbh, I consider SecureJoin
to be somewhat legacy code with a borked API and weird quirks. Unless it's easy to guarantee the behaviour is identical, I'm not sure a refactor is a good idea...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'll answer to the rest in more detail later, thanks for proving so many details!)
However, I didn't do it in this PR because the semantics are noticeably different when dealing with dangling symlinks and non-existent paths, and I am not entirely sure you can get all of the semantics to be completely identical.
You mentioned the dangling symlink behaviour changes several times, also in commits IIRC, but never what is the difference. Can you please explain what is exactly the difference?
That is very useful to effectively review the PR (to see the behavior matches, to see if tests are covering all the interesting cases, etc.).
Let's start here on github, then we can see if we want that info just on the commit msg or also in comments on the code. But I doubt we don't want to preserve those details very explicitely (just "there are differences" is not as useful as listing the differences per-se :)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in a case like:
% mkdir -p root/a/b
% ln -s a/b/doesnt-exist/c root/link
where link
points to a path that doesn't exist, it is a dangling symlink.
SecureJoin("root", "link/foo")
will return a/b/doesnt-exist/c/foo
. This is because all symlinks encountered are expanded and treated as though they were part of the original unsafePath
. (In the context of SecureJoin
, this behaviour makes sense -- because it doesn't mind returning non-existent paths, and so a dangling symlink is treated the same as any other broken path.) So in a sense, symlink resolutions are "inatomic".
The old behaviour of partialOpen
(based on SecureJoin
) had similar semantics, and would result in a/b
being opened and then doesnt-exist/c
being created.
This may seem like totally reasonable behaviour, however it is not possible to easily emulate this with openat2
because openat2
has to resolve the path completely, and so a dangling symlink is never resolvable (basically, walking into a symlink is an atomic operation -- either the symlink can be resolved fully or it's dangling and you get -ENOTDIR
/-ENOENT
). If we wanted to emulate the SecureJoin
behaviour with openat2
we would need to resolve symlinks manually and re-do openat2
for each component of a symlink (which could lead to a DoS if you had very long paths for each symlink).
Instead, the new partialOpen
behaviour is designed to match openat2
-- if we are in the middle of resolving a symlink and hit a non-existent component of the symlink, we pretend as though opening the symlink itself gave us -ENOENT
.
mkdirall: make partialOpen semantically identical to openat2 implementation
is the implementation of this, and the following cases in testMkdirAllBasic
check this behaviour:
- All of the
a-fake{1,2,3}
tests. - The
link3/deep_dangling{1,2}/foo
tests (which check the case where you have deeper symlink resolution chains).
(The key thing is that the behaviour is identical when using partialOpen
and when using openat2
.)
I'll work on some partialLookupInRoot
-specific tests for this behaviour next week.
8191708
to
a258c4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to take a while to review. I'll just post this two superficial comments and I'll try to focus on the big scheme of things now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some more comments.
I'd really really like a simpler PR (~1400 lines added is a lot!)
mkdir_linux.go
Outdated
// can modify the filesystem tree between SecureJoin and MkdirAll, it is | ||
// possible for MkdirAll to resolve unsafe symlink components and create | ||
// directories outside of the root. | ||
func MkdirAll(root, unsafePath string, mode os.FileMode) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p.
Clearly secureJoin can be implemented by using partialOpen()
and in the future we should probably do that. This might be ok for now.
59c0417
to
157a7cd
Compare
5d9f872
to
78ce0fc
Compare
0282597
to
35d1c38
Compare
A fairly common usage of SecureJoin is of the form: path, _ := securejoin.SecureJoin(root, nonExistentPath) _ = os.MkdirAll(path, 0o755) Unfortunately, such usage is not safe if the root can be modified between SecureJoin and MkdirAll. This is a known issue with SecureJoin (using path strings for the API means we cannot be sure the path is safe). However, by carefully using file handles it is possible to adapt SecureJoin to be safe to use in a race-free way with MkdirAll by merging them into a single function. The implementation is designed so that we can easily add openat2 support, which is done in a follow-up commit. MkdirAllHandle is a variant of MkdirAll which takes the root as an *os.File and returns an *os.File for the final part of the created directory tree. This is a preferable interface to use if you need a handle to the final directory. We have to update the minimum Go version to 1.20 so that we can have errors that contain more than one wrapped error. Signed-off-by: Aleksa Sarai <[email protected]>
An attacker could swap out the directory we created between us creating it and opening it. However, the worst thing they could do is force us to make a directory inside a directory with different permissions or a different mode. If they swap our directory with a directory that has the same owner and permissions as the one we created, they didn't gain anything. So, verify that the directory looks like the one we would've created. Unfortunately, there is no way to atomically create a directory and open it. I discussed this with Al a while ago and I believe he said he didn't like it (though I'm not sure if it was a blanket veto or not). Signed-off-by: Aleksa Sarai <[email protected]>
Rather than saving a handle to /proc/self/fd, save a /proc handle and use /proc/thread-self/fd. This is necessary to handle Go programs with locked threads that have unshare(CLONE_FS), such as runc. This is modelled on similar logic that exists in runc (ProcThreadSelf), and is necessary if we want to use MkdirAll in runc. Signed-off-by: Aleksa Sarai <[email protected]>
While adding openat2 support to SecureJoin doesn't make much sense (the API is wrong for handling race-safe paths), MkdirAll can use openat2 internally to make sure we safely operate on /proc and more efficiently do the partial-open necessary for MkdirAll. The safe /proc handling is based on similar logic in libpathrs. Signed-off-by: Aleksa Sarai <[email protected]>
…tation openat2(2) will fail when we try to follow a broken symlink, which means that partialOpenat2() will treat the final symlink as a non-existent path. In contrast, partialOpen() will happily return a handle and remainingPath in the middle of the symlink resolution. While either of the above semantics are acceptable, we need to unify the behaviour so that users don't get different behaviour based on their kernel version. We can't change the behaviour of openat2(2), and emulating the current partialOpen() behaviour using openat2(2) is just asking for trouble. So, make partialOpen() match the behaviour of partialOpenat2(). Unfortunately, implementing this properly (including handling recursive symlinks) requires emulating a symlink stack and making sure that we return the top symlink we are in the mille of resolving if we hit a non-existent or otherwise invalid path. If there is no last symlink, we return the partial open as normal. Yeah, this is a little ugly, but I don't see a nicer way of implementing these semantics... Signed-off-by: Aleksa Sarai <[email protected]>
These are primarily correctness tests (81.3% code coverage), to make sure we handle all of the ugly corner-cases with nested and dangling symlinks correctly. Almost all of the untested code is related to errors (most of which are annoying to mock). The one set of error paths it might be worth adding errors for is the isDeadInode() checks. Signed-off-by: Aleksa Sarai <[email protected]>
Because we depend on procfs to be correct when operating on other filesystems, having a safe procfs handle is vital. Unfortunately, if we are an administrative program running inside a container that can modify its mount configuration, our current checks are not sufficient to avoid being tricked into thinking a path is real. Luckily, with fsopen() and open_tree() it is possible to create a completely private procfs instance that an attacker cannot modify. Note that while they are both safe, they are safe for different reasons: 1. fsopen() is safe because the created mount is completely separate to any other procfs mount, so any changes to mounts on the host are irrelevant. fsopen() can fail if we trip the mnt_too_revealing() check, so we may have to fall back to open_tree() in some cases. 2. open_tree() creates a clone of a snapshot of the mount tree (or just the top mount if can avoid using AT_RECURSIVE, but mnt_too_revealing() may force us to use AT_RECURSIVE). While the tree we clone might have been messed with by an attacker, after cloning there is no way for the attacker to affect our clone (even mount propagation won't propagate into a clone[1]). The only risk is whether there are over-mounts with AT_RECURSIVE. Because anonymous mounts don't show up in mountinfo, the best we can do is check the mount id through statx to see whether a target has an overmount. We only do this for symlink operations (in particular, readlink) because openat2 handles the non-magiclink case for us and statx gained STATX_MNT_ID support in Linux 5.8. We could do this check only for the open_tree(AT_RECURSIVE) case, but it's simpler to always do it (of course, an attacker could add the mount after the check if we use the host /proc the "standard" way, but at least there's a chance we'll detect it). listmounts(2) would let us detect any overmounts at creation time, but it's not clear whether you want to error out if there is a mount over a path you never use (lxcfs has overmounts of /proc/cpuinfo and /proc/meminfo, which we never use). Unfortunately, we can only use these when running as a privileged user. In theory we could create a user namespace to gain the necessary privileges to create these handles, but this would require spawning a proper subprocess (CLONE_NEWUSER must be done in a single-threaded program, which is not possible in Go without using Go's ForkExec) which won't always work when filepath-securejoin is used as a library. It's also far too complicated to deal with in practice. This is based on similar logic I'm working on for libpathrs. [1]: This is true since at least Linux 5.12. See commit ee2e3f50629f ("mount: fix mounting of detached mounts onto targets that reside on shared mounts"). Signed-off-by: Aleksa Sarai <[email protected]>
This ensures we test all of the fallbacks even if we don't use them in practice when doing MkdirAll in our test suite. In addition, add some hooks to allow the tests to force the usage of fallbacks when using the standard getProcRoot() function everything else uses. This raises the test coverage to 87.1%. In terms of correctness tests, there are a couple of extra tests it's probably worth adding (see procfs_linux_test.go for a list). Signed-off-by: Aleksa Sarai <[email protected]>
This lets us make sure we are correctly handling all sorts of lookups that aren't obvious just by looking at MkdirAll. Of particular note are the tail-chained and other deep symlink resolution stacks (MkdirAll will just return an error, but the remainingPath we get is quite important). In addition, add some explicit tests for the error paths in symlinkStack. We can't force these with actual lookups (since they indicate a programming bug), but at least make sure we're testing them properly. This raises the test coverage to 88.6% now that we are testing more of the error paths in partialLookupInRoot. Signed-off-by: Aleksa Sarai <[email protected]>
By combining the coverage from all platforms as well as privileged and non-privileged Linux runs, the coverage is now 87.7% for the entire project. It was necessary to split the Windows CI scripts because mktemp doesn't work in Powershell and it's necessary to reimplement the whole thing to get what you need (not to mention that -test.gocoverdir needs to be quoted in a particular way to not make Powershell angry). Signed-off-by: Aleksa Sarai <[email protected]>
1be7dc5
to
afbbbca
Compare
In order to get some confidence in saying that our code safe against these kinds of racing attacks, add some tests to cover this. The full suite (with 50k runs of each new subtest) takes about 10 minutes, so we should use "go test -short" for the full matrix and add a single separate test run which can run for longer (we don't need to merge the coverage from the stress test because we hit the same codepaths as in the short test). While working on this, I found what seems to be a kernel bug with readlink(/proc/self/fd/$n). There is a workaround in the tests for this, but it is possible that the test will become flaky. This brings the test coverage up to 91.1% on Linux (91.3% combined). Signed-off-by: Aleksa Sarai <[email protected]>
We now have 91.3% test coverage, including some pretty intense stress tests (so much so that I think I may have found a kernel bug in |
A fairly common usage of SecureJoin is of the form:
Unfortunately, such usage is not safe if the root can be modified
between SecureJoin and MkdirAll. This is a known issue with SecureJoin
(using path strings for the API means we cannot be sure the path is
safe).
However, by carefully using file handles it is possible to adapt
SecureJoin to be safe to use in a race-free way with MkdirAll by merging
them into a single function.
We have to update the minimum Go version to 1.20 so that we can have
errors that contain more than one wrapped error.
The later patches include hardening using openat2 and fsopen.
Signed-off-by: Aleksa Sarai [email protected]
Things left to do:
MkdirAllHandle
API that returns a handle to the final directory and takes an*os.File
referencing the root directory./proc/thread-self
to match what runc does (which is necessary for some Go programs).openat2
-like behaviour in the userspace emulated version for dangling symlinks (specifically,MkdirAll
MkdirAllHandle
openat2
.partialLookupInRoot
symlinkStack
internalspartialOpenat2
/proc
handling.newPrivateProcMount
openTree
AT_RECURSIVE
, if possible...os.OpenFile("/proc", ...)