Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A safe MkdirAll implementation #13

Merged
merged 11 commits into from
Jul 9, 2024
Merged

A safe MkdirAll implementation #13

merged 11 commits into from
Jul 9, 2024

Conversation

cyphar
Copy link
Owner

@cyphar cyphar commented Jun 26, 2024

A fairly common usage of SecureJoin is of the form:

path, _ := securejoin.SecureJoin(root, nonExistentPath)
_ = os.MkdirAll(path, 0o755)

Unfortunately, such usage is not safe if the root can be modified
between SecureJoin and MkdirAll. This is a known issue with SecureJoin
(using path strings for the API means we cannot be sure the path is
safe).

However, by carefully using file handles it is possible to adapt
SecureJoin to be safe to use in a race-free way with MkdirAll by merging
them into a single function.

We have to update the minimum Go version to 1.20 so that we can have
errors that contain more than one wrapped error.

The later patches include hardening using openat2 and fsopen.

Signed-off-by: Aleksa Sarai [email protected]


Things left to do:

  • Add a MkdirAllHandle API that returns a handle to the final directory and takes an *os.File referencing the root directory.
  • Switch to using /proc/thread-self to match what runc does (which is necessary for some Go programs).
  • Implement openat2-like behaviour in the userspace emulated version for dangling symlinks (specifically,
  • Write tests:
    • MkdirAll
    • MkdirAllHandle
    • Test with and without openat2.
      • Should we also test the auto detection?
    • partialLookupInRoot
      • symlinkStack internals
    • partialOpenat2
    • Test all three forms of /proc handling.
      • newPrivateProcMount
      • openTree
        • With and without AT_RECURSIVE, if possible...
      • os.OpenFile("/proc", ...)
      • We should probably also test the auto-detection...
    • Add rename-exchange tests.

@cyphar cyphar marked this pull request as draft June 26, 2024 10:00
@cyphar

This comment has been minimized.

Copy link

@alban alban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

I'm adding a couple of comments.

mkdir_linux.go Outdated Show resolved Hide resolved
procfs_linux.go Show resolved Hide resolved
Copy link

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, I focused on the idea first and suggested alternatives for the !openat2 case.

I haven't checked the rest in much detail, let's see if those ideas make sense to you first.

mkdir_linux.go Outdated Show resolved Hide resolved
mkdir_linux.go Show resolved Hide resolved
mkdir_linux.go Outdated Show resolved Hide resolved
mkdir_linux.go Outdated
// can modify the filesystem tree between SecureJoin and MkdirAll, it is
// possible for MkdirAll to resolve unsafe symlink components and create
// directories outside of the root.
func MkdirAll(root, unsafePath string, mode os.FileMode) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of doing something simpler, and maybe slower, for the non openat2 case?

Openat2 is supported since 5.6, so lot of distros already have it.

Something like:

createdPath := ""
for path in unsafePath { // unsafePath here can be something like strings.Split(unsafePath, string(filepath.Separator))
  
  //NOTE: not exactly as utils.WithProcFd, as that passes a string, we want the fd. So this modified verson should use fh.Fd() instead.
  utils.WithProcFd(root, createdPath), func (basePath int) {
    // TODO: handle the case path exists
    unix.MkdirAt(basePath, path)
   }
  createdPath = createdPath + path
}

In my mind, it seems semantically simpler (and simpler to see it is correct) if we handle the non-openat2() case using securejoin.Join.

I guess such a closure in go should work (or can be adapted to take path as param too).

This might be slower due to opening the proc fd, and maybe you can do symlink attacks that live inside the rootfs (I don't think it is a problem, it is as if that change was there from the beginning).

Too bad that SecureJoin doesn't return the fd. Although that might be another option to explore: a SecureJoin() variant that returns the fd too. In essence it would be like this utils.WithProcFd() we use in runc, and we just use securejoinFd() and mkdirat() in a loop.

What do you think?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from being considerably less efficient (doing MkdirAll this way would result in O(n^2) lookups at least because we have to look up each component each time we add a component), SecureJoin is not a safe API to use even if you verify through /proc/self/fd for the reasons I outlined in #12.

Also, the current implementation has the same logic after we get the path we want from either partialOpen or partialOpenat2. You would need to have two entirely different implementations if you do it that way.

Too bad that SecureJoin doesn't return the fd. Although that might be another option to explore: a SecureJoin() variant that returns the fd too.

That is exactly what partialOpen does, it's just that rather than erroring out if the path doesn't exist it returns the last path it got to and the remaining components. If you don't return that, you get O(n^2) lookups because you need that information for MkdirAll.

partialOpen is basically a better SecureJoin.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect you won't like the new symlink stack I needed to add to partialOpen to make the behaviour match openat2...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. The more the complexity grows, the more I think O(n^2) for old kernels (4+ years) might not be so bad :-D

I wish there was an mkdirat2() that does the mkdir and returns an fd to the created entry on success (O_PATH, etc.). That would really simplify things and remove races here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p.

Clearly secureJoin can be implemented by using partialOpen() and in the future we should probably do that. This might be ok for now.

Copy link
Owner Author

@cyphar cyphar Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. The more the complexity grows, the more I think O(n^2) for old kernels (4+ years) might not be so bad :-D

After adding the symlink stack, I don't think you could reasonably emulate the symlink behaviour with SecureJoin.

I wish there was an mkdirat2() that does the mkdir and returns an fd to the created entry on success (O_PATH, etc.). That would really simplify things and remove races here.

I suggested it, Al rejected it.

Now that you mention it, we should probably check the owner and permission bits (and whether the directory is empty, I guess) though. If any of those properties are wrong, then we're in a directory the attacker swapped. If the owner and permission bits are the same (and the directory is empty), then even if the attacker did swap the directory, there is no difference in what they can do (if they have the same permissions as us, they can mess with our directory after we created it anyway). Done!

And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p. Clearly secureJoin can be implemented by using partialOpen() and in the future we should probably do that. This might be ok for now.

In principle, you could delete most of SecureJoin and just use partialOpen. For paths that exist, the semantics would be identical. I would love to remove as much as of the duplication between SecureJoin and partialOpen as possible, don't get me wrong!

However, I didn't do it in this PR because the semantics are noticeably different when dealing with dangling symlinks and non-existent paths, and I am not entirely sure you can get all of the semantics to be completely identical. SecureJoin's behavior for dangling symlinks in particular is something that we can't emulate with openat2 AFAICS.

I would argue one of the main API problems with SecureJoin is precisely the fact that you can use it with non-existent paths. There really isn't a reasonable result to give when you ask for a non-existent path.

EDIT: One option might be to add a flag to partialOpen which will cause it treat the remainingPath as a lexical path and try to continue resolving it after doing a path.Clean. This would let us emulate SecureJoin(root, "a/b/nonexistent/../foo") (we hit nonexistent/../foo as the remainingPath and this gets cleaned to foo and we continue to try to do the lookup). However, dangling symlinks would still need to result in an error, which would be an API change. I'll make a separate PR with a patch showing how that would work.

(I was thinking of looking into this for a possible v2 of securejoin, depending on whether it's actually necessary. Tbh, I consider SecureJoin to be somewhat legacy code with a borked API and weird quirks. Unless it's easy to guarantee the behaviour is identical, I'm not sure a refactor is a good idea...)

Copy link

@rata rata Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll answer to the rest in more detail later, thanks for proving so many details!)

However, I didn't do it in this PR because the semantics are noticeably different when dealing with dangling symlinks and non-existent paths, and I am not entirely sure you can get all of the semantics to be completely identical.

You mentioned the dangling symlink behaviour changes several times, also in commits IIRC, but never what is the difference. Can you please explain what is exactly the difference?

That is very useful to effectively review the PR (to see the behavior matches, to see if tests are covering all the interesting cases, etc.).

Let's start here on github, then we can see if we want that info just on the commit msg or also in comments on the code. But I doubt we don't want to preserve those details very explicitely (just "there are differences" is not as useful as listing the differences per-se :)).

Copy link
Owner Author

@cyphar cyphar Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in a case like:

% mkdir -p root/a/b
% ln -s a/b/doesnt-exist/c root/link

where link points to a path that doesn't exist, it is a dangling symlink.

SecureJoin("root", "link/foo") will return a/b/doesnt-exist/c/foo. This is because all symlinks encountered are expanded and treated as though they were part of the original unsafePath. (In the context of SecureJoin, this behaviour makes sense -- because it doesn't mind returning non-existent paths, and so a dangling symlink is treated the same as any other broken path.) So in a sense, symlink resolutions are "inatomic".

The old behaviour of partialOpen (based on SecureJoin) had similar semantics, and would result in a/b being opened and then doesnt-exist/c being created.

This may seem like totally reasonable behaviour, however it is not possible to easily emulate this with openat2 because openat2 has to resolve the path completely, and so a dangling symlink is never resolvable (basically, walking into a symlink is an atomic operation -- either the symlink can be resolved fully or it's dangling and you get -ENOTDIR/-ENOENT). If we wanted to emulate the SecureJoin behaviour with openat2 we would need to resolve symlinks manually and re-do openat2 for each component of a symlink (which could lead to a DoS if you had very long paths for each symlink).

Instead, the new partialOpen behaviour is designed to match openat2 -- if we are in the middle of resolving a symlink and hit a non-existent component of the symlink, we pretend as though opening the symlink itself gave us -ENOENT.

mkdirall: make partialOpen semantically identical to openat2 implementation is the implementation of this, and the following cases in testMkdirAllBasic check this behaviour:

  • All of the a-fake{1,2,3} tests.
  • The link3/deep_dangling{1,2}/foo tests (which check the case where you have deeper symlink resolution chains).

(The key thing is that the behaviour is identical when using partialOpen and when using openat2.)

I'll work on some partialLookupInRoot-specific tests for this behaviour next week.

@cyphar cyphar force-pushed the safe-mkdirall branch 11 times, most recently from 8191708 to a258c4f Compare June 28, 2024 07:00
@cyphar
Copy link
Owner Author

cyphar commented Jun 28, 2024

I might want to add some more tests later, but IMHO this is ready for review now that all of the features are properly tested.

PTAL @alban @rata

@cyphar cyphar marked this pull request as ready for review June 28, 2024 07:10
@cyphar cyphar requested review from alban and rata June 28, 2024 07:11
Copy link

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to take a while to review. I'll just post this two superficial comments and I'll try to focus on the big scheme of things now.

openat2_linux.go Outdated Show resolved Hide resolved
procfs_linux.go Outdated Show resolved Hide resolved
Copy link

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more comments.

I'd really really like a simpler PR (~1400 lines added is a lot!)

util_linux_test.go Outdated Show resolved Hide resolved
mkdir_linux.go Outdated
// can modify the filesystem tree between SecureJoin and MkdirAll, it is
// possible for MkdirAll to resolve unsafe symlink components and create
// directories outside of the root.
func MkdirAll(root, unsafePath string, mode os.FileMode) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this big c&p in partialOpen feels a little weird, to be honest. That can be a better secureJoin() (without an fd it is not a great interface IMHO), but it feels like mkdir should re-use that machinery instead of c&p.

Clearly secureJoin can be implemented by using partialOpen() and in the future we should probably do that. This might be ok for now.

mkdir_linux.go Outdated Show resolved Hide resolved
@cyphar cyphar force-pushed the safe-mkdirall branch 2 times, most recently from 59c0417 to 157a7cd Compare June 29, 2024 06:53
cyphar added 10 commits July 9, 2024 19:11
A fairly common usage of SecureJoin is of the form:

  path, _ := securejoin.SecureJoin(root, nonExistentPath)
  _ = os.MkdirAll(path, 0o755)

Unfortunately, such usage is not safe if the root can be modified
between SecureJoin and MkdirAll. This is a known issue with SecureJoin
(using path strings for the API means we cannot be sure the path is
safe).

However, by carefully using file handles it is possible to adapt
SecureJoin to be safe to use in a race-free way with MkdirAll by merging
them into a single function. The implementation is designed so that we
can easily add openat2 support, which is done in a follow-up commit.

MkdirAllHandle is a variant of MkdirAll which takes the root as an
*os.File and returns an *os.File for the final part of the created
directory tree. This is a preferable interface to use if you need a
handle to the final directory.

We have to update the minimum Go version to 1.20 so that we can have
errors that contain more than one wrapped error.

Signed-off-by: Aleksa Sarai <[email protected]>
An attacker could swap out the directory we created between us creating
it and opening it. However, the worst thing they could do is force us to
make a directory inside a directory with different permissions or a
different mode. If they swap our directory with a directory that has the
same owner and permissions as the one we created, they didn't gain
anything. So, verify that the directory looks like the one we would've
created.

Unfortunately, there is no way to atomically create a directory and open
it. I discussed this with Al a while ago and I believe he said he didn't
like it (though I'm not sure if it was a blanket veto or not).

Signed-off-by: Aleksa Sarai <[email protected]>
Rather than saving a handle to /proc/self/fd, save a /proc handle and
use /proc/thread-self/fd. This is necessary to handle Go programs with
locked threads that have unshare(CLONE_FS), such as runc.

This is modelled on similar logic that exists in runc (ProcThreadSelf),
and is necessary if we want to use MkdirAll in runc.

Signed-off-by: Aleksa Sarai <[email protected]>
While adding openat2 support to SecureJoin doesn't make much sense (the
API is wrong for handling race-safe paths), MkdirAll can use openat2
internally to make sure we safely operate on /proc and more efficiently
do the partial-open necessary for MkdirAll.

The safe /proc handling is based on similar logic in libpathrs.

Signed-off-by: Aleksa Sarai <[email protected]>
…tation

openat2(2) will fail when we try to follow a broken symlink, which means
that partialOpenat2() will treat the final symlink as a non-existent
path. In contrast, partialOpen() will happily return a handle and
remainingPath in the middle of the symlink resolution.

While either of the above semantics are acceptable, we need to unify the
behaviour so that users don't get different behaviour based on their
kernel version. We can't change the behaviour of openat2(2), and
emulating the current partialOpen() behaviour using openat2(2) is just
asking for trouble. So, make partialOpen() match the behaviour of
partialOpenat2().

Unfortunately, implementing this properly (including handling recursive
symlinks) requires emulating a symlink stack and making sure that we
return the top symlink we are in the mille of resolving if we hit a
non-existent or otherwise invalid path. If there is no last symlink, we
return the partial open as normal. Yeah, this is a little ugly, but I
don't see a nicer way of implementing these semantics...

Signed-off-by: Aleksa Sarai <[email protected]>
These are primarily correctness tests (81.3% code coverage), to make
sure we handle all of the ugly corner-cases with nested and dangling
symlinks correctly.

Almost all of the untested code is related to errors (most of which are
annoying to mock). The one set of error paths it might be worth adding
errors for is the isDeadInode() checks.

Signed-off-by: Aleksa Sarai <[email protected]>
Because we depend on procfs to be correct when operating on other
filesystems, having a safe procfs handle is vital. Unfortunately, if we
are an administrative program running inside a container that can modify
its mount configuration, our current checks are not sufficient to avoid
being tricked into thinking a path is real.

Luckily, with fsopen() and open_tree() it is possible to create a
completely private procfs instance that an attacker cannot modify. Note
that while they are both safe, they are safe for different reasons:

 1. fsopen() is safe because the created mount is completely separate to
    any other procfs mount, so any changes to mounts on the host are
    irrelevant. fsopen() can fail if we trip the mnt_too_revealing()
    check, so we may have to fall back to open_tree() in some cases.

 2. open_tree() creates a clone of a snapshot of the mount tree (or just
    the top mount if can avoid using AT_RECURSIVE, but
    mnt_too_revealing() may force us to use AT_RECURSIVE). While the
    tree we clone might have been messed with by an attacker, after
    cloning there is no way for the attacker to affect our clone (even
    mount propagation won't propagate into a clone[1]).

    The only risk is whether there are over-mounts with AT_RECURSIVE.
    Because anonymous mounts don't show up in mountinfo, the best we can
    do is check the mount id through statx to see whether a target has
    an overmount. We only do this for symlink operations (in particular,
    readlink) because openat2 handles the non-magiclink case for us and
    statx gained STATX_MNT_ID support in Linux 5.8.

    We could do this check only for the open_tree(AT_RECURSIVE) case,
    but it's simpler to always do it (of course, an attacker could add
    the mount after the check if we use the host /proc the "standard"
    way, but at least there's a chance we'll detect it).

    listmounts(2) would let us detect any overmounts at creation time,
    but it's not clear whether you want to error out if there is a mount
    over a path you never use (lxcfs has overmounts of /proc/cpuinfo and
    /proc/meminfo, which we never use).

Unfortunately, we can only use these when running as a privileged user.
In theory we could create a user namespace to gain the necessary
privileges to create these handles, but this would require spawning a
proper subprocess (CLONE_NEWUSER must be done in a single-threaded
program, which is not possible in Go without using Go's ForkExec) which
won't always work when filepath-securejoin is used as a library. It's
also far too complicated to deal with in practice.

This is based on similar logic I'm working on for libpathrs.

[1]: This is true since at least Linux 5.12. See commit ee2e3f50629f
     ("mount: fix mounting of detached mounts onto targets that reside
     on shared mounts").

Signed-off-by: Aleksa Sarai <[email protected]>
This ensures we test all of the fallbacks even if we don't use them in
practice when doing MkdirAll in our test suite. In addition, add some
hooks to allow the tests to force the usage of fallbacks when using the
standard getProcRoot() function everything else uses. This raises the
test coverage to 87.1%.

In terms of correctness tests, there are a couple of extra tests it's
probably worth adding (see procfs_linux_test.go for a list).

Signed-off-by: Aleksa Sarai <[email protected]>
This lets us make sure we are correctly handling all sorts of lookups
that aren't obvious just by looking at MkdirAll. Of particular note are
the tail-chained and other deep symlink resolution stacks (MkdirAll will
just return an error, but the remainingPath we get is quite important).

In addition, add some explicit tests for the error paths in
symlinkStack. We can't force these with actual lookups (since they
indicate a programming bug), but at least make sure we're testing them
properly.

This raises the test coverage to 88.6% now that we are testing more of
the error paths in partialLookupInRoot.

Signed-off-by: Aleksa Sarai <[email protected]>
By combining the coverage from all platforms as well as privileged and
non-privileged Linux runs, the coverage is now 87.7% for the entire
project.

It was necessary to split the Windows CI scripts because mktemp doesn't
work in Powershell and it's necessary to reimplement the whole thing to
get what you need (not to mention that -test.gocoverdir needs to be
quoted in a particular way to not make Powershell angry).

Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar cyphar force-pushed the safe-mkdirall branch 3 times, most recently from 1be7dc5 to afbbbca Compare July 9, 2024 13:01
In order to get some confidence in saying that our code safe against
these kinds of racing attacks, add some tests to cover this.

The full suite (with 50k runs of each new subtest) takes about 10
minutes, so we should use "go test -short" for the full matrix and add a
single separate test run which can run for longer (we don't need to
merge the coverage from the stress test because we hit the same
codepaths as in the short test).

While working on this, I found what seems to be a kernel bug with
readlink(/proc/self/fd/$n). There is a workaround in the tests for this,
but it is possible that the test will become flaky.

This brings the test coverage up to 91.1% on Linux (91.3% combined).

Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar
Copy link
Owner Author

cyphar commented Jul 9, 2024

We now have 91.3% test coverage, including some pretty intense stress tests (so much so that I think I may have found a kernel bug in d_path.) I'm going to go ahead with merging this soon.

@cyphar cyphar merged commit 2192374 into main Jul 9, 2024
12 checks passed
@cyphar cyphar deleted the safe-mkdirall branch July 9, 2024 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants