rootless podman can't bind-mount allocdir #388
Comments
In order for Nomad to match the podman socket owner, it would need to know there was a socket at all, which Nomad itself doesn't -- only the task driver has visibility into that kind of thing. So ultimately it would have to happen in the task driver. We have some precedent for having an alternate mount configuration for the recent
I'm not that familiar with landlock. Do the access grants apply to all processes started by a single uid, or to a process (tree)? With the way the podman task driver works, with a rootful process reaching out to a socket to start the container, I'm not sure the latter will be viable. So even with
Other complications: allocs can contain tasks using different task drivers or be run using different user accounts. A documented limitation that all tasks using this must run under the same driver/user would be fine for me at least.
Oh to be clear, I wasn't suggesting that we use Landlock for the podman driver. Landlock only locks out the process it's being called from, so that doesn't really help. Just that having a separate source for the allocdir would allow for the following workflow:
Mind, this is all in my head and I haven't actually tried implementing any of it. 😁
I'm not sure that works. If landlock is not being used, then the alloc_mounts dir needs to be just as protected as the normal nomad allocs dir, i.e. non-root should not be able to traverse through it. In which case neither is suitable for the rootless container. There can't be a single alternate allocs dir across all users, unless there's some external protection of some kind. This task driver could bind-mount the alloc dir into a user-private location such as /run/user/UID. The driver does not currently understand Unix identities for setting directory permissions, but could be extended to do so.
Bah, yeah, you're right... this sort of thing has been one of the big barriers to a rootless Nomad client.
Sounds like the way to go.
Nomad considers filesystem permissions for the allocs directory to be outside of its own security model (https://developer.hashicorp.com/nomad/docs/concepts/security).
To protect the secrets written into job allocation directories from unprivileged local users with access to the Nomad client, restrictive permissions such as 0700 must be set on the allocs directory or its parent. The important part is that the "other" permission bits do not include +x (1), which would allow directory traversal, since secrets are written into subdirectories with accessible permissions (nobody:nobody 0777).

This seems to be fundamentally incompatible with rootless containers, since the unprivileged user needs to traverse into the alloc dir in order to stat the directories for bind-mounting into the container. Restrictive permissions yield Driver Failure errors on container startup.
One of the benefits of rootless containers and multiple sockets would be stronger isolation between users on a host. However, multiple sockets require that all users who will run containers under Nomad have access to the allocs directory, and therefore to all the secrets written there for all jobs run by all users. This is sadly a dealbreaker for us, since it would allow secrets to be leaked across user boundaries.
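To make the traversal problem concrete, here is a minimal Python sketch (not taken from Nomad or the podman driver) that walks from a base directory down to a target and reports the first directory a given uid cannot search. It only models the classic owner/group/other mode bits; POSIX ACLs, capabilities and user namespaces are deliberately ignored, and the function names are made up for illustration.

```python
import os
import stat
from pathlib import Path
from typing import Iterable, Optional


def has_search_permission(st: os.stat_result, uid: int, gids: Iterable[int]) -> bool:
    """Classic owner/group/other check for the execute (search) bit."""
    if uid == 0:
        return True  # assumption: root keeps its usual override of mode bits
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IXUSR)
    if st.st_gid in set(gids):
        return bool(st.st_mode & stat.S_IXGRP)
    return bool(st.st_mode & stat.S_IXOTH)


def first_blocking_dir(
    base: Path, target: Path, uid: int, gids: Iterable[int] = ()
) -> Optional[Path]:
    """Return the first ancestor of target (starting at base) that uid
    cannot traverse, or None if every ancestor is searchable."""
    gids = tuple(gids)
    rel = target.relative_to(base)  # raises ValueError if target is outside base
    current = base
    for part in rel.parts:
        # Reaching `part` requires search permission on the directory holding it.
        if not has_search_permission(os.stat(current), uid, gids):
            return current
        current = current / part
    return None
```

With an allocs directory at 0700 owned by root, any other uid is blocked at the allocs directory itself, no matter how permissive the subdirectories are. Opening it up to 0711 removes the block for every local user at once, which is exactly the leak described above.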
The only way I can think of to work around this would be for Nomad to set more restrictive permissions on the alloc directory itself (i.e. the one named after the job uid), e.g. setting ownership to match the podman socket owner, with 0700 permissions. Nomad itself, when running as root, would be able to bypass the restrictive permissions. POSIX ACLs on supported filesystems would be another option. I'm not sure if this can be practically implemented in the task driver alone, or if it would need support in Nomad core. At the very least, some information would need to be collected about which filesystem user the directory would need to be made accessible to. Currently the multiple-socket implementation doesn't understand which user "owns" the configured socket.
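As a rough illustration of that idea, the owner could in principle be read from the socket file itself. The Python sketch below is an assumption about how this might look, not how Nomad or the driver works today: `socket_owner` and `restrict_alloc_dir` are hypothetical helpers, and changing ownership to another user requires root.

```python
import os
from pathlib import Path
from typing import Tuple


def socket_owner(socket_path: Path) -> Tuple[int, int]:
    """Read the uid and gid that own the podman socket file."""
    st = os.stat(socket_path)
    return st.st_uid, st.st_gid


def restrict_alloc_dir(alloc_dir: Path, uid: int, gid: int) -> None:
    """Hand a single alloc directory to one user and close it to everyone else."""
    os.chown(alloc_dir, uid, gid)  # needs root when uid differs from the caller
    os.chmod(alloc_dir, 0o700)     # owner only; root can still bypass this
```

This only helps if the parent allocs directory is traversable by that user, so it shifts the protection from the parent onto each per-alloc directory.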
Alternatively, could this task driver bind-mount the alloc dir into some alternate path accessible only by the podman socket owner (e.g. beneath /run/user/UID), to bypass the more restrictive permissions on the parent allocs dir?
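A sketch of what that could look like, in Python for brevity. The `nomad-allocs` subdirectory and both function names are hypothetical; only `mount --bind` is an existing command. Nothing here has been tried against the driver.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def plan_private_bind_mount(alloc_dir: Path, owner_uid: int) -> Tuple[Path, List[str]]:
    """Work out where the alloc dir would be exposed and the command to do it."""
    # Hypothetical layout: /run/user/<uid>/nomad-allocs/<alloc dir name>
    target = Path("/run/user") / str(owner_uid) / "nomad-allocs" / alloc_dir.name
    command = ["mount", "--bind", str(alloc_dir), str(target)]
    return target, command


def apply_bind_mount(target: Path, command: List[str]) -> None:
    """Needs root. Creates the mount point, then runs the planned command."""
    # mode applies to the final directory only; this sketch relies on
    # /run/user/<uid> itself being private to the user.
    target.mkdir(mode=0o700, parents=True, exist_ok=True)
    subprocess.run(command, check=True)
```

For example, `plan_private_bind_mount(Path("/var/nomad/alloc/1234"), 1000)` yields the target `/run/user/1000/nomad-allocs/1234`. The container would then be given the target path instead of the original alloc dir, so the parent allocs directory can stay at 0700.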