Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: add 'copy' mount option #334

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

allisonkarlitskaya
Copy link
Collaborator

This copies the content of the named composefs image into a sealed memfd and mounts that, instead of using the underlying file.

The main benefit here is that the underlying filesystem isn't pinned by having one of its files used as the basis of the loopback device. This is useful if you want to embed a composefs into an initramfs, for example.

There are also integrity benefits. Without this option, the fsverity of the file is verified at mount time, but nothing stops the file from being modified after the fact. With the copy option, we measure and compare the fsverity signature after copying into and sealing the memfd.

Right now this is implemented as a straight copy, but we may also add support for decompression at some point. The erofs images compress reasonably well using general-purpose compression algorithms: a 52MB image of /usr on my system compresses down to 14MB with zstd and 13MB with xz.

This copies the content of the named composefs image into a sealed memfd
and mounts that, instead of using the underlying file.

The main benefit here is that the underlying filesystem isn't pinned by
having one of its files used as the basis of the loopback device.  This
is useful if you want to embed a composefs into an initramfs, for
example.

There are also integrity benefits.  Without this option, the fsverity of
the file is verified at mount time, but nothing stops the file from
being modified after the fact.  With the copy option, we measure and
compare the fsverity signature after copying into and sealing the memfd.

Right now this is implemented as a straight copy, but we may also add
support for decompression at some point.  The erofs images compress
reasonably well using general-purpose compression algorithms: a 52MB
image of /usr on my system compresses down to 14MB with zstd and 13MB
with xz.
@allisonkarlitskaya
Copy link
Collaborator Author

This is not currently intended for merging, but might end up being part of the work we need to realize #332.

@allisonkarlitskaya
Copy link
Collaborator Author

The verity stuff is wrong here, mostly because I didn't understand how it worked: it's enforced by the filesystem and we just have to verify that the fingerprint is the one we expected. After we do that, we can't possibly read incorrect data.

Also: memfd doesn't support it, which means we'll end up with ENOTTY if we try that.

Instead, if we really want to inspect the data in the memfd, we should probably use the userspace lcfs-fsverity stuff to calculate the checksum for ourselves. If we do that after we seal, we can be confident that nothing changes. This provides us with a nice way to enforce a particular image, even if fsverity isn't enabled on the underlying filesystem. Although: for that particular scenario, I wonder if it wouldn't be easier to just use the sha256sum...

I'm not sure if there's a real benefit to pursuing this, but in any case, if we do want to allow mixing copy and digest options, we need to retrieve the verity digest before we do the copy.

@cgwalters
Copy link
Contributor

The verity stuff is wrong here, mostly because I didn't understand how it worked: it's enforced by the filesystem and we just have to verify that the fingerprint is the one we expected. After we do that, we can't possibly read incorrect data.

Right, we should be invoking lcfs_validate_verity_fd before copying (and then skip that verification on the memfd) I think.

@cgwalters
Copy link
Contributor

The main benefit here is that the underlying filesystem isn't pinned by having one of its files used as the basis of the loopback device. This is useful if you want to embed a composefs into an initramfs, for example.

I think at least historically some initramfs code basically solved this with rm -rf /* after the pivot? But I guess it is perhaps cleaner to have a memfd instead of a tmpfs with one unlinked file.

@travier
Copy link
Member

travier commented Sep 9, 2024

I've not tested this change but I really like the approach. 👍🏻

@cgwalters
Copy link
Contributor

BTW we hard require clang-format, it looks like e.g. ninja -C target clang-format (depending on your builddir)

@@ -198,6 +199,8 @@ int main(int argc, char **argv)
opt_ro = false;
} else if (strcmp("ro", key) == 0) {
opt_ro = true;
} else if (strcmp("copy", key) == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add this to the manpage and the usage output.

ssize_t n;
do {
char buffer[1 << 20]; // 1MB
n = read(fd, buffer, sizeof buffer);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle EINTR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have copy_file_data_classic()

n = read(fd, buffer, sizeof buffer);
if (n < 0)
break;
n = write(memfd, buffer, n);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle short writes, EINTR and write errors in general. I mean, i'm sure memfd writes don't get i/o errors, but you may be low on memory or whatever.

@@ -666,6 +668,35 @@ static errint_t lcfs_mount(struct lcfs_mount_state_s *state)
return -EINVAL;
}

static int lcfs_mount_copy_and_seal(int fd) {
int memfd = memfd_create(CFS_MOUNT_SOURCE, MFD_CLOEXEC | MFD_ALLOW_SEALING);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use cleanup_fd int memfd = -1;: here and then return steal_fd(&memfd); (Although i see we need to add a steal_fd for this.

@@ -680,7 +711,21 @@ int lcfs_mount_fd(int fd, const char *mountpoint, struct lcfs_mount_options_s *o
return -1;
}

int memfd = -1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup_fd

memfd = lcfs_mount_copy_and_seal(fd);
if (memfd < 0) {
return memfd;
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather skip the else clause in this kind of "didn't get error" codepath. I.e.:

if (memfd < 0) {
  return memfd:
}
state.fd = memfd;

@alexlarsson
Copy link
Collaborator

So, yeah, the integrity parts here are not correct, but I think this makes sense in some cases to eg. allow the source fs to be unmounted, but also as a general robustness thing. Like, its nice that we can guarantee "right data or EIO", but with this, we can guarantee "right data" (at least for the fs metadata), at the cost of slower mount and more memory use.

@alexlarsson
Copy link
Collaborator

I'm not sure how much this matters, but we can also with this get fs-verity validation if the file on disk doesn't have fs-verity (i.e. if it is on FAT or whatever), because we can manually valdiate the digest when copying into the memfd. Of course, we would still need fs-verity for the backing files.

Again, not sure if this is practically useful, but worth considering.

@allisonkarlitskaya
Copy link
Collaborator Author

I'm not sure how much this matters, but we can also with this get fs-verity validation if the file on disk doesn't have fs-verity (i.e. if it is on FAT or whatever), because we can manually valdiate the digest when copying into the memfd. Of course, we would still need fs-verity for the backing files.

Again, not sure if this is practically useful, but worth considering.

It's absolutely my plan to do the fsverity checking in userspace on the already-sealed memfd.

@allisonkarlitskaya
Copy link
Collaborator Author

It's absolutely my plan to do the fsverity checking in userspace on the already-sealed memfd.

...until we find some kernel hacker who is willing to add support for fsverity to tmpfs/memfd...

@alexlarsson
Copy link
Collaborator

It's absolutely my plan to do the fsverity checking in userspace on the already-sealed memfd.

I don't think this is necessary if we validated the digest on the fd after we opened it. We just copied data that was already validated (by the read call). How could it have become modified after that?

I mean, in theory it could somehow become modified in memory after reading by some ptrace attacker, but that is also true for userspace validation after the fact (i.e. the code that checks the fsverity digest of the data read from the sealed fd could have been modified). Really, the security model is that mount.composefs comes from a signed trusted initrd, and runs only code that we trust, so there is no "attacker".

@alexlarsson
Copy link
Collaborator

Btw, I would do the fsverity manual computation in the copy-to-memfd loop, then you better parallelize the "read next block" and the "compute digest of block".

@allisonkarlitskaya
Copy link
Collaborator Author

I think someone could probably also open("/proc/($pidof mount.composefs)/fd/3") or whatever the memfd is and modify it like that. I'm not super-concerned about that case. Supporting filesystems that lack their own fsverity is more what I'm aiming for here, and I agree that this could be done while streaming the file from the disk...

@alexlarsson
Copy link
Collaborator

alexlarsson commented Sep 20, 2024

I think someone could probably also open("/proc/($pidof mount.composefs)/fd/3") or whatever the memfd is and modify it like that. I'm not super-concerned about that case. Supporting filesystems that lack their own fsverity is more what I'm aiming for here, and I agree that this could be done while streaming the file from the disk...

Not if it is sealed. Oh, or i guess you mean while its happening, similar to the ptrace attack. And yeah, that is not interesting.

@alexlarsson
Copy link
Collaborator

That said, you will still definitely need fs-verity for the filesystem backing the file objects. No way out of that.

@allisonkarlitskaya
Copy link
Collaborator Author

That said, you will still definitely need fs-verity for the filesystem backing the file objects. No way out of that.

This much is clear. What we're imagining, though, is that the erofs layer lives in the initramfs, where it doesn't get fsverity. The digest store is on a "real" filesystem, though.

@alexlarsson
Copy link
Collaborator

This much is clear. What we're imagining, though, is that the erofs layer lives in the initramfs, where it doesn't get fsverity. The digest store is on a "real" filesystem, though.

This sounds very interesting. However, in that case the file is typically trusted by being on the initrd itself, which is signed, so fs-verity of it is not strictly necessary then. (In particular, because generally all fs-verity digests we verify against are generally trustworthy due to having the public key in the initrd.)

@cgwalters
Copy link
Contributor

Yes agreed, there is no need for us to enforce fsverity of the erofs in this model.

Maybe we change lcfs_validate_verity_fd() to check if the source fd is a memfd, and if it is then we require that it's sealed?


Actually though...will Linux ever swap out a memfd contents? If it can, then we do need to enforce verity for it right?

@cgwalters
Copy link
Contributor

Digging a bit there's nowadays memfd_secret() which is defined to disallow swapping...we could use that. Although, it seems possible the "The memory region is removed from the kernel page tables" part would break us loopback mounting it?

@cgwalters
Copy link
Contributor

Of course, the other alternative design to think about here is basically: "don't put an EROFS in the initramfs, just put it unpacked in e.g. /rootfs in the initramfs". No loopback mounts, memfd etc. needed. We'd just have something like LCFS_MOUNT_DIR or so that tells the mounting code it's a plain old directory file descriptor.

However, the swapping thing still seems like a concern to me...actually look, there's already a noswap option for tmpfs, so we'd just need to be sure that the initramfs tmpfs is set up with that option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants