all: add 'copy' mount option #334

allisonkarlitskaya · 2024-09-06T12:10:41Z

This copies the content of the named composefs image into a sealed memfd and mounts that, instead of using the underlying file.

The main benefit here is that the underlying filesystem isn't pinned by having one of its files used as the basis of the loopback device. This is useful if you want to embed a composefs into an initramfs, for example.

There are also integrity benefits. Without this option, the fsverity of the file is verified at mount time, but nothing stops the file from being modified after the fact. With the copy option, we measure and compare the fsverity signature after copying into and sealing the memfd.

Right now this is implemented as a straight copy, but we may also add support for decompression at some point. The erofs images compress reasonably well using general-purpose compression algorithms: a 52MB image of /usr on my system compresses down to 14MB with zstd and 13MB with xz.

This copies the content of the named composefs image into a sealed memfd and mounts that, instead of using the underlying file. The main benefit here is that the underlying filesystem isn't pinned by having one of its files used as the basis of the loopback device. This is useful if you want to embed a composefs into an initramfs, for example. There are also integrity benefits. Without this option, the fsverity of the file is verified at mount time, but nothing stops the file from being modified after the fact. With the copy option, we measure and compare the fsverity signature after copying into and sealing the memfd. Right now this is implemented as a straight copy, but we may also add support for decompression at some point. The erofs images compress reasonably well using general-purpose compression algorithms: a 52MB image of /usr on my system compresses down to 14MB with zstd and 13MB with xz.

allisonkarlitskaya · 2024-09-06T12:12:48Z

This is not currently intended for merging, but might end up being part of the work we need to realize #332.

allisonkarlitskaya · 2024-09-06T12:32:59Z

The verity stuff is wrong here, mostly because I didn't understand how it worked: it's enforced by the filesystem and we just have to verify that the fingerprint is the one we expected. After we do that, we can't possibly read incorrect data.

Also: memfd doesn't support it, which means we'll end up with ENOTTY if we try that.

Instead, if we really want to inspect the data in the memfd, we should probably use the userspace lcfs-fsverity stuff to calculate the checksum for ourselves. If we do that after we seal, we can be confident that nothing changes. This provides us with a nice way to enforce a particular image, even if fsverity isn't enabled on the underlying filesystem. Although: for that particular scenario, I wonder if it wouldn't be easier to just use the sha256sum...

I'm not sure if there's a real benefit to pursuing this, but in any case, if we do want to allow mixing copy and digest options, we need to retrieve the verity digest before we do the copy.

cgwalters · 2024-09-06T13:59:37Z

The verity stuff is wrong here, mostly because I didn't understand how it worked: it's enforced by the filesystem and we just have to verify that the fingerprint is the one we expected. After we do that, we can't possibly read incorrect data.

Right, we should be invoking lcfs_validate_verity_fd before copying (and then skip that verification on the memfd) I think.

cgwalters · 2024-09-06T14:06:12Z

The main benefit here is that the underlying filesystem isn't pinned by having one of its files used as the basis of the loopback device. This is useful if you want to embed a composefs into an initramfs, for example.

I think at least historically some initramfs code basically solved this with rm -rf /* after the pivot? But I guess it is perhaps cleaner to have a memfd instead of a tmpfs with one unlinked file.

travier · 2024-09-09T08:31:20Z

I've not tested this change but I really like the approach. 👍🏻

cgwalters · 2024-09-11T14:04:29Z

BTW we hard require clang-format, it looks like e.g. ninja -C target clang-format (depending on your builddir)

alexlarsson · 2024-09-20T08:06:51Z

tools/mountcomposefs.c

@@ -198,6 +199,8 @@ int main(int argc, char **argv)
 			opt_ro = false;
 		} else if (strcmp("ro", key) == 0) {
 			opt_ro = true;
+		} else if (strcmp("copy", key) == 0) {


Need to add this to the manpage and the usage output.

alexlarsson · 2024-09-20T08:08:35Z

libcomposefs/lcfs-mount.c

+	ssize_t n;
+	do {
+		char buffer[1 << 20];  // 1MB
+		n = read(fd, buffer, sizeof buffer);


Need to handle EINTR.

We already have copy_file_data_classic()

alexlarsson · 2024-09-20T08:10:21Z

libcomposefs/lcfs-mount.c

+		n = read(fd, buffer, sizeof buffer);
+		if (n < 0)
+			break;
+		n = write(memfd, buffer, n);


Need to handle short writes, EINTR and write errors in general. I mean, i'm sure memfd writes don't get i/o errors, but you may be low on memory or whatever.

alexlarsson · 2024-09-20T08:12:58Z

libcomposefs/lcfs-mount.c

@@ -666,6 +668,35 @@ static errint_t lcfs_mount(struct lcfs_mount_state_s *state)
 	return -EINVAL;
 }

+static int lcfs_mount_copy_and_seal(int fd) {
+	int memfd = memfd_create(CFS_MOUNT_SOURCE, MFD_CLOEXEC | MFD_ALLOW_SEALING);


I'd use cleanup_fd int memfd = -1;: here and then return steal_fd(&memfd); (Although i see we need to add a steal_fd for this.

alexlarsson · 2024-09-20T08:14:00Z

libcomposefs/lcfs-mount.c

@@ -680,7 +711,21 @@ int lcfs_mount_fd(int fd, const char *mountpoint, struct lcfs_mount_options_s *o
 		return -1;
 	}

+	int memfd = -1;


alexlarsson · 2024-09-20T08:15:50Z

libcomposefs/lcfs-mount.c

+		memfd = lcfs_mount_copy_and_seal(fd);
+		if (memfd < 0) {
+			return memfd;
+		} else {


I'd rather skip the else clause in this kind of "didn't get error" codepath. I.e.:

if (memfd < 0) { return memfd: } state.fd = memfd;

alexlarsson · 2024-09-20T08:19:16Z

So, yeah, the integrity parts here are not correct, but I think this makes sense in some cases to eg. allow the source fs to be unmounted, but also as a general robustness thing. Like, its nice that we can guarantee "right data or EIO", but with this, we can guarantee "right data" (at least for the fs metadata), at the cost of slower mount and more memory use.

alexlarsson · 2024-09-20T08:27:03Z

I'm not sure how much this matters, but we can also with this get fs-verity validation if the file on disk doesn't have fs-verity (i.e. if it is on FAT or whatever), because we can manually valdiate the digest when copying into the memfd. Of course, we would still need fs-verity for the backing files.

Again, not sure if this is practically useful, but worth considering.

allisonkarlitskaya · 2024-09-20T08:30:06Z

I'm not sure how much this matters, but we can also with this get fs-verity validation if the file on disk doesn't have fs-verity (i.e. if it is on FAT or whatever), because we can manually valdiate the digest when copying into the memfd. Of course, we would still need fs-verity for the backing files.

Again, not sure if this is practically useful, but worth considering.

It's absolutely my plan to do the fsverity checking in userspace on the already-sealed memfd.

allisonkarlitskaya · 2024-09-20T08:31:16Z

It's absolutely my plan to do the fsverity checking in userspace on the already-sealed memfd.

...until we find some kernel hacker who is willing to add support for fsverity to tmpfs/memfd...

alexlarsson · 2024-09-20T08:35:26Z

It's absolutely my plan to do the fsverity checking in userspace on the already-sealed memfd.

I don't think this is necessary if we validated the digest on the fd after we opened it. We just copied data that was already validated (by the read call). How could it have become modified after that?

I mean, in theory it could somehow become modified in memory after reading by some ptrace attacker, but that is also true for userspace validation after the fact (i.e. the code that checks the fsverity digest of the data read from the sealed fd could have been modified). Really, the security model is that mount.composefs comes from a signed trusted initrd, and runs only code that we trust, so there is no "attacker".

alexlarsson · 2024-09-20T08:37:59Z

Btw, I would do the fsverity manual computation in the copy-to-memfd loop, then you better parallelize the "read next block" and the "compute digest of block".

allisonkarlitskaya · 2024-09-20T10:09:32Z

I think someone could probably also open("/proc/($pidof mount.composefs)/fd/3") or whatever the memfd is and modify it like that. I'm not super-concerned about that case. Supporting filesystems that lack their own fsverity is more what I'm aiming for here, and I agree that this could be done while streaming the file from the disk...

alexlarsson · 2024-09-20T11:25:10Z

I think someone could probably also open("/proc/($pidof mount.composefs)/fd/3") or whatever the memfd is and modify it like that. I'm not super-concerned about that case. Supporting filesystems that lack their own fsverity is more what I'm aiming for here, and I agree that this could be done while streaming the file from the disk...

Not if it is sealed. Oh, or i guess you mean while its happening, similar to the ptrace attack. And yeah, that is not interesting.

alexlarsson · 2024-09-20T11:27:51Z

That said, you will still definitely need fs-verity for the filesystem backing the file objects. No way out of that.

allisonkarlitskaya · 2024-09-20T11:49:53Z

That said, you will still definitely need fs-verity for the filesystem backing the file objects. No way out of that.

This much is clear. What we're imagining, though, is that the erofs layer lives in the initramfs, where it doesn't get fsverity. The digest store is on a "real" filesystem, though.

alexlarsson · 2024-09-20T13:33:39Z

This much is clear. What we're imagining, though, is that the erofs layer lives in the initramfs, where it doesn't get fsverity. The digest store is on a "real" filesystem, though.

This sounds very interesting. However, in that case the file is typically trusted by being on the initrd itself, which is signed, so fs-verity of it is not strictly necessary then. (In particular, because generally all fs-verity digests we verify against are generally trustworthy due to having the public key in the initrd.)

cgwalters · 2024-09-20T13:50:27Z

Yes agreed, there is no need for us to enforce fsverity of the erofs in this model.

Maybe we change lcfs_validate_verity_fd() to check if the source fd is a memfd, and if it is then we require that it's sealed?

Actually though...will Linux ever swap out a memfd contents? If it can, then we do need to enforce verity for it right?

cgwalters · 2024-09-20T13:57:57Z

Digging a bit there's nowadays memfd_secret() which is defined to disallow swapping...we could use that. Although, it seems possible the "The memory region is removed from the kernel page tables" part would break us loopback mounting it?

cgwalters · 2024-09-20T14:08:27Z

Of course, the other alternative design to think about here is basically: "don't put an EROFS in the initramfs, just put it unpacked in e.g. /rootfs in the initramfs". No loopback mounts, memfd etc. needed. We'd just have something like LCFS_MOUNT_DIR or so that tells the mounting code it's a plain old directory file descriptor.

However, the swapping thing still seems like a concern to me...actually look, there's already a noswap option for tmpfs, so we'd just need to be sure that the initramfs tmpfs is set up with that option.

alexlarsson · 2024-09-30T08:52:41Z

Of course, the other alternative design to think about here is basically: "don't put an EROFS in the initramfs, just put it unpacked in e.g. /rootfs in the initramfs". No loopback mounts, memfd etc. needed. We'd just have something like LCFS_MOUNT_DIR or so that tells the mounting code it's a plain old directory file descriptor.

Does this work? For example, does the initramfs tar unpacking support full xattrs? Also, my guess is that a full in-memory tmpfs of the rootfs metadata will use a lot more ram than just the erofs image.

allisonkarlitskaya · 2024-10-07T18:25:28Z

I think we've decided not to do this.

alexlarsson reviewed Sep 20, 2024

View reviewed changes

allisonkarlitskaya closed this Oct 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

all: add 'copy' mount option #334

all: add 'copy' mount option #334

allisonkarlitskaya commented Sep 6, 2024

allisonkarlitskaya commented Sep 6, 2024

allisonkarlitskaya commented Sep 6, 2024

cgwalters commented Sep 6, 2024

cgwalters commented Sep 6, 2024

travier commented Sep 9, 2024

cgwalters commented Sep 11, 2024

alexlarsson Sep 20, 2024

alexlarsson Sep 20, 2024

cgwalters Sep 20, 2024

alexlarsson Sep 20, 2024

alexlarsson Sep 20, 2024

alexlarsson Sep 20, 2024

alexlarsson Sep 20, 2024

alexlarsson commented Sep 20, 2024

alexlarsson commented Sep 20, 2024

allisonkarlitskaya commented Sep 20, 2024

allisonkarlitskaya commented Sep 20, 2024

alexlarsson commented Sep 20, 2024

alexlarsson commented Sep 20, 2024

allisonkarlitskaya commented Sep 20, 2024

alexlarsson commented Sep 20, 2024 •

edited

Loading

alexlarsson commented Sep 20, 2024

allisonkarlitskaya commented Sep 20, 2024

alexlarsson commented Sep 20, 2024

cgwalters commented Sep 20, 2024

cgwalters commented Sep 20, 2024

cgwalters commented Sep 20, 2024

alexlarsson commented Sep 30, 2024

allisonkarlitskaya commented Oct 7, 2024

all: add 'copy' mount option #334

all: add 'copy' mount option #334

Conversation

allisonkarlitskaya commented Sep 6, 2024

allisonkarlitskaya commented Sep 6, 2024

allisonkarlitskaya commented Sep 6, 2024

cgwalters commented Sep 6, 2024

cgwalters commented Sep 6, 2024

travier commented Sep 9, 2024

cgwalters commented Sep 11, 2024

alexlarsson Sep 20, 2024

Choose a reason for hiding this comment

alexlarsson Sep 20, 2024

Choose a reason for hiding this comment

cgwalters Sep 20, 2024

Choose a reason for hiding this comment

alexlarsson Sep 20, 2024

Choose a reason for hiding this comment

alexlarsson Sep 20, 2024

Choose a reason for hiding this comment

alexlarsson Sep 20, 2024

Choose a reason for hiding this comment

alexlarsson Sep 20, 2024

Choose a reason for hiding this comment

alexlarsson commented Sep 20, 2024

alexlarsson commented Sep 20, 2024

allisonkarlitskaya commented Sep 20, 2024

allisonkarlitskaya commented Sep 20, 2024

alexlarsson commented Sep 20, 2024

alexlarsson commented Sep 20, 2024

allisonkarlitskaya commented Sep 20, 2024

alexlarsson commented Sep 20, 2024 • edited Loading

alexlarsson commented Sep 20, 2024

allisonkarlitskaya commented Sep 20, 2024

alexlarsson commented Sep 20, 2024

cgwalters commented Sep 20, 2024

cgwalters commented Sep 20, 2024

cgwalters commented Sep 20, 2024

alexlarsson commented Sep 30, 2024

allisonkarlitskaya commented Oct 7, 2024

alexlarsson commented Sep 20, 2024 •

edited

Loading