Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline small files #187

Merged
merged 4 commits into from
Sep 13, 2023
Merged

Inline small files #187

merged 4 commits into from
Sep 13, 2023

Conversation

alexlarsson
Copy link
Collaborator

Some files are smaller than the size needed for the redirect and the digest, lets allow storing these inline instead.

@cgwalters
Copy link
Contributor

Quick q, not a full review yet: How does this not change the digest?

@alexlarsson
Copy link
Collaborator Author

@cgwalters It changes the digest, but only if you enable it. The default change in mkcomposefs does change the default, but I was thinking we don't have a stable release yet, so it's fine to change at this point.

It doesn't for example change the digest of an ostree generated image, because it doesn't call lcfs_set_content().

Copy link
Contributor

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does mkfs.erofs do this inlining by default? It looks like it? Does it also use 64 bytes by default?

It feels like this is exactly the kind of thing we should worry about trying to change "post 1.0" because what if we find out that in some very important workloads it really helps to inline 128 bytes or something?

Or I guess we should aim to match mkfs.erofs, so if it has it off by default we should too...

But if we have these sorts of tunables I fear/suspect we're going to end up needing to compute multiple of them and hence allow making it a dynamic choice on the client which tunables to set like this.

libcomposefs/lcfs-writer.c Outdated Show resolved Hide resolved
libcomposefs/lcfs-writer.c Outdated Show resolved Hide resolved
libcomposefs/lcfs-writer.c Outdated Show resolved Hide resolved
libcomposefs/lcfs-writer.c Outdated Show resolved Hide resolved
@hsiangkao
Copy link
Contributor

hsiangkao commented Sep 13, 2023

Does mkfs.erofs do this inlining by default? It looks like it? Does it also use 64 bytes by default?

Yes, mkfs.erofs inlines regular files if possible unless passing in -Enoinline_data [1]. EROFS uses 64-byte inodes just because most people need some timestamp (if a fixed timestamp is passed in like "-T0", mkfs.erofs will use 32-byte inodes instead), but 32-byte doesn't have a way to store ns timestamp (Squashfs keeps in 32-byte inodes but it's just a 4-byte timestamp without nanosecond, and it doesn't support pre-1970 time, so a bit loss on this part. But I'm not sure people really care.)

[1] erofs/erofs-utils@60549d5

@alexlarsson
Copy link
Collaborator Author

Does mkfs.erofs do this inlining by default? It looks like it? Does it also use 64 bytes by default?

I don't understand this question? mkfs.erofs will always put all file data in the filesystem image. Where else would it store it? (Although for the erofs file data, there is a question of whether to store it next to the inode, or in a separate block, but that is a different question from this PR.)

For mkcomposefs the question is whether to store the data inline in the erofs image, or whether to use a separate object file in the base directory.

Basically, suppose you have an rootfs dir with a 4 byte file, and you run mkcomposefs on it. Do we produce a erofs image that has a 4 byte file? Or do we create a erofs image that has a 0 byte file with an 65 byte overlay.redirect xattr and a 36 byte overlay.metacopy xattr, plus a 4 byte external object file?

It feels like this is exactly the kind of thing we should worry about trying to change "post 1.0" because what if we find out that in some very important workloads it really helps to inline 128 bytes or something?

There is nothing that fundamentally disallows an inlining limit of 128 bytes with the proposed API. The only place that currently hardcodes the 64 byte limit is the "convert this real directory to a lcfs-node" helper. Ostree for example doesn't use that, and could pick whatever inline limit it wanteds (if it wanted too). And, we could easily add an option to lcfs_load_node_from_file() to use a different limit if we wanted. It would be up to each user of the tools/APIs whether they wanted to enable said flag, with the result that the generated images would get a different digest.

That said, I don't think the limit is all that important once you reach some slightly larger size, because the overhead starts to be less ridiculous. So, I don't think the precise limit is all that important, as long as e.g. files smaller than a sha256 digest are inlined.

Or I guess we should aim to match mkfs.erofs, so if it has it off by default we should too...

mkfs.erofs never ever puts any files in the basedir though. The concept doesn't even exist.

But if we have these sorts of tunables I fear/suspect we're going to end up needing to compute multiple of them and hence allow making it a dynamic choice on the client which tunables to set like this.

I agree that we should limit tunables, which is why I made this the default for mkcomposefs. Maybe we should drop the "--no-inline" option though?

Signed-off-by: Alexander Larsson <[email protected]>
@alexlarsson alexlarsson force-pushed the inline-files branch 2 times, most recently from d6653b3 to da9d40d Compare September 13, 2023 09:23
For small files, it makes no sense to use a redirect to the basedir.
The size of the redirect xattr value itself is typically 65 bytes, and
on top of that comes the xattr name overhead. Things are even worse if
we also want to store the fs-verity digest. The overhead for a very
small file is significant.

So, we add the ability to store the file content to save in a
lcfs_node. And when serializing the erofs layer, if it is set we
skip the redirect xattrs, and just directly store the data in the
erofs file (instead of making it a sparse file).

Signed-off-by: Alexander Larsson <[email protected]>
@alexlarsson alexlarsson force-pushed the inline-files branch 3 times, most recently from 1e7df35 to 12b5e15 Compare September 13, 2023 11:04
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexlarsson
Copy link
Collaborator Author

Hmm, missing here is cfs-fuse support for reading these files. Will have a look at that.

@cgwalters
Copy link
Contributor

(Although for the erofs file data, there is a question of whether to store it next to the inode, or in a separate block, but that is a different question from this PR.)

Yes, I was referring to the inode.

For mkcomposefs the question is whether to store the data inline in the erofs image, or whether to use a separate object file in the base directory.

Right but I think you answered that with this:

So, I don't think the precise limit is all that important, as long as e.g. files smaller than a sha256 digest are inlined.

Yes, this makes sense to me - so I'd argue: let's do it by default, take another "checksum epoch" and try to commit to your suggested 64 bytes by default.

@cgwalters
Copy link
Contributor

Just going to link this one for you and me to have a blast from the past: https://ometer.com/preferences.html - it's an argument not to add an option.

@alexlarsson
Copy link
Collaborator Author

I'm gonna merge this one and do the cfs-fuse change as part of !185 which has some other fuse stuff.

@alexlarsson
Copy link
Collaborator Author

alexlarsson commented Sep 13, 2023

So, I don't think the precise limit is all that important, as long as e.g. files smaller than a sha256 digest are inlined.

Yes, this makes sense to me - so I'd argue: let's do it by default, take another "checksum epoch" and try to commit to your suggested 64 bytes by default.

Well, we sort of already have a checksum epoch for the support of xattr escapes. But, I think we're now essentially ready to have a real frozen format.

libcomposefs/lcfs-writer.c Outdated Show resolved Hide resolved
tools/mkcomposefs.c Outdated Show resolved Hide resolved
tests/dumpdir Outdated Show resolved Hide resolved
libcomposefs/lcfs-writer.c Show resolved Hide resolved
Unless you pass this to lcfs_load_node_from_file() then it will
automatically add the file content for small files to the lcfs_nodes,
which means that the file data will be stored in the erofs image instead
of in a separate file in the basedir.

The limit for small files is 64 bytes, which is the size of a sha256
digest in hex form. That is typically the size of a redirect xattr
that would be used otherwise, so the end result is that we always get
a smaller image if we inline these. 64 is also the typical size of a
erofs inode entry, so this will inline well in the inode table.

Signed-off-by: Alexander Larsson <[email protected]>
Instead of ls -lR we have a custom directory dumper that also
dumps sha256 checksums as well as file xattrs. This gives
us more thorough checking.

Also, try to build a composefs image from the mounted composefs and
ensure we get the same digest again.

Signed-off-by: Alexander Larsson <[email protected]>
@alexlarsson alexlarsson merged commit df1bd03 into containers:main Sep 13, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants