-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inline small files #187
Inline small files #187
Conversation
748ffe8
to
e93190f
Compare
Quick q, not a full review yet: How does this not change the digest? |
@cgwalters It changes the digest, but only if you enable it. The default change in mkcomposefs does change the default, but I was thinking we don't have a stable release yet, so it's fine to change at this point. It doesn't for example change the digest of an ostree generated image, because it doesn't call lcfs_set_content(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does mkfs.erofs do this inlining by default? It looks like it? Does it also use 64 bytes by default?
It feels like this is exactly the kind of thing we should worry about trying to change "post 1.0" because what if we find out that in some very important workloads it really helps to inline 128 bytes or something?
Or I guess we should aim to match mkfs.erofs, so if it has it off by default we should too...
But if we have these sorts of tunables I fear/suspect we're going to end up needing to compute multiple of them and hence allow making it a dynamic choice on the client which tunables to set like this.
Yes, |
I don't understand this question? mkfs.erofs will always put all file data in the filesystem image. Where else would it store it? (Although for the erofs file data, there is a question of whether to store it next to the inode, or in a separate block, but that is a different question from this PR.) For mkcomposefs the question is whether to store the data inline in the erofs image, or whether to use a separate object file in the base directory. Basically, suppose you have an rootfs dir with a 4 byte file, and you run mkcomposefs on it. Do we produce a erofs image that has a 4 byte file? Or do we create a erofs image that has a 0 byte file with an 65 byte overlay.redirect xattr and a 36 byte overlay.metacopy xattr, plus a 4 byte external object file?
There is nothing that fundamentally disallows an inlining limit of 128 bytes with the proposed API. The only place that currently hardcodes the 64 byte limit is the "convert this real directory to a lcfs-node" helper. Ostree for example doesn't use that, and could pick whatever inline limit it wanteds (if it wanted too). And, we could easily add an option to lcfs_load_node_from_file() to use a different limit if we wanted. It would be up to each user of the tools/APIs whether they wanted to enable said flag, with the result that the generated images would get a different digest. That said, I don't think the limit is all that important once you reach some slightly larger size, because the overhead starts to be less ridiculous. So, I don't think the precise limit is all that important, as long as e.g. files smaller than a sha256 digest are inlined.
mkfs.erofs never ever puts any files in the basedir though. The concept doesn't even exist.
I agree that we should limit tunables, which is why I made this the default for mkcomposefs. Maybe we should drop the "--no-inline" option though? |
Signed-off-by: Alexander Larsson <[email protected]>
d6653b3
to
da9d40d
Compare
For small files, it makes no sense to use a redirect to the basedir. The size of the redirect xattr value itself is typically 65 bytes, and on top of that comes the xattr name overhead. Things are even worse if we also want to store the fs-verity digest. The overhead for a very small file is significant. So, we add the ability to store the file content to save in a lcfs_node. And when serializing the erofs layer, if it is set we skip the redirect xattrs, and just directly store the data in the erofs file (instead of making it a sparse file). Signed-off-by: Alexander Larsson <[email protected]>
1e7df35
to
12b5e15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
12b5e15
to
98e1298
Compare
Hmm, missing here is cfs-fuse support for reading these files. Will have a look at that. |
Yes, I was referring to the inode.
Right but I think you answered that with this:
Yes, this makes sense to me - so I'd argue: let's do it by default, take another "checksum epoch" and try to commit to your suggested 64 bytes by default. |
Just going to link this one for you and me to have a blast from the past: https://ometer.com/preferences.html - it's an argument not to add an option. |
I'm gonna merge this one and do the cfs-fuse change as part of !185 which has some other fuse stuff. |
Well, we sort of already have a checksum epoch for the support of xattr escapes. But, I think we're now essentially ready to have a real frozen format. |
f6a2b42
to
53f6a2b
Compare
Unless you pass this to lcfs_load_node_from_file() then it will automatically add the file content for small files to the lcfs_nodes, which means that the file data will be stored in the erofs image instead of in a separate file in the basedir. The limit for small files is 64 bytes, which is the size of a sha256 digest in hex form. That is typically the size of a redirect xattr that would be used otherwise, so the end result is that we always get a smaller image if we inline these. 64 is also the typical size of a erofs inode entry, so this will inline well in the inode table. Signed-off-by: Alexander Larsson <[email protected]>
Instead of ls -lR we have a custom directory dumper that also dumps sha256 checksums as well as file xattrs. This gives us more thorough checking. Also, try to build a composefs image from the mounted composefs and ensure we get the same digest again. Signed-off-by: Alexander Larsson <[email protected]>
53f6a2b
to
fddb76d
Compare
Some files are smaller than the size needed for the redirect and the digest, lets allow storing these inline instead.