Add reflink_block function #85
Thank you!
I think we need to work on the API: the current API is not cross-platform and has too many platform-dependent details.
pub fn reflink_block(
    from: &fs::File,
    from_offset: u64,
    to: &fs::File,
    to_offset: u64,
    block_size: u64,
) -> io::Result<()> {
I kind of feel like this API is too specific to Windows.
I checked the Linux API https://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html and there's no block_size.
I feel like we should use a builder pattern to abstract it, with a Windows-only function to specify block_size and a default value?
I believe the src_length field in the Linux ioctl_ficlonerange API serves a similar purpose to block_size.
The src_length parameter lets the caller specify a larger block size, which makes the operation more efficient than calling it on every single cluster.
struct file_clone_range {
    __s64 src_fd;
    __u64 src_offset;
    __u64 src_length;
    __u64 dest_offset;
};
This ioctl reflinks up to src_length bytes from file descriptor src_fd at offset src_offset into the file dest_fd at offset dest_offset, provided that both are files.
If src_length is zero, the ioctl reflinks to the end of the source file.
src_length specifies how many bytes to copy over, not the block size.
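To make the field semantics concrete, here is a minimal Rust sketch of the struct from the man page. The type name `FileCloneRange` and the helper `ficlonerange_request` are hypothetical and not part of this PR. The request number assumes the generic Linux `_IOW(0x94, 13, struct file_clone_range)` encoding used on x86_64 and aarch64; some architectures encode the direction bits differently.

```rust
use std::mem::size_of;

/// Rust mirror of the kernel's `struct file_clone_range`.
/// (Hypothetical name; the layout follows the C struct quoted above.)
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileCloneRange {
    /// File descriptor of the source file.
    pub src_fd: i64,
    /// Byte offset in the source file where the cloned range starts.
    pub src_offset: u64,
    /// Number of bytes to reflink; 0 means "up to the end of the source file".
    /// This is a length, not a block size.
    pub src_length: u64,
    /// Byte offset in the destination file.
    pub dest_offset: u64,
}

/// Encodes `_IOW(0x94, 13, struct file_clone_range)`.
/// Assumption: the generic ioctl layout (direction in bits 30..32,
/// size in bits 16..30, type in bits 8..16, number in bits 0..8).
pub const fn ficlonerange_request() -> u64 {
    const IOC_WRITE: u64 = 1;
    let size = size_of::<FileCloneRange>() as u64;
    (IOC_WRITE << 30) | (size << 16) | (0x94 << 8) | 13
}
```

A caller would fill `src_length` with the byte count to clone and pass the struct to `ioctl` on the destination descriptor; the actual call is omitted here because it needs an FFI binding.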
The intention of this function is to reflink a portion of a file to another location which I think makes sense for both platforms. Maybe there is a problem with the naming, instead of block_size, it might be more appropriate to use src_length.
Overall, I believe this function can work similarly on both Windows and Linux with some additional restrictions. One challenge I've encountered is the unknown filesystem cluster size. Making an additional API call for each invocation seems too costly. Passing the cluster size as an argument would allow for prerequisite checks and handling more cases efficiently.
Proposed changes:
- Rename block_size to src_length
- Return an Err if src_length is zero
- Pass the cluster size as an additional argument
- Add a loop to handle lengths greater than 4GB
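The proposed changes could be sketched roughly as follows. This is only an illustration of the planning step, not the PR's implementation: `plan_chunks` is a hypothetical helper, and it assumes that every chunk (including the last) must be cluster-aligned and strictly shorter than 4 GiB.

```rust
use std::io;

/// Upper bound for a single clone call: strictly less than 4 GiB.
const MAX_CHUNK: u64 = u32::MAX as u64;

/// Hypothetical helper: validates the arguments and splits the range into
/// `(from_offset, to_offset, len)` chunks that each stay below 4 GiB.
pub fn plan_chunks(
    from_offset: u64,
    to_offset: u64,
    src_length: u64,
    cluster_size: u64,
) -> io::Result<Vec<(u64, u64, u64)>> {
    let invalid = |msg: &'static str| io::Error::new(io::ErrorKind::InvalidInput, msg);

    if cluster_size == 0 {
        return Err(invalid("cluster_size must be non-zero"));
    }
    if src_length == 0 {
        return Err(invalid("src_length must be non-zero"));
    }
    if from_offset % cluster_size != 0
        || to_offset % cluster_size != 0
        || src_length % cluster_size != 0
    {
        return Err(invalid("offsets and length must be cluster-aligned"));
    }
    if from_offset.checked_add(src_length).is_none()
        || to_offset.checked_add(src_length).is_none()
    {
        return Err(invalid("range overflows u64"));
    }

    // Largest multiple of the cluster size that is still below 4 GiB.
    let max_chunk = MAX_CHUNK / cluster_size * cluster_size;
    if max_chunk == 0 {
        return Err(invalid("cluster_size is too large"));
    }

    let mut chunks = Vec::new();
    let mut done = 0u64;
    while done < src_length {
        let len = (src_length - done).min(max_chunk);
        chunks.push((from_offset + done, to_offset + done, len));
        done += len;
    }
    Ok(chunks)
}
```

The real function would issue one platform clone call per chunk; passing the cluster size in lets all checks run without an extra syscall per invocation.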
> Maybe there is a problem with the naming, instead of block_size, it might be more appropriate to use src_length.
Ah, so that's actually the number of bytes to copy?
> Passing the cluster size as an argument would allow for prerequisite checks and handling more cases efficiently.
Maybe we can have a builder pattern?
Cluster size can be set by a Windows-only function; if left unspecified, it would be detected automatically.
> - Return an Err if src_length is zero
I think we can keep this behavior while remaining zero-cost, by using a builder pattern to enable optional parameters.
On Windows, if left unspecified, an additional syscall would be made to get the length.
But if simply banning it is easier, we could go with that for now and use NonZeroU64, so that you don't even have to handle this scenario.
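A builder along these lines might look like the sketch below. All names here (`ReflinkBlockBuilder`, its methods, the field layout) are hypothetical and only illustrate the shape of the API being discussed; the final `reflink_block` call is a placeholder that returns `Unsupported` instead of performing the platform-specific clone.

```rust
use std::fs::File;
use std::io;
use std::num::NonZeroU64;

/// Hypothetical builder for a block-level reflink.
#[derive(Debug)]
pub struct ReflinkBlockBuilder<'a> {
    from: &'a File,
    from_offset: u64,
    to: &'a File,
    to_offset: u64,
    /// `NonZeroU64` rules out the zero-length case at the type level.
    src_length: NonZeroU64,
    /// `None` means "detect the cluster size automatically".
    cluster_size: Option<NonZeroU64>,
}

impl<'a> ReflinkBlockBuilder<'a> {
    /// Required parameters only; offsets default to 0.
    pub fn new(from: &'a File, to: &'a File, src_length: NonZeroU64) -> Self {
        Self { from, from_offset: 0, to, to_offset: 0, src_length, cluster_size: None }
    }

    pub fn from_offset(mut self, offset: u64) -> Self {
        self.from_offset = offset;
        self
    }

    pub fn to_offset(mut self, offset: u64) -> Self {
        self.to_offset = offset;
        self
    }

    /// Windows-only knob: skips the extra syscall that would detect the cluster size.
    #[cfg(windows)]
    pub fn cluster_size(mut self, cluster_size: NonZeroU64) -> Self {
        self.cluster_size = Some(cluster_size);
        self
    }

    /// Placeholder for the platform call; a real implementation would
    /// dispatch to the Windows or Linux clone operation here.
    pub fn reflink_block(self) -> io::Result<()> {
        let Self { from, from_offset, to, to_offset, src_length, cluster_size } = self;
        let _ = (from, from_offset, to, to_offset, src_length, cluster_size);
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "platform call omitted in this sketch",
        ))
    }
}
```

With this shape, the platform-independent call site stays the same on every OS, and only Windows callers that already know the cluster size would add `.cluster_size(...)`.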
/// > Note: Currently the function works only for windows. It returns `Err` for any other platform.
///
/// # Windows Restrictions and Remarks
/// - The source and destination regions must begin and end at a cluster boundary.
Seems that the same restrictions apply to linux too, it's just that the block size is fixed (4k)
https://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html
///
/// # Windows Restrictions and Remarks
/// - The source and destination regions must begin and end at a cluster boundary.
/// - The cloned region must be less than 4GB in length.
I believe that this restriction would make it harder to use cross-platform.
Perhaps we can add a loop to abstract over platform differences?
That is a good idea, I'll take a look
/// - If the source and destination regions are in the same file, they must not overlap. (The
///   application may be able to proceed by splitting up the block clone operation into multiple block
///   clones that no longer overlap.)
This also applies to Linux, so I think we can make it a general restriction
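If this becomes a general restriction, it could be checked up front on every platform. A minimal sketch, assuming both regions have the same length and live in the same file (`regions_overlap` is a hypothetical helper, not part of the PR):

```rust
/// Returns `true` when `[from_offset, from_offset + len)` and
/// `[to_offset, to_offset + len)` share at least one byte.
pub fn regions_overlap(from_offset: u64, to_offset: u64, len: u64) -> bool {
    if len == 0 {
        return false;
    }
    // Two equally sized ranges overlap exactly when their starts are
    // closer together than the length of one range.
    from_offset.abs_diff(to_offset) < len
}
```

Adjacent regions (for example offsets 0 and 4096 with a length of 4096) do not overlap under this check, so cloning one cluster to the next within the same file would still be allowed.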
/// - The destination region must not extend past the end of file. If the application wishes to
///   extend the destination with cloned data, it must first call
///   [`File::set_len`](fn@std::fs::File::set_len).
Hmmm... File::set_len is probably cheap on Linux, but this still makes the API slightly different on different platforms.
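One way to hide that difference would be to grow the destination inside the library before cloning. The sketch below is an assumption about how that could look, not what the PR does; `ensure_destination_len` is a hypothetical helper, and `to` must be opened with write access for `File::set_len` to succeed.

```rust
use std::fs::File;
use std::io;

/// Hypothetical helper: grows `to` so that the range
/// `[to_offset, to_offset + len)` fits inside the file.
/// The file is never shrunk.
pub fn ensure_destination_len(to: &File, to_offset: u64, len: u64) -> io::Result<()> {
    let required = to_offset.checked_add(len).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination range overflows u64")
    })?;
    // Only extend when the destination region would end past the current EOF.
    if to.metadata()?.len() < required {
        to.set_len(required)?;
    }
    Ok(())
}
```

Doing this unconditionally costs one metadata query per call, so whether it belongs in the library or stays the caller's responsibility is a trade-off between a uniform API and the zero-cost goal mentioned earlier in the review.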
/// if offset > len {
///     to_file.set_len(len)?;
/// }
Hmm, why do we need another set_len on Windows?
Don't we already have one?
you're right, it can be removed
Co-authored-by: Jiahao XU <[email protected]> Signed-off-by: Vaiz <[email protected]>
@NobodyXu, hey, I won't have time to work on this PR in December. There's no rush to get it merged, so we can either leave it as is for now, or I can create a new PR later. If you prefer, feel free to finish it yourself.
That's totally fine, it's almost Christmas time and everyone is taking a break to relax. There is no need to rush anything, I wish you a nice Christmas!
Adding block-level reflink capabilities allows for more granular and efficient data management. This can be particularly beneficial for applications that need to clone or deduplicate specific segments of large files.
Details:
- [...] `--ignored` with `--include-ignored` for cargo test to enable all tests
- [...] `reflink_block`, although it should be possible to add Linux support in the future without changing the function signature
- `reflink_block` accepts an immutable reference to the target file to allow calling this function on the same file

issue: #80