-
-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zv: create thin wrapper for Path
and PathBuf
#1010
base: main
Are you sure you want to change the base?
Conversation
Path
and PathBuf
Path
and PathBuf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for this contribution. Looking good in general, just need a few tweaks, I think. Apart from the inline comments, a few things about the commit and PR title:
- Being a thin wrapper is a detail. The title should be "🏷️ zv: Add
FilePath
type" (I'll take care of the PR title). - Kindly add details to commit log as well, not just the PR. In this case, they could be identical.
- Add
Fixes #977.
line to the commit & PR description.
zvariant/src/file_path.rs
Outdated
unsafe { | ||
Ok(FilePath( | ||
OsStr::from_encoded_bytes_unchecked(v).to_os_string() | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. Are we safe here? The docs say that we're violating the safety rules here:
As the encoding is unspecified, callers must pass in bytes that originated as a mixture of validated UTF-8 and bytes from OsStr::as_encoded_bytes from within the same Rust version built for the same target platform. For example, reconstructing an OsStr from bytes sent over the network or stored in a file will likely violate these safety rules.
In any case, we'll want a Safety:
commend above that proves why this is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the comment. However, I'm not sure I follow.
File path do not necessarily contain only a sequence of UTF-8 characters, thus it's safe to assume that the path contains a non-vaalid UTF-8 characters
- Firstly something not necessarily containing only UTF-8 chars, doesn't mean that it contains only non-UTF8.
- Secondly, even if the conclusion that it only contains non-UTF8 chars, doesn't guarantee safety.
- Thirdly, the task it to document why the unsafe usage is still safe (i-e how are we ensuring that the invariant specified by the docs that I quoted here, hold true.
- Lastly a typo:
vaalid
->valid
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It was an assumption rather than an assertion i.e. not necessarily, but safe to assume as a path can contain other non-UTF8 symbols, so we have no reason to say that it only contains UTF8 symbols, thus, the assumption.
- yes, you were right, the comment has nothing to do with the
unsafe
call, by bad ig ;p - Also I've moved out from the unsafe call to a more safe approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It was an assumption rather than an assertion i.e. not necessarily, but safe to assume as a path can contain other non-UTF8 symbols, so we have no reason to say that it only contains UTF8 symbols, thus, the assumption.
Right but assuming UTF-8 where bytes could have non-UTF8 would be problematic. The other way around is not a problem and hence need no documentation and only makes it confusing.
- yes, you were right, the comment has nothing to do with the
unsafe
call, by bad ig ;p
No worries. That's why we've reviews.
- Also I've moved out from the unsafe call to a more safe approach
Cool but seems all it really gives us is a null-byte check (which we can do safely ourselves too). I also realised that we're forcing copying/allocation where we probably don't need to. I think you should be implementing/using Visitor::visit_borrowed_bytes
instead combined with https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.from_encoded_bytes_unchecked .
I'm still a bit unsure about safety of it all here. For example, what happens if the invariant of these methods don't hold? 🤔 Maybe we can keep this type unix-only, since at least for unix platforms, you can safely deserialize &OsStr
from bytes: https://doc.rust-lang.org/std/os/unix/ffi/trait.OsStrExt.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! I've made some changes that you can find here, I've tried to minimize copying/allocating memory as much as I can tho, but I'm not quite sure that I didn't miss something, so please let me know if you spotted anything :p, also, if you find those changes satisfying, I'll be adding them to the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Can you please just push it to this branch so it's easier to review specific parts of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than that, it's getting closer to being ready. Oh and the task was to fix the commit itself, as per the last point in the contribution guidelines. If you find this task too much and the docs and tools recommended in the contribution guideline don't help, please let me know and I can help you for this first contribution.
Oh and as the CI would point out, you'd need to fix the formatting but that's easy. :)
zvariant/src/file_path.rs
Outdated
} | ||
} | ||
|
||
impl<'f> From<&'f str> for FilePath<'f> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're creating an owned version out of &str
(I guess you can't create a borrowed version? 🤔), it's FilePath<'static>
here.
zvariant/src/file_path.rs
Outdated
unsafe { | ||
Ok(FilePath( | ||
OsStr::from_encoded_bytes_unchecked(v).to_os_string() | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the comment. However, I'm not sure I follow.
File path do not necessarily contain only a sequence of UTF-8 characters, thus it's safe to assume that the path contains a non-vaalid UTF-8 characters
- Firstly something not necessarily containing only UTF-8 chars, doesn't mean that it contains only non-UTF8.
- Secondly, even if the conclusion that it only contains non-UTF8 chars, doesn't guarantee safety.
- Thirdly, the task it to document why the unsafe usage is still safe (i-e how are we ensuring that the invariant specified by the docs that I quoted here, hold true.
- Lastly a typo:
vaalid
->valid
.
zvariant/src/file_path.rs
Outdated
/// # Exmples | ||
/// ``` | ||
/// use zvariant::FilePath; | ||
/// use std::path::{Path, PathBuf}; | ||
/// | ||
/// let path = Path::new("/hello/world"); | ||
/// let path_buf = PathBuf::from(path); | ||
/// | ||
/// let p1 = FilePath::from(path); | ||
/// let p2 = FilePath::from(path_buf); | ||
/// let p3 = FilePath::from("/hello/world"); | ||
/// | ||
/// assert_eq!(p1, p2); | ||
/// assert_eq!(p2, p3); | ||
/// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding examples. 👍 However, the main use of this API is (de)serializing. So it'd be good to have those examples here (especially since that's not being tested by unit tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some tests for the (de)serialization process, but didn't change the examples other than adding some Into
s, since I really think that the way the type is (de)serialized should not matter for users, as they should be able to just wrap their file paths in it and send/receive it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think that the way the type is (de)serialized should not matter for users
As I wrote, that's the main use here. Users will not be using this a generic type for file paths, but OK, no biggie.
For now, I'm thinking of a commit comment+description for my changes, if you can, can you please help me figure that out! these are the changes that I'm planing to made and TBH, I'm really inclined towards making a clean new PR instead of adding to this one :) but this is also the PR that was mentioned in the Issue that I've opened #977. |
Well, it's still just a fixup of your original single commit here. You first do an interactive git rebase (
That only creates noise for no reason. In the end it's not about the PR but rather the git branch. If you've a clean branch, you can just push it to the branch of this PR (just need to force push it). |
Path
and PathBuf
Path
and PathBuf
Create `FilePath` type to serve as a thin abstraction that handles (de)serialization of a file path, since en/decoding them with serde is limited for only UTF-8 characters.
I'm sorry for the delay but I've been travelling. I'll review now.. |
/// While `zvariant::Type` and `serde::{Serialize, Deserialize}`, are implemented for [`Path`] and [`PathBuf`], unfortunately `serde` serializes them as UTF-8 strings. This is not the desired behavior in most cases since file paths are not guaranteed to contain only UTF-8 characters. | ||
/// To solve this problem, this type is provided which encodes the underlying file path as a null-terminated byte array. Encoding as byte array is also more efficient. The Prodigy - Breathe (Brooks Aleksander Remix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The formatting is wrong (hopefully the CI will catch that). It's best to always run
cargo +nightly fmt
andcargo clippy
before pushing. - What's the "The Prodigy - Breathe (Brooks Aleksander Remix)" part about? 🤔
/// While `zvariant::Type` and `serde::{Serialize, Deserialize}`, are implemented for [`Path`] and [`PathBuf`], unfortunately `serde` serializes them as UTF-8 strings. This is not the desired behavior in most cases since file paths are not guaranteed to contain only UTF-8 characters. | ||
/// To solve this problem, this type is provided which encodes the underlying file path as a null-terminated byte array. Encoding as byte array is also more efficient. The Prodigy - Breathe (Brooks Aleksander Remix) | ||
/// | ||
/// # Exmples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty line under headings please.
/// Consider using the `from` and `into` methods to convert/cast [FilePath] to other compatible types, see the example bellow for reference | ||
/// ``` | ||
/// use zvariant::FilePath; | ||
/// use std::path::{Path, PathBuf}; | ||
/// use std::ffi::{CStr, OsStr, OsString, CString}; | ||
/// | ||
/// let path = Path::new("/hello/world"); | ||
/// let path_buf = PathBuf::from(path); | ||
/// let osstr = OsStr::new("/hello/world"); | ||
/// let os_string = OsString::from("/hello/world"); | ||
/// let cstr = CStr::from_bytes_until_nul("/hello/world\0".as_bytes()).unwrap_or_default(); | ||
/// let cstring = CString::new("/hello/world").unwrap_or_default(); | ||
/// | ||
/// let p1 = FilePath::from(path); | ||
/// let p2 = FilePath::from(path_buf); | ||
/// let p3 = FilePath::from(osstr); | ||
/// let p4 = FilePath::from(os_string); | ||
/// let p5 = FilePath::from(cstr); | ||
/// let p6 = FilePath::from(cstring); | ||
/// let p7 = FilePath::from("/hello/world"); | ||
/// | ||
/// assert_eq!(p1, p2); | ||
/// assert_eq!(p2, p3); | ||
/// assert_eq!(p3, p4); | ||
/// assert_eq!(p4, p5); | ||
/// assert_eq!(p5, p6); | ||
/// assert_eq!(p5, p7); | ||
/// ``` | ||
/// Also you can (de)serialize the [FilePath] as an array of bytes, consider this for example | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I do appreciate you taking the time to write detailed examples, I don't think it's necessary. It's a very simple type and user doesn't need that much code to understand how to use it. Just a very few simple examples of building an instance from random paths and then encoding and decoding them, is more than sufficient.
/// let path_arr = Array(vec![ | ||
/// serde_json::json!(47), | ||
/// serde_json::json!(104), | ||
/// serde_json::json!(101), | ||
/// serde_json::json!(108), | ||
/// serde_json::json!(108), | ||
/// serde_json::json!(111), | ||
/// serde_json::json!(47), | ||
/// serde_json::json!(119), | ||
/// serde_json::json!(111), | ||
/// serde_json::json!(114), | ||
/// serde_json::json!(108), | ||
/// serde_json::json!(100), | ||
/// ]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is certainly no need to involve complicated stuff, like json etc. :)
/// This method won't allocate/copy memory unless the nul | ||
/// byte does not exist in the `bytes` parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to do all this for the user. We should just simply check for the null byte and error out if it's not there.
let slice = unsafe { | ||
// consider switching to https://doc.rust-lang.org/std/alloc/trait.Allocator.html#tymethod.allocate | ||
// when it becames a part of the stable release | ||
let ptr = std::alloc::alloc(Layout::new::<u8>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use Box?
// SAFETY: this call is safe because we guarentee the nul termination | ||
// of the `path_bytes` | ||
let bytes_with_nul = chop_bytes_with_nul(value.as_encoded_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The safety comment is above the safe line.
- We could just use
CStr::from_bytes_with_null
and error out if null is missing. Then we don't need chop_bytes_with_null or unsafe even.
// SAFETY: this call is safe because we guarentee the nul termination | ||
// of the `path_bytes` | ||
unsafe { | ||
return FilePath(Cow::Owned(CString::from_vec_with_nul_unchecked(path_bytes))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here, except we'd use from_vec_with_nul
.
Path
andPathBuf
withay
(array buffer) signature.Fixes Can't send a non UTF-8 characters over the bus #977.