Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zv: create thin wrapper for Path and PathBuf #1010

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yassinebenarbia
Copy link

@yassinebenarbia yassinebenarbia commented Sep 20, 2024

@yassinebenarbia yassinebenarbia changed the title zu: create thin wrapper for bot Path and PathBuf zu: create thin wrapper for Path and PathBuf Sep 20, 2024
Copy link
Contributor

@zeenix zeenix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this contribution. Looking good in general, just need a few tweaks, I think. Apart from the inline comments, a few things about the commit and PR title:

  • Being a thin wrapper is a detail. The title should be "🏷️ zv: Add FilePath type" (I'll take care of the PR title).
  • Kindly add details to commit log as well, not just the PR. In this case, they could be identical.
  • Add Fixes #977. line to the commit & PR description.

zvariant/src/file_path.rs Outdated Show resolved Hide resolved
zvariant/src/file_path.rs Show resolved Hide resolved
zvariant/src/file_path.rs Outdated Show resolved Hide resolved
zvariant/src/file_path.rs Outdated Show resolved Hide resolved
Comment on lines 42 to 45
unsafe {
Ok(FilePath(
OsStr::from_encoded_bytes_unchecked(v).to_os_string()
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. Are we safe here? The docs say that we're violating the safety rules here:

As the encoding is unspecified, callers must pass in bytes that originated as a mixture of validated UTF-8 and bytes from OsStr::as_encoded_bytes from within the same Rust version built for the same target platform. For example, reconstructing an OsStr from bytes sent over the network or stored in a file will likely violate these safety rules.

In any case, we'll want a Safety: commend above that proves why this is safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the comment. However, I'm not sure I follow.

File path do not necessarily contain only a sequence of UTF-8 characters, thus it's safe to assume that the path contains a non-vaalid UTF-8 characters

  • Firstly something not necessarily containing only UTF-8 chars, doesn't mean that it contains only non-UTF8.
  • Secondly, even if the conclusion that it only contains non-UTF8 chars, doesn't guarantee safety.
  • Thirdly, the task it to document why the unsafe usage is still safe (i-e how are we ensuring that the invariant specified by the docs that I quoted here, hold true.
  • Lastly a typo: vaalid -> valid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It was an assumption rather than an assertion i.e. not necessarily, but safe to assume as a path can contain other non-UTF8 symbols, so we have no reason to say that it only contains UTF8 symbols, thus, the assumption.
  • yes, you were right, the comment has nothing to do with the unsafe call, by bad ig ;p
  • Also I've moved out from the unsafe call to a more safe approach

Copy link
Contributor

@zeenix zeenix Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It was an assumption rather than an assertion i.e. not necessarily, but safe to assume as a path can contain other non-UTF8 symbols, so we have no reason to say that it only contains UTF8 symbols, thus, the assumption.

Right but assuming UTF-8 where bytes could have non-UTF8 would be problematic. The other way around is not a problem and hence need no documentation and only makes it confusing.

  • yes, you were right, the comment has nothing to do with the unsafe call, by bad ig ;p

No worries. That's why we've reviews.

Cool but seems all it really gives us is a null-byte check (which we can do safely ourselves too). I also realised that we're forcing copying/allocation where we probably don't need to. I think you should be implementing/using Visitor::visit_borrowed_bytes instead combined with https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.from_encoded_bytes_unchecked .

I'm still a bit unsure about safety of it all here. For example, what happens if the invariant of these methods don't hold? 🤔 Maybe we can keep this type unix-only, since at least for unix platforms, you can safely deserialize &OsStr from bytes: https://doc.rust-lang.org/std/os/unix/ffi/trait.OsStrExt.html

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I've made some changes that you can find here, I've tried to minimize copying/allocating memory as much as I can tho, but I'm not quite sure that I didn't miss something, so please let me know if you spotted anything :p, also, if you find those changes satisfying, I'll be adding them to the PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Can you please just push it to this branch so it's easier to review specific parts of the code?

zvariant/src/file_path.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@zeenix zeenix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that, it's getting closer to being ready. Oh and the task was to fix the commit itself, as per the last point in the contribution guidelines. If you find this task too much and the docs and tools recommended in the contribution guideline don't help, please let me know and I can help you for this first contribution.

Oh and as the CI would point out, you'd need to fix the formatting but that's easy. :)

}
}

impl<'f> From<&'f str> for FilePath<'f> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're creating an owned version out of &str (I guess you can't create a borrowed version? 🤔), it's FilePath<'static> here.

Comment on lines 42 to 45
unsafe {
Ok(FilePath(
OsStr::from_encoded_bytes_unchecked(v).to_os_string()
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the comment. However, I'm not sure I follow.

File path do not necessarily contain only a sequence of UTF-8 characters, thus it's safe to assume that the path contains a non-vaalid UTF-8 characters

  • Firstly something not necessarily containing only UTF-8 chars, doesn't mean that it contains only non-UTF8.
  • Secondly, even if the conclusion that it only contains non-UTF8 chars, doesn't guarantee safety.
  • Thirdly, the task it to document why the unsafe usage is still safe (i-e how are we ensuring that the invariant specified by the docs that I quoted here, hold true.
  • Lastly a typo: vaalid -> valid.

Comment on lines 12 to 91
/// # Exmples
/// ```
/// use zvariant::FilePath;
/// use std::path::{Path, PathBuf};
///
/// let path = Path::new("/hello/world");
/// let path_buf = PathBuf::from(path);
///
/// let p1 = FilePath::from(path);
/// let p2 = FilePath::from(path_buf);
/// let p3 = FilePath::from("/hello/world");
///
/// assert_eq!(p1, p2);
/// assert_eq!(p2, p3);
/// ```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding examples. 👍 However, the main use of this API is (de)serializing. So it'd be good to have those examples here (especially since that's not being tested by unit tests).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some tests for the (de)serialization process, but didn't change the examples other than adding some Intos, since I really think that the way the type is (de)serialized should not matter for users, as they should be able to just wrap their file paths in it and send/receive it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think that the way the type is (de)serialized should not matter for users

As I wrote, that's the main use here. Users will not be using this a generic type for file paths, but OK, no biggie.

@yassinebenarbia
Copy link
Author

For now, I'm thinking of a commit comment+description for my changes, if you can, can you please help me figure that out! these are the changes that I'm planing to made and TBH, I'm really inclined towards making a clean new PR instead of adding to this one :) but this is also the PR that was mentioned in the Issue that I've opened #977.

@zeenix
Copy link
Contributor

zeenix commented Sep 30, 2024

if you can, can you please help me figure that out! these are the changes that I'm planing to made

Well, it's still just a fixup of your original single commit here. You first do an interactive git rebase (git rebase -i) and mark all follow-up commits in your branch to be fixups (by changing the pick to f for subsequent commit lines). Then you just use git commit --amend to change the commit message. Finally push the changes using git push -f REPO BRANCH. If you need to make more changes on top, just git commit -a --amend.

TBH, I'm really inclined towards making a clean new PR instead of adding to this one :)

That only creates noise for no reason. In the end it's not about the PR but rather the git branch. If you've a clean branch, you can just push it to the branch of this PR (just need to force push it).

@zeenix zeenix changed the title zu: create thin wrapper for Path and PathBuf zv: create thin wrapper for Path and PathBuf Oct 17, 2024
Create `FilePath` type to serve as a thin abstraction that handles
(de)serialization of a file path, since en/decoding them with serde
is limited for only UTF-8 characters.
@zeenix
Copy link
Contributor

zeenix commented Oct 21, 2024

I'm sorry for the delay but I've been travelling. I'll review now..

Comment on lines +14 to +15
/// While `zvariant::Type` and `serde::{Serialize, Deserialize}`, are implemented for [`Path`] and [`PathBuf`], unfortunately `serde` serializes them as UTF-8 strings. This is not the desired behavior in most cases since file paths are not guaranteed to contain only UTF-8 characters.
/// To solve this problem, this type is provided which encodes the underlying file path as a null-terminated byte array. Encoding as byte array is also more efficient. The Prodigy - Breathe (Brooks Aleksander Remix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The formatting is wrong (hopefully the CI will catch that). It's best to always run cargo +nightly fmt and cargo clippy before pushing.
  • What's the "The Prodigy - Breathe (Brooks Aleksander Remix)" part about? 🤔

/// While `zvariant::Type` and `serde::{Serialize, Deserialize}`, are implemented for [`Path`] and [`PathBuf`], unfortunately `serde` serializes them as UTF-8 strings. This is not the desired behavior in most cases since file paths are not guaranteed to contain only UTF-8 characters.
/// To solve this problem, this type is provided which encodes the underlying file path as a null-terminated byte array. Encoding as byte array is also more efficient. The Prodigy - Breathe (Brooks Aleksander Remix)
///
/// # Exmples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty line under headings please.

Comment on lines +18 to +47
/// Consider using the `from` and `into` methods to convert/cast [FilePath] to other compatible types, see the example bellow for reference
/// ```
/// use zvariant::FilePath;
/// use std::path::{Path, PathBuf};
/// use std::ffi::{CStr, OsStr, OsString, CString};
///
/// let path = Path::new("/hello/world");
/// let path_buf = PathBuf::from(path);
/// let osstr = OsStr::new("/hello/world");
/// let os_string = OsString::from("/hello/world");
/// let cstr = CStr::from_bytes_until_nul("/hello/world\0".as_bytes()).unwrap_or_default();
/// let cstring = CString::new("/hello/world").unwrap_or_default();
///
/// let p1 = FilePath::from(path);
/// let p2 = FilePath::from(path_buf);
/// let p3 = FilePath::from(osstr);
/// let p4 = FilePath::from(os_string);
/// let p5 = FilePath::from(cstr);
/// let p6 = FilePath::from(cstring);
/// let p7 = FilePath::from("/hello/world");
///
/// assert_eq!(p1, p2);
/// assert_eq!(p2, p3);
/// assert_eq!(p3, p4);
/// assert_eq!(p4, p5);
/// assert_eq!(p5, p6);
/// assert_eq!(p5, p7);
/// ```
/// Also you can (de)serialize the [FilePath] as an array of bytes, consider this for example
///
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do appreciate you taking the time to write detailed examples, I don't think it's necessary. It's a very simple type and user doesn't need that much code to understand how to use it. Just a very few simple examples of building an instance from random paths and then encoding and decoding them, is more than sufficient.

Comment on lines +59 to +72
/// let path_arr = Array(vec![
/// serde_json::json!(47),
/// serde_json::json!(104),
/// serde_json::json!(101),
/// serde_json::json!(108),
/// serde_json::json!(108),
/// serde_json::json!(111),
/// serde_json::json!(47),
/// serde_json::json!(119),
/// serde_json::json!(111),
/// serde_json::json!(114),
/// serde_json::json!(108),
/// serde_json::json!(100),
/// ]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is certainly no need to involve complicated stuff, like json etc. :)

Comment on lines +124 to +125
/// This method won't allocate/copy memory unless the nul
/// byte does not exist in the `bytes` parameter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to do all this for the user. We should just simply check for the null byte and error out if it's not there.

let slice = unsafe {
// consider switching to https://doc.rust-lang.org/std/alloc/trait.Allocator.html#tymethod.allocate
// when it becames a part of the stable release
let ptr = std::alloc::alloc(Layout::new::<u8>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use Box?

Comment on lines +159 to +161
// SAFETY: this call is safe because we guarentee the nul termination
// of the `path_bytes`
let bytes_with_nul = chop_bytes_with_nul(value.as_encoded_bytes());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The safety comment is above the safe line.
  • We could just use CStr::from_bytes_with_null and error out if null is missing. Then we don't need chop_bytes_with_null or unsafe even.

// SAFETY: this call is safe because we guarentee the nul termination
// of the `path_bytes`
unsafe {
return FilePath(Cow::Owned(CString::from_vec_with_nul_unchecked(path_bytes)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here, except we'd use from_vec_with_nul.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't send a non UTF-8 characters over the bus
2 participants