zv: create thin wrapper for `Path` and `PathBuf` #1010

yassinebenarbia · 2024-09-20T01:00:12Z

Fixed the error in Can't send a non UTF-8 characters over the bus #977 by making a thin wrapper for Path and PathBuf with an `ay` (array of bytes) signature.
Fixes Can't send a non UTF-8 characters over the bus #977.

zeenix

Thanks so much for this contribution. Looking good in general, just need a few tweaks, I think. Apart from the inline comments, a few things about the commit and PR title:

Being a thin wrapper is a detail. The title should be "🏷️ zv: Add FilePath type" (I'll take care of the PR title).
Kindly add details to commit log as well, not just the PR. In this case, they could be identical.
Add a "Fixes #977." line to the commit & PR description.

zeenix · 2024-09-20T14:42:55Z

zvariant/src/file_path.rs

+                unsafe {
+                    Ok(FilePath(
+                            OsStr::from_encoded_bytes_unchecked(v).to_os_string()
+                    ))


Hmm.. Are we safe here? The docs say that we're violating the safety rules here:

As the encoding is unspecified, callers must pass in bytes that originated as a mixture of validated UTF-8 and bytes from OsStr::as_encoded_bytes from within the same Rust version built for the same target platform. For example, reconstructing an OsStr from bytes sent over the network or stored in a file will likely violate these safety rules.

In any case, we'll want a `Safety:` comment above that proves why this is safe.

Thanks for adding the comment. However, I'm not sure I follow.

File path do not necessarily contain only a sequence of UTF-8 characters, thus it's safe to assume that the path contains a non-vaalid UTF-8 characters

Firstly, something not necessarily containing only UTF-8 chars doesn't mean that it contains only non-UTF-8.

Secondly, even if the conclusion that it contains only non-UTF-8 chars were true, that doesn't guarantee safety.

Thirdly, the task is to document why the unsafe usage is still safe (i.e. how we are ensuring that the invariant specified by the docs I quoted here holds true).

Lastly a typo: vaalid -> valid.

It was an assumption rather than an assertion i.e. not necessarily, but safe to assume as a path can contain other non-UTF8 symbols, so we have no reason to say that it only contains UTF8 symbols, thus, the assumption.

yes, you were right, the comment has nothing to do with the unsafe call, my bad ig ;p

Also I've moved out from the unsafe call to a more safe approach

It was an assumption rather than an assertion i.e. not necessarily, but safe to assume as a path can contain other non-UTF8 symbols, so we have no reason to say that it only contains UTF8 symbols, thus, the assumption.

Right, but assuming UTF-8 where the bytes could be non-UTF-8 would be problematic. The other way around is not a problem, and hence needs no documentation; documenting it only makes things confusing.

yes, you were right, the comment has nothing to do with the unsafe call, my bad ig ;p

No worries. That's why we've reviews.

Also I've moved out from the unsafe call to a more safe approach

Cool but seems all it really gives us is a null-byte check (which we can do safely ourselves too). I also realised that we're forcing copying/allocation where we probably don't need to. I think you should be implementing/using Visitor::visit_borrowed_bytes instead combined with https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.from_encoded_bytes_unchecked .

I'm still a bit unsure about the safety of it all here. For example, what happens if the invariants of these methods don't hold? 🤔 Maybe we can keep this type unix-only, since at least for unix platforms, you can safely deserialize &OsStr from bytes: https://doc.rust-lang.org/std/os/unix/ffi/trait.OsStrExt.html
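
A minimal sketch of the Unix-only route suggested above, assuming the incoming data is a plain byte slice. The function name `path_from_wire_bytes` and its error type are hypothetical and not part of zvariant. On Unix, `OsStrExt::from_bytes` is a safe, zero-copy conversion, so no `unsafe` block (and no safety proof) is needed:

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only extension trait
use std::path::Path;

/// Hypothetical helper: borrow a `Path` from raw bytes without copying.
/// Interior nul bytes are rejected, since a Unix file path can't contain them.
pub fn path_from_wire_bytes(bytes: &[u8]) -> Result<&Path, &'static str> {
    if bytes.contains(&0) {
        return Err("path contains a nul byte");
    }
    // Safe on Unix: an `OsStr` there is just an arbitrary byte sequence.
    Ok(Path::new(OsStr::from_bytes(bytes)))
}
```

With this shape, a path such as `b"/tmp/\xff\xfe"` is accepted even though it has no `&str` representation.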

Hi! I've made some changes that you can find here, I've tried to minimize copying/allocating memory as much as I can tho, but I'm not quite sure that I didn't miss something, so please let me know if you spotted anything :p, also, if you find those changes satisfying, I'll be adding them to the PR

Cool. Can you please just push it to this branch so it's easier to review specific parts of the code?

zvariant/src/file_path.rs

zeenix

Other than that, it's getting closer to being ready. Oh and the task was to fix the commit itself, as per the last point in the contribution guidelines. If you find this task too much and the docs and tools recommended in the contribution guideline don't help, please let me know and I can help you for this first contribution.

Oh and as the CI would point out, you'd need to fix the formatting but that's easy. :)

zeenix · 2024-09-27T15:46:00Z

zvariant/src/file_path.rs

+    }
+}
+
+impl<'f> From<&'f str> for FilePath<'f> {


If you're creating an owned version out of &str (I guess you can't create a borrowed version? 🤔), it's FilePath<'static> here.
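
To illustrate the lifetime point, here is a rough sketch that assumes the type wraps a `Cow<'f, CStr>`. `FilePathSketch` and its helper methods are hypothetical stand-ins, not the PR's code, and `TryFrom` is used here only so the sketch doesn't panic on interior nul bytes. Because a `&str` has no trailing nul, the bytes have to be copied, so the result borrows nothing from the input and can be `'static`:

```rust
use std::borrow::Cow;
use std::convert::TryFrom;
use std::ffi::{CStr, CString, NulError};

/// Hypothetical stand-in for the PR's type, assumed to hold a C string.
#[derive(Debug, PartialEq)]
pub struct FilePathSketch<'f>(Cow<'f, CStr>);

impl<'f> FilePathSketch<'f> {
    pub fn as_c_str(&self) -> &CStr {
        &self.0
    }

    pub fn is_borrowed(&self) -> bool {
        matches!(self.0, Cow::Borrowed(_))
    }
}

// Owned conversion: the input is copied, so the output is `'static`.
impl<'a> TryFrom<&'a str> for FilePathSketch<'static> {
    type Error = NulError;

    fn try_from(s: &'a str) -> Result<Self, Self::Error> {
        Ok(FilePathSketch(Cow::Owned(CString::new(s)?)))
    }
}

// Borrowed conversion: only here is the output tied to the input's lifetime.
impl<'f> From<&'f CStr> for FilePathSketch<'f> {
    fn from(s: &'f CStr) -> Self {
        FilePathSketch(Cow::Borrowed(s))
    }
}

/// The returned value outlives the temporary `String` it was built from.
pub fn from_temporary() -> FilePathSketch<'static> {
    let temporary = String::from("/hello/world");
    FilePathSketch::try_from(temporary.as_str()).expect("no interior nul byte")
}
```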

zeenix · 2024-09-27T15:53:44Z

zvariant/src/file_path.rs

+                unsafe {
+                    Ok(FilePath(
+                            OsStr::from_encoded_bytes_unchecked(v).to_os_string()
+                    ))


Thanks for adding the comment. However, I'm not sure I follow.

File path do not necessarily contain only a sequence of UTF-8 characters, thus it's safe to assume that the path contains a non-vaalid UTF-8 characters

Firstly, something not necessarily containing only UTF-8 chars doesn't mean that it contains only non-UTF-8.

Secondly, even if the conclusion that it contains only non-UTF-8 chars were true, that doesn't guarantee safety.

Thirdly, the task is to document why the unsafe usage is still safe (i.e. how we are ensuring that the invariant specified by the docs I quoted here holds true).

Lastly a typo: vaalid -> valid.

zeenix · 2024-09-27T15:56:04Z

zvariant/src/file_path.rs

+/// # Exmples
+/// ```
+/// use zvariant::FilePath;
+/// use std::path::{Path, PathBuf};
+///
+/// let path = Path::new("/hello/world");
+/// let path_buf = PathBuf::from(path);
+///
+/// let p1 = FilePath::from(path);
+/// let p2 = FilePath::from(path_buf);
+/// let p3 = FilePath::from("/hello/world");
+///
+/// assert_eq!(p1, p2);
+/// assert_eq!(p2, p3);
+/// ```


Thanks for adding examples. 👍 However, the main use of this API is (de)serializing. So it'd be good to have those examples here (especially since that's not being tested by unit tests).

I added some tests for the (de)serialization process, but didn't change the examples other than adding some Intos, since I really think that the way the type is (de)serialized should not matter for users, as they should be able to just wrap their file paths in it and send/receive it.

I really think that the way the type is (de)serialized should not matter for users

As I wrote, that's the main use here. Users will not be using this as a generic type for file paths, but OK, no biggie.

yassinebenarbia · 2024-09-30T05:46:35Z

For now, I'm thinking of a commit comment+description for my changes. If you can, could you please help me figure that out? These are the changes that I'm planning to make and TBH, I'm really inclined towards making a clean new PR instead of adding to this one :) but this is also the PR that was mentioned in the issue that I opened, #977.

zeenix · 2024-09-30T12:44:29Z

if you can, can you please help me figure that out! these are the changes that I'm planning to make

Well, it's still just a fixup of your original single commit here. You first do an interactive git rebase (`git rebase -i`) and mark all follow-up commits in your branch as fixups (by changing `pick` to `f` on the subsequent commit lines). Then you just use `git commit --amend` to change the commit message. Finally, push the changes using `git push -f REPO BRANCH`. If you need to make more changes on top, just `git commit -a --amend`.

TBH, I'm really inclined towards making a clean new PR instead of adding to this one :)

That only creates noise for no reason. In the end it's not about the PR but rather the git branch. If you've a clean branch, you can just push it to the branch of this PR (just need to force push it).

Create `FilePath` type to serve as a thin abstraction that handles (de)serialization of a file path, since en/decoding them with serde is limited for only UTF-8 characters.

zeenix · 2024-10-21T03:32:59Z

I'm sorry for the delay but I've been travelling. I'll review now..

zeenix · 2024-10-21T03:35:59Z

zvariant/src/file_path.rs

+/// While `zvariant::Type` and `serde::{Serialize, Deserialize}`, are implemented for [`Path`] and [`PathBuf`], unfortunately `serde` serializes them as UTF-8 strings. This is not the desired behavior in most cases since file paths are not guaranteed to contain only UTF-8 characters.
+/// To solve this problem, this type is provided which encodes the underlying file path as a null-terminated byte array. Encoding as byte array is also more efficient. The Prodigy - Breathe (Brooks Aleksander Remix)


The formatting is wrong (hopefully the CI will catch that). It's best to always run `cargo +nightly fmt` and `cargo clippy` before pushing.

What's the "The Prodigy - Breathe (Brooks Aleksander Remix)" part about? 🤔

zeenix · 2024-10-21T03:37:04Z

zvariant/src/file_path.rs

+/// While `zvariant::Type` and `serde::{Serialize, Deserialize}`, are implemented for [`Path`] and [`PathBuf`], unfortunately `serde` serializes them as UTF-8 strings. This is not the desired behavior in most cases since file paths are not guaranteed to contain only UTF-8 characters.
+/// To solve this problem, this type is provided which encodes the underlying file path as a null-terminated byte array. Encoding as byte array is also more efficient. The Prodigy - Breathe (Brooks Aleksander Remix)
+///
+/// # Exmples


Empty line under headings, please.

zeenix · 2024-10-21T03:38:56Z

zvariant/src/file_path.rs

+/// Consider using the `from` and `into` methods to convert/cast [FilePath] to other compatible types, see the example bellow for reference
+/// ```
+///    use zvariant::FilePath;
+///    use std::path::{Path, PathBuf};
+///    use std::ffi::{CStr, OsStr, OsString, CString};
+///
+///    let path = Path::new("/hello/world");
+///    let path_buf = PathBuf::from(path);
+///    let osstr = OsStr::new("/hello/world");
+///    let os_string = OsString::from("/hello/world");
+///    let cstr = CStr::from_bytes_until_nul("/hello/world\0".as_bytes()).unwrap_or_default();
+///    let cstring = CString::new("/hello/world").unwrap_or_default();
+///
+///    let p1 = FilePath::from(path);
+///    let p2 = FilePath::from(path_buf);
+///    let p3 = FilePath::from(osstr);
+///    let p4 = FilePath::from(os_string);
+///    let p5 = FilePath::from(cstr);
+///    let p6 = FilePath::from(cstring);
+///    let p7 = FilePath::from("/hello/world");
+///
+///    assert_eq!(p1, p2);
+///    assert_eq!(p2, p3);
+///    assert_eq!(p3, p4);
+///    assert_eq!(p4, p5);
+///    assert_eq!(p5, p6);
+///    assert_eq!(p5, p7);
+/// ```
+/// Also you can (de)serialize the [FilePath] as an array of bytes, consider this for example
+///


While I do appreciate you taking the time to write detailed examples, I don't think it's necessary. It's a very simple type and user doesn't need that much code to understand how to use it. Just a very few simple examples of building an instance from random paths and then encoding and decoding them, is more than sufficient.

zeenix · 2024-10-21T03:39:30Z

zvariant/src/file_path.rs

+///    let path_arr = Array(vec![
+///        serde_json::json!(47),
+///        serde_json::json!(104),
+///        serde_json::json!(101),
+///        serde_json::json!(108),
+///        serde_json::json!(108),
+///        serde_json::json!(111),
+///        serde_json::json!(47),
+///        serde_json::json!(119),
+///        serde_json::json!(111),
+///        serde_json::json!(114),
+///        serde_json::json!(108),
+///        serde_json::json!(100),
+///    ]);


There is certainly no need to involve complicated stuff, like json etc. :)
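
As a small illustration of the simpler alternative, the same twelve bytes can be written as a byte-string literal. The constant name below is hypothetical, and this is only a sketch of the expected payload, not the PR's final doc example:

```rust
/// The bytes of "/hello/world", identical to the values 47, 104, 101, ...
/// spelled out one by one in the JSON-based array.
pub const HELLO_WORLD_PATH: &[u8] = b"/hello/world";

/// Returns the expected payload as an owned vector, e.g. for comparisons in a test.
pub fn expected_payload() -> Vec<u8> {
    HELLO_WORLD_PATH.to_vec()
}
```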

zeenix · 2024-10-21T03:43:09Z

zvariant/src/file_path.rs

+/// This method won't allocate/copy memory unless the nul
+/// byte does not exist in the `bytes` parameter


I don't think we need to do all this for the user. We should just simply check for the null byte and error out if it's not there.

zeenix · 2024-10-21T03:45:15Z

zvariant/src/file_path.rs

+    let slice = unsafe {
+        // consider switching to https://doc.rust-lang.org/std/alloc/trait.Allocator.html#tymethod.allocate
+        // when it becames a part of the stable release
+        let ptr = std::alloc::alloc(Layout::new::<u8>());


Why not just use Box?
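
For reference, a sketch of what the `Box` suggestion might look like. The intent of the original `alloc` call is not fully visible in the hunk, so this assumes the goal was either a single heap-allocated byte or an owned buffer with a trailing nul. Both function names are hypothetical:

```rust
/// Heap-allocates a single zeroed byte. Unlike a raw `std::alloc::alloc` call,
/// the memory is initialized and is freed automatically when the `Box` drops.
pub fn boxed_nul_byte() -> Box<u8> {
    Box::new(0u8)
}

/// Copies `bytes` into an owned buffer and appends one trailing nul byte.
pub fn boxed_with_trailing_nul(bytes: &[u8]) -> Box<[u8]> {
    let mut buf = Vec::with_capacity(bytes.len() + 1);
    buf.extend_from_slice(bytes);
    buf.push(0);
    buf.into_boxed_slice()
}
```

Neither function needs `unsafe`, so no safety comment is required either.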

zeenix · 2024-10-21T03:51:03Z

zvariant/src/file_path.rs

+        // SAFETY: this call is safe because we guarentee the nul termination
+        // of the `path_bytes`
+        let bytes_with_nul = chop_bytes_with_nul(value.as_encoded_bytes());


The safety comment is above the safe line.

We could just use `CStr::from_bytes_with_nul` and error out if the nul byte is missing. Then we don't need `chop_bytes_with_nul`, or even `unsafe`.
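
A minimal sketch of that suggestion, with a hypothetical wrapper name. `CStr::from_bytes_with_nul` borrows the input and returns an error if the slice does not end in a nul byte or contains one before the end:

```rust
use std::ffi::{CStr, FromBytesWithNulError};

/// Hypothetical helper: validate nul termination and borrow the bytes as a `CStr`.
/// No allocation, no copying, and no `unsafe`.
pub fn c_path_from_bytes(bytes: &[u8]) -> Result<&CStr, FromBytesWithNulError> {
    CStr::from_bytes_with_nul(bytes)
}
```

The caller decides how to map the error, for example into the deserializer's own error type.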

zeenix · 2024-10-21T03:52:00Z

zvariant/src/file_path.rs

+        // SAFETY: this call is safe because we guarentee the nul termination
+        // of the `path_bytes`
+        unsafe {
+            return FilePath(Cow::Owned(CString::from_vec_with_nul_unchecked(path_bytes)));


Same comment here, except we'd use from_vec_with_nul.
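
The owned counterpart can be sketched the same way; the wrapper name is again hypothetical. `CString::from_vec_with_nul` takes ownership of the vector and reuses its buffer when the bytes are valid, and the error type hands the original bytes back via `into_bytes`:

```rust
use std::ffi::{CString, FromVecWithNulError};

/// Hypothetical helper: checked replacement for `from_vec_with_nul_unchecked`.
/// Fails unless `path_bytes` ends with exactly one nul byte and has none inside.
pub fn owned_c_path(path_bytes: Vec<u8>) -> Result<CString, FromVecWithNulError> {
    CString::from_vec_with_nul(path_bytes)
}
```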

yassinebenarbia changed the title ~~zu: create thin wrapper for bot Path and PathBuf~~ zu: create thin wrapper for Path and PathBuf Sep 20, 2024

yassinebenarbia mentioned this pull request Sep 20, 2024

Can't send a non UTF-8 characters over the bus #977

Open

zeenix requested changes Sep 20, 2024


yassinebenarbia requested a review from zeenix September 24, 2024 03:24

zeenix requested changes Sep 27, 2024


zeenix changed the title ~~zu: create thin wrapper for Path and PathBuf~~ zv: create thin wrapper for Path and PathBuf Oct 17, 2024

🏷️ zu: create FilePath type

5fb4968

Create `FilePath` type to serve as a thin abstraction that handles (de)serialization of a file path, since en/decoding them with serde is limited for only UTF-8 characters.

yassinebenarbia force-pushed the main branch from 8091305 to 5fb4968 Compare October 17, 2024 14:05

yassinebenarbia requested a review from zeenix October 17, 2024 14:06

zeenix requested changes Oct 21, 2024

