Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store each path as a Path instead of a String (which is utf8-only) #30

Open
bew opened this issue Apr 30, 2021 · 4 comments
Open

Store each path as a Path instead of a String (which is utf8-only) #30

bew opened this issue Apr 30, 2021 · 4 comments

Comments

@bew
Copy link
Contributor

bew commented Apr 30, 2021

Comment on:

pipe-rename/src/main.rs

Lines 168 to 175 in 5658581

fn expand_dir(path: &str) -> anyhow::Result<Vec<String>, io::Error> {
Ok(fs::read_dir(path)?
.filter_map(|e| e.ok()
.and_then(|e| e.path()
.into_os_string()
.into_string().ok()))
.collect())
}

I'm thinking, the filter_map exists only because the conversion from an OsString to a String could fail if the path is not UTF-8 and we might get an Err(...) instead of an Ok(String).

Edit: the filter_map also exists because each entry in from read_dir can be an error, my bad.
My observation is still valid though and is on: .into_string().ok() which will return None for a non-utf8 path, and discard it.

I don't think that's right, the tool shouldn't skip non-utf8 paths.

@marcusbuffett I'm wondering if we could change something in the entire codebase:
--> Instead of storing each path in a String (utf8), we could simply keep it as a Path (which can contain non-utf8).
For the occasional cases where we need to print a path (like for the diff), we could use a replacement char for the non-utf8 chars. With a note for the user at the end of the diff if at least one path is non-utf8, the detection / change could be done before the diff to not over complicate it.


NOTE: I'd be interested to take a stab at this if you're ok with this

@marcusbuffett
Copy link
Owner

@bew i was thinking along the same lines when I updated clap, since you can parse arguments as PathBuf instead of string, but then I ended up having to unwrap stuff to match everything else. I agree we should be working with PathBufs when possible, would love for you to take a stab at it

@marcusbuffett
Copy link
Owner

Or OsString or whatever, not totally sure, I leave it up to you

@mtimkovich
Copy link
Collaborator

mtimkovich commented May 1, 2021 via email

@assarbad
Copy link
Contributor

Just thinking out loud: wouldn't a change like this:

  1. cause issues across the Windows/non-Windows boundary because of OsString on Windows being 16-bit code units and 8-bit on other systems? OsString to String and back should be lossless for Windows. For other systems this would only be guaranteed for UTF(-8)-based locales, though; i.e. $LANG.
  2. move the onus of dealing with the encoding to the invoked command/editor (which may be hard to convey in a generic fashion for anything that's not UTF-8)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants