How to read and write OsPaths without interpreting them? #233

Open
jefdaj opened this issue Jun 14, 2024 · 3 comments

Comments

@jefdaj

jefdaj commented Jun 14, 2024

I'm trying to read and write lists of OsPaths (actually just PosixPaths, in case that matters) to files. I want to avoid doing any conversion or interpretation if possible, and just treat the paths as opaque bytestrings separated by \NUL.

I see that I could use encodeFS and decodeFS, but 1) that's incompatible with Attoparsec (annoyingly, the Parser monad isn't a transformer), 2) it forces IO into a lot of otherwise pure code, and 3) the extra round-trip seems more likely to introduce encoding bugs than prevent them.

I'm about to try breaking into the hidden modules and using the raw constructors. But is there a more recommended way to read/write PosixPaths?

One idea that comes to mind is adding a Binary/Bytable instance? I haven't looked into that before. But a trivial instance that just wraps/unwraps the constructor seems like it would be equivalent to exposing the constructor itself.

Edit: also, thanks for taking on this OsPath thing! I'm not well versed in low level encodings and am glad someone is working on it. I would offer to help to the extent I can without breaking anything. I'm working on Arbitrary instances to check that my code can round-trip trees of OsPaths to folders on disk. Maybe a version of those could end up in the library and help identify bugs?

@jefdaj
Author

jefdaj commented Jun 14, 2024

Of course after posting this, I finally noticed you can access the raw constructors in the OsString package! Is that what I should be doing?

@hasufell
Member

From what I understand you want to write filepaths to a file on disk?

Indeed, I would avoid decodeFS. How to access the raw bytes in a cross-platform manner is described here: https://hasufell.github.io/posts/2022-06-29-fixing-haskell-filepaths.html#accessing-the-raw-bytes-in-a-cross-platform-manner

> I haven't looked into that before. But a trivial instance that just wraps/unwraps the constructor seems like it would be equivalent to exposing the constructor itself.

The problem is that we are dealing with a wide char array on Windows ([Word16]) as opposed to a char array on Unix ([Word8]). So for OsPath you'd still need to encode the platform information somehow (maybe as a magic bit?). Binary instances for PosixPath and WindowsPath are indeed trivial. So if you're just dealing with PosixPath, you can unwrap the underlying ShortByteString and turn it into a ByteString.
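For the POSIX-only case, the unwrapping described above might look roughly like the sketch below. It assumes the `PosixString` constructor and its `getPosixString` field are imported from `System.OsString.Internal.Types` (shipped by the os-string package, or by filepath itself in the 1.4.100+ series), and that `PosixPath` is a type synonym for `PosixString`. The helper names (`toBytes`, `fromBytes`, `encodePaths`, `decodePaths`) are made up for illustration and are not part of any library.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Short as SBS
import System.OsString.Internal.Types (PosixString (..))

-- Hypothetical helper: unwrap the newtype and copy the bytes out unchanged.
toBytes :: PosixString -> B.ByteString
toBytes = SBS.fromShort . getPosixString

-- Hypothetical helper: wrap raw bytes without decoding or validating them.
fromBytes :: B.ByteString -> PosixString
fromBytes = PosixString . SBS.toShort

-- Write each path followed by a NUL byte.
-- Assumption: no path contains NUL, which holds for valid POSIX paths.
encodePaths :: [PosixString] -> B.ByteString
encodePaths = B.concat . map (\p -> B.snoc (toBytes p) 0)

-- Split on NUL and skip empty chunks (an empty path is not valid anyway).
decodePaths :: B.ByteString -> [PosixString]
decodePaths = map fromBytes . filter (not . B.null) . B.split 0

main :: IO ()
main = do
  -- The second path ends in 0xFF, so it is not valid UTF-8 on purpose.
  let paths =
        [ fromBytes (B.pack [47, 116, 109, 112])
        , fromBytes (B.pack [102, 111, 255])
        ]
  -- Check that the round trip preserves the raw bytes.
  print (decodePaths (encodePaths paths) == paths)
```

Because nothing is decoded, all of this stays pure and no `IO` is needed until the bytes are actually written to or read from a file. The same approach does not carry over to `OsPath` directly, since the Windows representation is a wide char array.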

Wrt attoparsec, also see haskell/attoparsec#225

My idea was to provide a way to convert to Data.Bytes.Bytes (which is a sliceable type) and then use that for efficient parsing. But we still have the problem that on Windows we are dealing with wide char arrays.
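Until something like that exists, a POSIX-only parser can already be written against plain `ByteString` input with attoparsec. The following is a minimal sketch under that assumption, not a recommended API: it uses `takeWhile1`, `word8`, `many'`, `endOfInput` and `parseOnly` from `Data.Attoparsec.ByteString`, plus the `PosixString` constructor from `System.OsString.Internal.Types`. The parser names `posixPathP` and `posixPathsP` are hypothetical.

```haskell
module Main (main) where

import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as SBS
import System.OsString.Internal.Types (PosixString (..))

-- Hypothetical parser: one non-empty run of non-NUL bytes, then the NUL.
-- The bytes are wrapped as-is; no decoding takes place.
posixPathP :: A.Parser PosixString
posixPathP = do
  raw <- A.takeWhile1 (/= 0)
  _ <- A.word8 0
  pure (PosixString (SBS.toShort raw))

-- Hypothetical parser: zero or more NUL-terminated paths, then end of input.
posixPathsP :: A.Parser [PosixString]
posixPathsP = A.many' posixPathP <* A.endOfInput

main :: IO ()
main = do
  -- Two NUL-terminated paths; the second ends in 0xFF (not valid UTF-8).
  let input = B.pack [47, 116, 109, 112, 0, 102, 111, 255, 0]
  case A.parseOnly posixPathsP input of
    Left err -> putStrLn ("parse error: " ++ err)
    Right ps -> mapM_ (print . B.unpack . SBS.fromShort . getPosixString) ps
```

This stays inside the pure `Parser` monad, at the cost of one copy per path when converting the parsed `ByteString` slice to a `ShortByteString`.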

> Of course after posting this, I finally noticed you can access the raw constructors in the OsString package! Is that what I should be doing?

Yes

@hasufell
Member

Related: #161
