file+noindex parsing is broken for Windows paths #10703

Open
jasagredo opened this issue Jan 3, 2025 · 6 comments

Comments

@jasagredo
Collaborator

Describe the bug

The file+noindex scheme is underspecified for Windows paths. It seems to follow the file scheme (RFC 8089), but network-uri does not parse Windows paths properly. A URI like file:///C:/some/path should produce the path C:/some/path; instead it produces /C:/some/path.

$ cabal repl -b network-uri
...
ghci> import Data.Maybe (fromJust)
ghci> import Network.URI
ghci> uri = fromJust $ parseURI "file+noindex:///C:/some/path"
ghci> putStr $ unlines ["Scheme: " <> uriScheme uri, "Auth: " <> show (uriAuthority uri), "Path: " <> uriPath uri, "Frag: " <> uriFragment uri]
Scheme: file+noindex:
Auth: Just (URIAuth {uriUserInfo = "", uriRegName = "", uriPort = ""})
Path: /C:/some/path
Frag:
ghci> readFile $ uriPath uri
*** Exception: /C:/some/path: openFile: invalid argument (Invalid argument)

If the third slash is removed, the C: part is parsed as part of the authority instead:

$ cabal repl -b network-uri
...
ghci> import Data.Maybe (fromJust)
ghci> import Network.URI
ghci> uri = fromJust $ parseURI "file+noindex://C:/some/path"
ghci> putStr $ unlines ["Scheme: " <> uriScheme uri, "Auth: " <> show (uriAuthority uri), "Path: " <> uriPath uri, "Frag: " <> uriFragment uri]
Scheme: file+noindex:
Auth: Just (URIAuth {uriUserInfo = "", uriRegName = "C", uriPort = ":"})
Path: /some/path
Frag:
ghci> readFile $ uriPath uri
"The contents of the file in C:/some/path\n"
ghci>

$ cd D:/
$ cabal repl -b network-uri
...
ghci> readFile "/some/path"
*** Exception: /some/path: openFile: does not exist (No such file or directory)

This works only by luck: paths starting with / are interpreted relative to the root of the current drive, so the lookup succeeds only when the file happens to live on that drive.

If no volume or drive letter is specified and the directory name begins with the directory separator character, the path is relative from the root of the current drive.

from https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats#example-ways-to-refer-to-the-same-file
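
As a stopgap for the first case above, the leading slash that network-uri leaves in front of the drive letter could be dropped before the path is handed to the file system. The following is a minimal sketch that uses only base; the name uriPathToWindowsPath is hypothetical and is not part of cabal or network-uri, and it only recognises the "/X:" shape discussed in this issue.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)

-- | Hypothetical helper (not part of cabal or network-uri): drop the leading
-- slash that precedes a drive letter in a URI path, so that "/C:/some/path"
-- becomes "C:/some/path". Any other path is returned unchanged.
uriPathToWindowsPath :: String -> FilePath
uriPathToWindowsPath ('/' : drive : ':' : rest)
  | isDriveLetter drive && startsAtRoot rest = drive : ':' : rest
  where
    isDriveLetter c = isAsciiLower c || isAsciiUpper c
    -- Accept "C:" on its own or "C:/...", but not drive-relative "C:foo".
    startsAtRoot []        = True
    startsAtRoot ('/' : _) = True
    startsAtRoot _         = False
uriPathToWindowsPath path = path
```

This does nothing for the second case, where C: has already been consumed as the authority; that information would have to be recovered from uriAuthority instead.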

The right parsing would be what file-uri does:

ghci> import System.URI.File
ghci> uri = fromRight undefined $ parseFileURI ExtendedWindows "file:///C:/some/path"
ghci> uri
FileURI {fileAuth = Nothing, filePath = "C:/some/path"}
ghci> readFile $ Data.ByteString.Char8.unpack $ filePath uri
"The contents of the file in C:/some/path\n"

This results in a path that is accessible from anywhere in the system. However, this is not the end of the story, because:

  • file-uri only accepts the file: scheme; it will not accept file+noindex: URIs.
  • The file URI RFC does not mention fragments, so file-uri does not parse them.

We can either define a parser for file+noindex from scratch, or work around it: parse the URI with network-uri, fabricate an intermediate URI with the file: scheme, and pass that to file-uri or something similar. I mainly didn't want this finding to go missing again.
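
The "intermediate URI" idea can be sketched on plain strings, without committing to either parsing library. This is only an illustration under stated assumptions: the name toPlainFileUri is hypothetical, the scheme is matched as the exact lowercase prefix file+noindex:, and everything after the first # is treated as the fragment.

```haskell
import Data.List (stripPrefix)

-- | Hypothetical helper: split "file+noindex:<rest>#<fragment>" into a plain
-- "file:<rest>" URI and the fragment (without the leading '#').
-- Returns Nothing when the input does not start with "file+noindex:".
toPlainFileUri :: String -> Maybe (String, String)
toPlainFileUri input = do
  rest <- stripPrefix "file+noindex:" input
  -- Everything from the first '#' onwards is the fragment, if there is one.
  let (beforeFragment, fragment) = break (== '#') rest
  pure ("file:" ++ beforeFragment, drop 1 fragment)
```

The first component could then be packed into a ByteString and given to parseFileURI ExtendedWindows as in the transcript above, while the fragment is handled separately.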

@jasagredo
Collaborator Author

It probably doesn't make much sense to fix this before #8889 goes in.

@jasagredo
Collaborator Author

jasagredo commented Jan 3, 2025

Also, this PR fixes the issue, but in a slightly hacky way: MercuryTechnologies#3. In the end it was dropped from what was merged into master.

@phadej
Collaborator

phadej commented Jan 3, 2025

I don't think it makes sense to support DOS file paths. UNC paths could already work (I don't have a Windows machine to try).

@phadej
Collaborator

phadej commented Jan 3, 2025

Also, this is a duplicate of #7065: it looks like UNC paths do work!

@jasagredo
Collaborator Author

Thanks @phadej. I can try UNC paths next week. I'm not sure whether they need to be normalised or whether one can directly open a path like //?/C:/some/path.

@jasagredo
Collaborator Author

jasagredo commented Jan 7, 2025

@phadej It does not work with UNC paths:

ghci> printParsedFields "file+noindex:////?/C:/some/path"
Scheme: file+noindex:
Auth: Just (URIAuth {uriUserInfo = "", uriRegName = "", uriPort = ""})
Path: //
Query: ?/C:/some/path
Frag:

No matter which combination of / I use, everything after the ? is parsed as the query rather than as part of the path. Also notice that we would have to postprocess the file path:

ghci> readFile "//?/C:/some/path"
*** Exception: //?/C:/some/path: openFile: does not exist (No such file or directory)
ghci> readFile $ normalise "//?/C:/some/path"
"The contents of the file in C:/some/path\n"
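
The printParsedFields helper used in the transcript above is not defined anywhere in this thread. A plausible reconstruction, assuming it simply extends the putStr line from the issue description with the query component, might look like this (it needs the network-uri package, as in the earlier sessions; parsedFields is a hypothetical name introduced here):

```haskell
import Network.URI (URI (..), parseURI)

-- | Pure part: one line per URI component, or Nothing if parsing fails.
parsedFields :: String -> Maybe [String]
parsedFields input = do
  uri <- parseURI input
  pure
    [ "Scheme: " <> uriScheme uri
    , "Auth: " <> show (uriAuthority uri)
    , "Path: " <> uriPath uri
    , "Query: " <> uriQuery uri
    , "Frag: " <> uriFragment uri
    ]

-- | Guess at the helper from the transcript: print the fields, or a short
-- message when the input is not a valid absolute URI.
printParsedFields :: String -> IO ()
printParsedFields =
  maybe (putStrLn "not a valid URI") (putStr . unlines) . parsedFields
```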
