inconsistent path parsing from url_to_fs #1722

Open
jhamman opened this issue Oct 13, 2024 · 2 comments

@jhamman

jhamman commented Oct 13, 2024

fsspec.url_to_fs seems to be inconsistently parsing the path from urls.

import fsspec

print(fsspec.url_to_fs("s3://icechunk-test/ryan"))
print(fsspec.url_to_fs("http://earthmover.io/joe"))

(<s3fs.core.S3FileSystem object at 0x1334187a0>, 'icechunk-test/ryan')
(<fsspec.implementations.http.HTTPFileSystem object at 0x133419550>, 'http://earthmover.io/joe')

Why does the path from the http example include the scheme?

@martindurant
Member

> Why does the path from the http example include the scheme?

The HTTP implementation deals transparently with http and https on the same client and connection pool. The two types of URL are distinguishable only by their protocol, and the lower-level client needs to see the whole URL to make the right call.

Conversely, s3/s3a and gs/gcs are allowed prefix aliases, but the backend doesn't use the prefix at all in the actual call to the remote store.
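To make the distinction above concrete, here is a minimal standard-library sketch. It does not use fsspec; `strip_scheme` is a hypothetical helper written only for this illustration. It shows that dropping the prefix is harmless for the s3/s3a aliases but makes the http and https URLs indistinguishable.

```python
from urllib.parse import urlsplit


def strip_scheme(url: str) -> str:
    """Hypothetical helper: drop 'scheme://', keep host (or bucket) and path."""
    parts = urlsplit(url)
    return parts.netloc + parts.path


# s3 and s3a are aliases for the same store, so both collapse to one path
# and nothing is lost.
s3_paths = {
    strip_scheme(u)
    for u in ("s3://icechunk-test/ryan", "s3a://icechunk-test/ryan")
}

# http and https are different endpoints, yet they also collapse to one
# string, so the client could no longer tell which one to call.
http_paths = {
    strip_scheme(u)
    for u in ("http://earthmover.io/joe", "https://earthmover.io/joe")
}

print(s3_paths)
print(http_paths)
```

Both sets end up with a single element, which is fine for s3 but ambiguous for HTTP.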

It might be reasonable for a backend (taking s3 as an example) to remember that it was created with protocol "s3", and to return paths as "s3://..." even when the path passed in was "s3a://..." (and vice versa). However, this would mean a decent amount of rewriting.

Note that fs.unstrip_protocol should make full URLs.
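For callers who just want uniform output from the two examples in this thread, the idea of "always return a full URL" can be sketched as below. This is only an illustration: `ensure_scheme` is a hypothetical helper, not part of fsspec, and the `protocol` argument is something the caller would have to supply. Where fsspec is available, `fs.unstrip_protocol(path)` as noted above is the thing to try first.

```python
import re

# Matches a leading 'scheme://' such as 'http://' or 's3a://'.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def ensure_scheme(path: str, protocol: str) -> str:
    """Hypothetical helper: return `path` as-is if it already has a scheme,
    otherwise prefix it with `protocol`."""
    if _SCHEME.match(path):
        return path
    return f"{protocol}://{path}"


# The two path strings reported at the top of this issue.
s3_url = ensure_scheme("icechunk-test/ryan", "s3")
http_url = ensure_scheme("http://earthmover.io/joe", "http")

print(s3_url)    # s3://icechunk-test/ryan
print(http_url)  # http://earthmover.io/joe
```

With this, both results are full URLs regardless of whether the backend stripped the protocol.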

@jhamman
Author

jhamman commented Oct 14, 2024

I think I see where you are coming from. From a user's perspective, though, it's a bummer to have to special-case the output of url_to_fs for the HTTP filesystems.
