use fsspec simplecache #323
Comprehensive test suite FTW! The test failure is due to HTTP credentials not being passed through via fsspec_open_kwargs. Here is a minimal reproducer.

import aiohttp
import fsspec

url = "http://httpbin.org/basic-auth/foo/bar"
auth = aiohttp.BasicAuth(login='foo', password='bar')
with fsspec.open(url, auth=auth) as fp:
    data = fp.read()  # works
with fsspec.open("simplecache::" + url, auth=auth) as fp:
    data = fp.read()  # ClientResponseError

@martindurant - would you consider this an fsspec issue? Or is there a way to pass the credentials through?

Should be

with fsspec.open("simplecache::" + url, http=dict(auth=auth)) as fp:
    data = fp.read()

because the URL now has two components, and fsspec needs to know which of these to send the kwargs to.

Great, thanks for chiming in! The problem is that in Pangeo Forge (in contrast to the toy example I shared), we don't necessarily know that it's an http link. It could be ftp, or anything else. We could try to parse that out, but that feels fragile. Is there a different way of invoking the simplecache?

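For illustration, the "parse it out" option could look something like the sketch below, which keys the open kwargs by whatever protocol the URL declares. The helper name `chained_open_kwargs`, the placeholder credentials string, and the `file` fallback are assumptions made here, not part of fsspec or Pangeo Forge, and the scheme sniffing is exactly the part that feels fragile.

```python
from urllib.parse import urlsplit


def chained_open_kwargs(url, **open_kwargs):
    """Hypothetical helper: prefix `url` with simplecache and key the
    kwargs by the target protocol, as chained fsspec URLs expect."""
    # Assumption: a URL without a scheme refers to a local file.
    protocol = urlsplit(url).scheme or "file"
    chained_url = "simplecache::" + url
    # Only add a protocol entry when there is something to forward.
    kwargs = {protocol: open_kwargs} if open_kwargs else {}
    return chained_url, kwargs


# Placeholder value; in the reproducer this would be an aiohttp.BasicAuth.
chained_url, kwargs = chained_open_kwargs(
    "http://httpbin.org/basic-auth/foo/bar", auth="<credentials object>"
)
# The result would then be used as fsspec.open(chained_url, **kwargs).
```

This only moves the guesswork into one place: anything `urlsplit` misreads as a scheme (a Windows drive letter, for example) would end up as the kwargs key.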
Ah, I see. The system isn't designed for passing arbitrary arguments to "the second filesystem", but I'll have a think about what can be done. fsspec uses the following to parse the URL pieces:

x = re.compile(".*[^a-z]+.*")
bits = (
    [p if "://" in p or x.match(p) else p + "://" for p in path.split("::")]
    if "::" in path
    else [path]
)

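To see what that parsing produces for the URL in this thread, here is a standalone sketch that wraps the quoted snippet in a function. The names `split_chain` and `NOT_BARE_PROTOCOL` are made up for this example; only the regular expression and the list comprehension come from the snippet above.

```python
import re

# Same pattern as quoted above: it matches any piece that contains a
# character other than a lowercase letter, i.e. not a bare protocol name.
NOT_BARE_PROTOCOL = re.compile(".*[^a-z]+.*")


def split_chain(path):
    """Hypothetical wrapper around the quoted URL-splitting logic."""
    if "::" not in path:
        return [path]
    return [
        # Bare names such as "simplecache" get "://" appended;
        # anything that already looks like a URL or path is kept as-is.
        p if "://" in p or NOT_BARE_PROTOCOL.match(p) else p + "://"
        for p in path.split("::")
    ]


bits = split_chain("simplecache::http://httpbin.org/basic-auth/foo/bar")
print(bits)  # ['simplecache://', 'http://httpbin.org/basic-auth/foo/bar']
```

The chained URL splits into two pieces, one per filesystem, which is why keyword arguments have to be addressed to a specific protocol.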
I suppose you could do the following:

In [16]: fs, _ = fsspec.core.url_to_fs(url, auth=auth)
In [17]: fs2 = fsspec.filesystem("simplecache", fs=fs)
In [18]: with fs2.open(url) as f:
    ...:     print(f.read())
    ...:
b'{\n "authenticated": true, \n "user": "foo"\n}\n'

What about this?

fs, _, paths = fsspec.get_fs_token_paths(url, storage_options={'auth': auth})
cache_fs = fsspec.implementations.cached.CachingFileSystem(fs=fs)
with cache_fs.open(paths[0]) as fp:
    data = fp.read()

I'm pretty sure that amounts to the same thing.

This PR would get rid of our custom local file copying logic in favor of fsspec's simplecache mechanism.
After filing intake/intake-xarray#116 I realized how easy it should be. @martindurant has been saying this for a while. 🙃