I think that the interface regarding path inputs should be more consistent(ly defined).
While some file systems accept simple paths such as `/` or `/folder`, others do not. For `HTTPFileSystem`, for example, I have to specify the whole URL, including the port, again and again for each access, e.g., `result = fs.ls("http://127.0.0.1:8000")`.
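For illustration, the behavior expected here (bare paths such as `/` or `/folder` interpreted relative to one server) can be sketched on the client side with the standard library. `resolve` and the base URL are hypothetical, not part of fsspec:

```python
from urllib.parse import urljoin, urlsplit


def resolve(base_url: str, path: str) -> str:
    """Hypothetical helper: turn a bare path into a full URL.

    Anything that already has a scheme is passed through unchanged;
    server-absolute paths like "/folder" are joined onto base_url.
    """
    if urlsplit(path).scheme:
        return path
    return urljoin(base_url, path)


base = "http://127.0.0.1:8000"
print(resolve(base, "/"))        # http://127.0.0.1:8000/
print(resolve(base, "/folder"))  # http://127.0.0.1:8000/folder
```

One caveat: `urljoin` parses `path` as a URL reference, so a literal `?` or `#` in a file name is still read as a query or fragment delimiter; this does not solve the escaping problem.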
I would have expected the `name` returned by `ls` to be usable in the next `ls` or `stat` call, but some file systems (again HTTP, but also WebDAV, maybe all HTTP-based ones) may or may not require the name to be escaped further. However, simply escaping all paths does not work either, because I believe that would result in 404 errors in other file systems.
Some file systems return full paths from `ls`, others relative paths. I don't think this is specified sufficiently.
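One way to state the expected contract is as a round-trip check: every `name` returned by `ls()` should be accepted by `info()` (which `stat()` aliases). The sketch below is hypothetical: the protocol, the helper and the toy filesystem are illustrative, not fsspec API, and it only assumes an object with fsspec-style `ls(path, detail=True)` and `info(path)` methods that raise `FileNotFoundError` for missing paths.

```python
from __future__ import annotations

from typing import Any, Protocol


class LsInfoFS(Protocol):
    """The two methods of an fsspec-style filesystem used below."""

    def ls(self, path: str, detail: bool = True) -> list[dict[str, Any]]: ...

    def info(self, path: str) -> dict[str, Any]: ...


def names_failing_round_trip(fs: LsInfoFS, path: str) -> list[str]:
    """Return every ls() name that info() cannot resolve.

    Under the contract proposed in this issue, the result would always be empty.
    """
    failing = []
    for entry in fs.ls(path, detail=True):
        try:
            fs.info(entry["name"])
        except FileNotFoundError:
            failing.append(entry["name"])
    return failing


class ToyFS:
    """Hypothetical stand-in mimicking the reported HTTP behaviour."""

    _sizes = {"/ok.txt": 3, "/#not-a-good-name!?": 0}

    def ls(self, path: str, detail: bool = True) -> list[dict[str, Any]]:
        # Returns one clean name and one half-escaped name.
        names = ["/ok.txt", "/%23not-a-good-name!?"]
        return [{"name": n, "type": "file"} for n in names]

    def info(self, path: str) -> dict[str, Any]:
        if path not in self._sizes:
            raise FileNotFoundError(path)
        return {"name": path, "size": self._sizes[path], "type": "file"}


print(names_failing_round_trip(ToyFS(), "/"))  # ['/%23not-a-good-name!?']
```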
E.g., consider this case where ls even returns some mix of escaped and unescaped symbols, which seems to be clearly a bug:
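```python
import pprint
import urllib.parse  # "import urllib" alone does not guarantee urllib.parse is loaded

from fsspec.implementations.http import HTTPFileSystem as HFS

url = "http://127.0.0.1:8000"
fs = HFS()  # the first positional parameter is simple_links, not a base URL

# What I would have expected to work:
# results = fs.ls("/")
results = fs.ls(url)

pprint.pprint(results)
print()

for result in results:
    # Skip the server's sorting links such as "?N=D"
    if "/?" not in result["name"]:
        # Neither of these works:
        # pprint.pprint(fs.stat(urllib.parse.quote(result["name"])))
        pprint.pprint(fs.stat(result["name"]))
```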
The `#` is escaped as `%23`, but `?` is not, which leads to the observed `FileNotFoundError`.
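The standard library shows why the unescaped `?` matters: in a URL, `?` starts the query string, so whatever follows it is no longer part of the path the server looks up. The name below is an assumed example of the half-escaped form described above.

```python
from urllib.parse import unquote, urlsplit

# Assumed example of a half-escaped name as returned by ls():
# "#" was escaped to %23, but "?" was left as-is.
name = "http://127.0.0.1:8000/%23not-a-good-name!?"

parts = urlsplit(name)
print(parts.path)           # /%23not-a-good-name!
print(repr(parts.query))    # '' (the "?" was consumed as the query delimiter)
print(unquote(parts.path))  # /#not-a-good-name!  (the file name lost its "?")
```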
Using `urllib.parse.quote` does not work either. Even after going through the trouble of parsing the whole URL, so that only the path is escaped and not the special characters in `http://127.0.0.1:8000`, `quote` escapes the `%` again. The result, `%2523not-a-good-name%21%3F`, also leads to a `FileNotFoundError`.
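A workaround sketch, assuming the part after `scheme://host[:port]` is always a literal path (no query or fragment): split off the server part by hand rather than with `urlsplit` (which would again treat `?` as a delimiter), then quote only the path with `%` marked as safe, so existing escapes such as `%23` are not double-encoded. `quote_path_only` is a hypothetical helper, not fsspec API.

```python
from urllib.parse import quote


def quote_path_only(url: str) -> str:
    """Hypothetical helper: percent-encode only the path part of a URL.

    Assumes everything after "scheme://host[:port]" is a literal path,
    so "?" and "#" are treated as ordinary characters, not delimiters.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not an absolute URL: {url!r}")
    netloc, slash, path = rest.partition("/")
    # "%" is marked safe so existing escapes such as %23 stay as they are
    # instead of becoming %2523; "!" and "?" are still encoded.
    return f"{scheme}://{netloc}{slash}{quote(path, safe='/%')}"


print(quote_path_only("http://127.0.0.1:8000/%23not-a-good-name!?"))
# http://127.0.0.1:8000/%23not-a-good-name%21%3F
```

The trade-off is that a file whose name contains a literal `%` followed by two hex digits becomes ambiguous, which is one more argument for `ls()` returning names in a single, well-defined escaping form.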
`HTTPFileSystem` returning those sorting links such as `?N=D` is another matter altogether, but because it parses HTML listings generated by arbitrary HTTP servers, I am not surprised that this is not stable; it might be impossible to fix for all cases.
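If a heuristic is acceptable, sort links can at least be recognized: they point back at the listed directory itself and differ only by a query string. The sketch below (a hypothetical `drop_sort_links`, standard library only) applies that rule; it would misclassify a genuine file whose name starts with an unescaped `?`.

```python
from urllib.parse import urlsplit


def drop_sort_links(directory_url: str, names: list[str]) -> list[str]:
    """Hypothetical filter: remove column-sort links such as "?N=D".

    A sort link points at the listed directory itself and differs only
    by its query string, so it is dropped; everything else is kept.
    """
    directory = urlsplit(directory_url)
    base_path = directory.path.rstrip("/")
    kept = []
    for name in names:
        parts = urlsplit(name)
        is_sort_link = (
            parts.netloc == directory.netloc
            and parts.path.rstrip("/") == base_path
            and parts.query != ""
        )
        if not is_sort_link:
            kept.append(name)
    return kept


names = [
    "http://127.0.0.1:8000/?N=D",
    "http://127.0.0.1:8000/%23not-a-good-name!?",
    "http://127.0.0.1:8000/folder/",
]
print(drop_sort_links("http://127.0.0.1:8000", names))
```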
> for HTTPFileSystem, I have to specify the whole URL
HTTP is probably the only case of this, and the reason is that the filesystem is responsible for both "http" and "https", but knowing which one is imperative for getting data back. This is unfortunate...
It might be reasonable to have separate http and https FS instances, each of which can only respond to the given protocol and only returns `ls()` results for links of the same protocol (the `same_scheme=` argument already exists on `HTTPFileSystem`; we would have it always `True`).
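The per-protocol idea can be sketched as a listing filter: resolve every link against the page it came from, then keep only links whose scheme matches that page. This is a hypothetical illustration of the behavior described, not how `HTTPFileSystem` implements `same_scheme`.

```python
from urllib.parse import urljoin, urlsplit


def same_scheme_links(page_url: str, hrefs: list[str]) -> list[str]:
    """Hypothetical filter: resolve hrefs and keep those sharing page_url's scheme."""
    scheme = urlsplit(page_url).scheme
    kept = []
    for href in hrefs:
        full = urljoin(page_url, href)  # relative links become full URLs
        if urlsplit(full).scheme == scheme:
            kept.append(full)
    return kept


links = same_scheme_links(
    "http://127.0.0.1:8000/",
    ["data.csv", "https://example.com/other.csv", "/folder/"],
)
print(links)  # ['http://127.0.0.1:8000/data.csv', 'http://127.0.0.1:8000/folder/']
```

A side effect of resolving with `urljoin` is that every returned name is a full URL, so the names are at least uniform in form.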
Another thought is to have the "in-FS name" and the "canonical name" be separate fields in `ls()`/`find()`/`info()`. The question then becomes:
- which one to return in functions with `detail=False`
- whether to construct full chained URLs for compound FSs like cache, reference, dirFS.
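To make the trade-off concrete, a listing entry carrying both names might look like the sketch below. The `canonical` field and the `listing` helper are hypothetical; the `key` parameter stands in for the open question of which name a `detail=False` call should return.

```python
from __future__ import annotations

from typing import Literal, TypedDict


class Entry(TypedDict):
    name: str       # path as understood by this filesystem instance
    canonical: str  # hypothetical: fully qualified, properly escaped URL
    type: str


def listing(
    entries: list[Entry],
    detail: bool,
    key: Literal["name", "canonical"] = "name",
) -> list[Entry] | list[str]:
    """Return full entries, or just one of the two name fields."""
    if detail:
        return entries
    return [entry[key] for entry in entries]


entries: list[Entry] = [
    {
        "name": "/#not-a-good-name!?",
        "canonical": "http://127.0.0.1:8000/%23not-a-good-name%21%3F",
        "type": "file",
    },
]
print(listing(entries, detail=False))                   # ['/#not-a-good-name!?']
print(listing(entries, detail=False, key="canonical"))  # ['http://127.0.0.1:8000/%23not-a-good-name%21%3F']
```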
Possibly related issues:
- `file://` URLs) #1168