Return of info (FileInfo) is unspecified, need consistent way to detect get link information #1680

Open
mxmlnkn opened this issue Sep 19, 2024 · 3 comments

@mxmlnkn
Contributor

mxmlnkn commented Sep 19, 2024

The API specification for listdir and by inference also info reads:

The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:

I don't really understand the comment. "is TBD, but must be consistent across implementations" is an oxymoron. How can it be consistent between implementations when it is not specified yet? In order to increase the usefulness of the filesystem_spec API, this should be specified, imo.

Consider for example this:

import pprint

import fsspec

o = fsspec.open("ssh://127.0.0.1")
ol = fsspec.open("/")
print("Info of ssh:///bin")
pprint.pprint([x for x in o.fs.listdir('/', detail=True) if x['name'] == '/bin'][0])
print("Info of /bin")
pprint.pprint([x for x in ol.fs.listdir('/', detail=True) if x['name'] == '/bin'][0])

Output:

Info of ssh:///bin
{'gid': 0,
 'mtime': datetime.datetime(2021, 6, 21, 9, 52, 39, tzinfo=datetime.timezone.utc),
 'name': '/bin',
 'size': 7,
 'time': datetime.datetime(2024, 9, 19, 7, 14, 44, tzinfo=datetime.timezone.utc),
 'type': 'link',
 'uid': 0}

Info of /bin
{'created': 1601554848.8334851,
 'destination': 'usr/bin',
 'gid': 0,
 'ino': 13,
 'islink': True,
 'mode': 41471,
 'mtime': 1601554848.8334851,
 'name': '/bin',
 'nlink': 1,
 'size': 147456,
 'type': 'other',
 'uid': 0}

This is inconsistent even between the only two implementations I tested:

  • mtime is a datetime in the SSH backend, but a float in the local one. This is already mentioned in "Add ctime/mtime to list of expected values in info" (#526), and it should probably be an integer number of nanoseconds.
  • The SSH implementation returns type: 'link', while the local one returns type: 'other' plus a separate islink key.
  • I need some reliable way to detect file/folder/link and also to get the link target. The SSH implementation does not seem to return the target at all, and the local one only exposes it under a backend-specific destination key. I guess this could be a separate issue?
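
Until this is specified, a wrapper can paper over the differences shown above. The following is only a minimal sketch, not part of fsspec: the helper names and the `x_`-prefixed keys are hypothetical, it only knows about the keys visible in the two outputs above (`type`, `islink`, `destination`, `mtime`), and it assumes that naive datetimes are meant as UTC.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def mtime_to_ns(mtime):
    """Convert a datetime or a number of seconds to integer nanoseconds."""
    if mtime is None:
        return None
    if isinstance(mtime, datetime):
        if mtime.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC.
            mtime = mtime.replace(tzinfo=timezone.utc)
        # Integer arithmetic avoids float rounding for datetimes.
        return (mtime - _EPOCH) // timedelta(microseconds=1) * 1000
    # Floats of seconds cannot carry true nanosecond precision.
    return int(round(float(mtime) * 1e9))


def normalize_info(info):
    """Return a copy of info with hypothetical backend-independent keys added."""
    out = dict(info)  # existing keys are left untouched
    is_link = bool(info.get("islink")) or info.get("type") == "link"
    out["x_type"] = "link" if is_link else info.get("type")
    out["x_mtime_ns"] = mtime_to_ns(info.get("mtime"))
    # Only the local backend exposed a target ("destination") in the output above.
    out["x_link_target"] = info.get("destination") if is_link else None
    return out


if __name__ == "__main__":
    ssh_info = {"name": "/bin", "size": 7, "type": "link"}
    local_info = {"name": "/bin", "size": 147456, "type": "other",
                  "islink": True, "destination": "usr/bin"}
    print(normalize_info(ssh_info)["x_type"])
    print(normalize_info(local_info)["x_type"])
```

Adding new keys instead of overwriting `type` or `mtime` keeps existing callers working, at the cost of carrying both representations around.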

The other implementations should also be tested. There is a nice comprehensive overview here. Code for all, but concrete examples are missing for some.

@martindurant
Member

Perhaps it is poorly written, but this means:

  • all implementations should return a dict with at least keys name, size, type.
  • other keys are allowed and vary by backend
  • we would like to have more consistency between backends, such as what we name timestamps and how they are formatted, but this work has not yet been done. In order not to break existing usage, standard keys should be added and must not conflict with existing ones unless they are already identical.
  • we wish to keep open the possibility of formalising the structure returned by ls/info, but nothing has been done in that regard yet.
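
To make the points above concrete, here is a rough sketch of what a check for the minimum contract and a non-breaking way of adding standard keys might look like. None of this exists in fsspec; the function names are made up for illustration, and which keys would become "standard" is still undecided.

```python
REQUIRED_KEYS = ("name", "size", "type")


def missing_required_keys(info):
    """Return the required keys that are absent from an info dict."""
    return [key for key in REQUIRED_KEYS if key not in info]


def add_standard_keys(info, standard):
    """Add standardised keys, never overwriting a differing existing value.

    Returns the new dict and the list of keys that conflicted.
    """
    out = dict(info)
    conflicts = []
    for key, value in standard.items():
        if key in out and out[key] != value:
            conflicts.append(key)  # keep the backend's value untouched
        else:
            out[key] = value
    return out, conflicts


if __name__ == "__main__":
    info = {"name": "/bin", "size": 7, "type": "link"}
    print(missing_required_keys(info))
    print(add_standard_keys(info, {"type": "other", "islink": True}))
```

A conflict list rather than an exception lets a backend report which of its existing keys would need a migration path.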

@mxmlnkn
Contributor Author

mxmlnkn commented Oct 4, 2024

The HTTP file system is also inconsistent in that it requires the full URL for each listdir, open, etc. call. This is in stark contrast to the other implementations. See ray-project/ray#26423

Furthermore, some implementations return the name with a leading / (fsspec.implementations.ftp.FTPFileSystem, sshfs.SSHF), some without (fsspec.implementations.git.GitFileSystem), which was another source of bugs for my wrapper. I am surprised that I have not yet encountered a filesystem that simply returns the file name, as the name key implies, instead of the absolute path, but there are 4+ other fsspec implementations that I still need to test and "implement" ...

@martindurant
Member

(ref #1713)

"some implementations return the name with leading /"

Implementations have a root_marker class attribute that is typically "" or "/" to distinguish this behaviour.
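
A wrapper could use that attribute to bring names into one convention. The snippet below is only a sketch under a few assumptions: the helper names are hypothetical, paths use "/" as separator, and the caller passes in the marker it wants (for example the filesystem's `root_marker`, or a fixed "/" for a wrapper that always wants absolute-looking names).

```python
import posixpath


def with_root_marker(name, root_marker="/"):
    """Rewrite an info name so that it starts with the given root marker.

    root_marker is expected to be "" or "/", as described above.
    """
    return root_marker + name.lstrip("/")


def base_name(name):
    """Return only the last path component of an info name."""
    return posixpath.basename(name.rstrip("/"))


if __name__ == "__main__":
    # The same entry as reported with and without a leading slash.
    print(with_root_marker("bin", "/"))
    print(with_root_marker("/bin", ""))
    print(base_name("/usr/bin"))
```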


2 participants