-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support symlinks and other types in ls #27
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Erotemic for bringing this in 👍.
Apart from the specific comments, please note that generally the async implementation of ipfsspec
should be preferred, and probably the synchronous implementation should be dropped at some point (it's relatively easy to use the async implementation in a synchronous setting, but not the other way around. Further, the async implementation should in principle perform better).
Documentation for the REST API can be found here: [Kubo_RPC_API]_. | ||
|
||
References: | ||
.. [Kubo_RPC_API] http://docs.ipfs.tech.ipns.localhost:8080/reference/kubo/rpc/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess as long as we are talking read only, we might want to stick to the HTTP Gateway API (not the full RPC API). This should always be available on locally running kubo installations, but should work equally well on public gateways as a fallback.
types = { | ||
0: "raw", | ||
1: "directory", | ||
2: "file", | ||
3: "metadata", | ||
4: "symlink", | ||
5: "shard", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also touch what the generic fsspec
API expects as a return form ls
(and info
). I don't know a definitve answer here, but maybe @martindurant could chime in. If I look at e.g. the local filesystem implementation, there are only types file
, directory
and other
. There's additionally the islink
flag.
So @martindurant, would raw
, metadata
, symlink
and shard
be valid return values for the type
? And more specifically, what would be the best return value to communicate the presence of a symlink?
Co-authored-by: Tobias Kölling <[email protected]>
Can you give me an example of how its easy to use async in non-async code? From what I understand you have to create and manage an event loop, which is a lot more code overhead than I want. Unless I'm missing something and it is a lot easier to use the async variant than I think it is, then I would strongly suggest not dropping the synchronous variant. It's important for immediate feedback when people are working interactively in IPython / Jupyter settings. Also I believe doctests are easier to write with sync code (you don't want to have two lines of setup loop / execute loop boilerplate in a doctest if you just want to access one file). |
Ok, maybe the formulation wasn't ideal. To make async code sync, you "just" have to run an event loop and wait for the async code to complete. This has relatively little overhead. On the other hand, if you want to make sync code async, you'll have to create a new thread for every task you start which is much less efficient, especially if you're doing I/O and want to fetch 100s of requests concurrently. But I agree, there's a bit of boilerplate if you want to make async code look like sync code. Fortunately, One additional benefit you'll get from the synchronous-from-asynchronous API is, that An example would be
or, of course, the |
Ah, I see. Thank you for the clarification. I can just use: import fsspec
fs_cls = fsspec.get_filesystem_class('ipfs')
fs = fs_cls(asynchronous=False)
results = fs.ls("bafybeief7tmoarwmd26b2petx7crtvdnz6ucccek5wpwxwdvfydanfukna")
print(results) And that's still using the async implementation. Given that I'm perfectly comfortable with avoiding / dropping the synchronous version. And the async implementation with asynchronous=True would look like this: async def test_ipfs_async():
import fsspec
fs_cls = fsspec.get_filesystem_class('ipfs')
fs = fs_cls(asynchronous=True)
session = await fs.set_session() # creates client
result = await fs._ls("bafybeief7tmoarwmd26b2petx7crtvdnz6ucccek5wpwxwdvfydanfukna")
print(result)
await session.close() # explicit destructor
if __name__ == '__main__':
import asyncio
asyncio.run(test_ipfs_async()) |
Fixes #26
Adds some documentation to reference where to learn more about the gateway rest calls, and fixes an issue where ls would fail when a directory contained symlinks or non directory non file data.
I'm not sure what raw or shards are, but ls will list them now. I exepct other parts of the library will have issues with dealing with theses as well.