roadmap: what should ipfsspec do? #7
Comments
I'm interested in this project, here are my thoughts on the questions:
With gateways only? No, not really. Gateways are slow and unreliable. They are a fine fallback for content that the gateways have easy access to, but if I'm hosting data on my home network with a bad upload speed and I try to access it from a gateway, it often times out.
No, not yet. Gateways are a crutch, but a useful one. Allow them to exist as a fallback, but I think the best course of action would be to use an installed IPFS implementation such as kubo as the primary method of accessing data. This would require a Python wrapper library around kubo that abstracts requests to it and is ideally duck-typed with a gateway version of the abstraction, so that both can be used to implement the fsspec hooks.
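A minimal sketch of what such a duck-typed pair could look like, using only the standard library. The class names, the `request_for` and `cat` methods, and the default addresses are assumptions made for illustration, not an existing ipfsspec or kubo wrapper API. The only external facts relied on are that path gateways serve content under `/ipfs/<cid>` and that kubo's RPC API exposes `/api/v0/cat` via POST.

```python
from dataclasses import dataclass
from urllib.parse import quote
from urllib.request import Request, urlopen


def _fetch(request: Request, timeout: float = 30.0) -> bytes:
    # Shared helper: perform the HTTP request and return the raw body.
    with urlopen(request, timeout=timeout) as response:
        return response.read()


@dataclass
class GatewayBackend:
    """Hypothetical reader that goes through an IPFS -> HTTP gateway."""

    base_url: str = "http://127.0.0.1:8080"  # assumed gateway address

    def request_for(self, path: str) -> Request:
        # Path gateways expose content under /ipfs/<cid>[/<subpath>].
        return Request(f"{self.base_url}/ipfs/{quote(path)}", method="GET")

    def cat(self, path: str) -> bytes:
        return _fetch(self.request_for(path))


@dataclass
class KuboRpcBackend:
    """Hypothetical reader that talks to a local kubo daemon's RPC API."""

    base_url: str = "http://127.0.0.1:5001"  # assumed RPC address

    def request_for(self, path: str) -> Request:
        # The kubo RPC API expects POST requests, with the target in `arg`.
        url = f"{self.base_url}/api/v0/cat?arg={quote('/ipfs/' + path)}"
        return Request(url, method="POST")

    def cat(self, path: str) -> bytes:
        return _fetch(self.request_for(path))
```

Because both classes offer the same `cat(path)` method, code implementing the fsspec hooks could accept either one without knowing which transport is behind it.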
Not until read support is very good. Pinning to IPFS has a lot of nuances that users will have various ways of performing. What this library should focus on is being able to access data already pinned on IPFS as efficiently as possible. In terms of future write support, I think fsspec needs to expand its API to embrace the idea of content-addressable data first. I think such a proposal is in scope for the project and something that they could be convinced is a good idea.
Thanks a lot for this feedback 🎉. This repo has been relatively quiet for a while (back then, I guess it was go-ipfs 0.12.0), but I hope that things could slowly ramp up again.
I'm not sure we understand this point the same way: I consider a locally running kubo instance as a Gateway (from
So far, my understanding is that having only one long-running IPFS node on a machine is better than having multiple short-running ones (due to a larger pinset and less impact of startup time). Thus, it could actually be better to talk to the local kubo "gateway" than to have a full-blown IPFS protocol stack inside Python.
This is a tricky business: at some point, there has to be a decision which gateway to use. This could be
So as far as my current understanding goes, there's either manual configuration or load balancing if we want to have public gateways as a fallback option (and I know a couple of users who rely on that fallback option).
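To make the "manual configuration plus fallback" option concrete, here is a small sketch, assuming a manually configured, ordered gateway list. The function names and the gateway addresses in the usage note are illustrative assumptions, not ipfsspec's actual behaviour.

```python
from typing import Callable, Iterable
from urllib.request import urlopen


def http_fetch(gateway: str, path: str, timeout: float = 10.0) -> bytes:
    # Plain HTTP GET against a path gateway: <gateway>/ipfs/<cid>[/<subpath>].
    with urlopen(f"{gateway}/ipfs/{path}", timeout=timeout) as response:
        return response.read()


def read_with_fallback(
    path: str,
    gateways: Iterable[str],
    fetch: Callable[[str, str], bytes] = http_fetch,
) -> bytes:
    """Try each configured gateway in order and return the first success."""
    errors = []
    for gateway in gateways:
        try:
            return fetch(gateway, path)
        except OSError as exc:
            # urllib's URLError and socket timeouts are OSError subclasses.
            errors.append(f"{gateway}: {exc}")
    raise OSError("all gateways failed: " + "; ".join(errors))
```

A caller might pass something like `["http://127.0.0.1:8080", "https://ipfs.io"]`, so that the local node is tried first and a public gateway only serves as the fallback.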
I agree. Maybe we even want a different kind of library for write support.
This issue is meant to discuss the purpose of the ipfsspec fsspec backend and to sharpen the overall design.
background
Due to the availability of IPFS -> HTTP gateways, a specialized IPFS backend for fsspec based read access is not required, as it is possible to open any CID using the http backend by accessing it through a gateway. The downside of this approach is that it requires transforming from content-based addressing to location-based addressing in user code. Using gateway-aware URLs in user code makes it harder
To overcome these downsides, it seems to be beneficial to refer to IPFS resources via a gateway-unaware url like
and do the translation to HTTP or IPFS when accessing the resource and based on the local computing environment and settings. This was the initial idea of ipfsspec.
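The translation step could be sketched roughly as follows. This assumes that the gateway-unaware form is an `ipfs://<cid>/<path>` URL and that the gateway is chosen via an `IPFS_GATEWAY` environment variable with a local default. The function name, the variable name and the default address are all assumptions made for this example.

```python
import os
from urllib.parse import urlsplit

DEFAULT_GATEWAY = "http://127.0.0.1:8080"  # assumed local gateway address


def to_gateway_url(url, gateway=None):
    """Translate a gateway-unaware ipfs:// URL into a gateway HTTP URL."""
    parts = urlsplit(url)
    if parts.scheme != "ipfs":
        raise ValueError(f"not an ipfs:// URL: {url!r}")
    # Resolve the gateway at access time from the local environment.
    gateway = gateway or os.environ.get("IPFS_GATEWAY", DEFAULT_GATEWAY)
    # `netloc` keeps the original case, which matters for case-sensitive CIDs.
    return f"{gateway.rstrip('/')}/ipfs/{parts.netloc}{parts.path}"
```

User code then only ever contains the content-based address, and the location-based part is decided by the environment in which the code runs.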
design questions
Is such a library useful at all?
Or should this translation be implemented on a different layer?
Should this library do automatic load balancing / fallback between multiple gateways?
async
Should the library provide write support?
... and if yes, how?
IPFS is content-addressable storage, so one cannot choose the filename when adding content. Instead, the "filename" is computed from the stored content. As a result, a put function would return the content-derived name rather than accept a caller-chosen one, and thus wouldn't directly fit into fsspec.
A way out might be to use the IPFS mutable filesystem (MFS), which adds a local mutable overlay on top of the immutable filesystem. Using MFS, it would be possible to incrementally construct a local filesystem hierarchy and ask for a root CID after construction has finished. The downside of this approach is that it only works locally (or at least local to one gateway) and thus is probably not suited for larger datasets. So there's probably not much benefit compared to writing data into a local temporary folder and then running ipfs add -r -H on the entire folder.
A related option might be to pin data blocks one by one and keep the virtual directory in memory. After writing out a larger dataset this way, a root CID for remotely stored datasets could be created. An advantage of this approach might be that writing could be distributed to multiple remote gateways.
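The following toy sketch illustrates both points: a content-addressed `put` that returns its key, and a directory kept in memory until a root key is requested. It is purely illustrative. The class names are hypothetical, nothing here talks to IPFS, and the keys are plain SHA-256 hex digests rather than real CIDs.

```python
import hashlib
import json


class ContentStore:
    """Toy content-addressed store: the key is derived from the data."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The caller cannot choose the name; it is computed from the content.
        key = hashlib.sha256(data).hexdigest()  # stand-in for a real CID
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]


class DirectoryBuilder:
    """Keeps the name -> key mapping in memory until a root is requested."""

    def __init__(self, store: ContentStore):
        self.store = store
        self.entries = {}

    def write(self, name: str, data: bytes) -> None:
        # Each file is stored as soon as it is written; only its key is kept.
        self.entries[name] = self.store.put(data)

    def finish(self) -> str:
        # Serialise the listing deterministically and store it as a block,
        # so the root key depends only on the names and contents.
        listing = json.dumps(self.entries, sort_keys=True).encode()
        return self.store.put(listing)
```

In this sketch the familiar "write to a chosen filename" interface only exists on the in-memory `DirectoryBuilder`, while the store underneath stays purely content-addressed. In a distributed variant, each `put` could in principle go to a different remote node.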