NOAA OISST Zarr is now on IPFS - next steps w/ Filecoin? #40
I imagine the majority of the time is spent fetching the time chunks, so I, II or both would make a big difference.
Yes @martindurant, most of the time is spent during fetching of the One could simulate what a proper async implementation would do by running e.g.
which runs within 3 seconds on my laptop. Of course, referring directly via the https link defeats the purpose of IPFS, which is to become independent of location addressing. So the proper way will in any case be to have one or more gateways / IPFS nodes close to where the data is used. If I run the async ipfsspec implementation on my laptop (which has an IPFS node running), I can open the same dataset in 500 ms. As the async implementation currently doesn't have fallback / load balancing, I'm hesitant to release it as the standard ipfsspec. When I run the sync variant of ipfsspec against my local node, coincidentally the opening time drops to 3 s as well. I've created another issue at ipfsspec to discuss particular design decisions of ipfsspec.
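To illustrate why concurrent chunk fetching matters here, the following is a minimal sketch that simulates it with `asyncio`. It does not use ipfsspec or any real gateway: `fetch_chunk`, the chunk keys, and the 10 ms latency are all assumptions made up for illustration.

```python
import asyncio
import time

LATENCY_S = 0.01  # assumed per-request round trip; not a measured value


async def fetch_chunk(key: str) -> bytes:
    """Stand-in for one gateway request (hypothetical; no network access)."""
    await asyncio.sleep(LATENCY_S)
    return key.encode()


async def fetch_sequential(keys):
    # One request at a time: total time grows as len(keys) * latency.
    return [await fetch_chunk(k) for k in keys]


async def fetch_concurrent(keys):
    # All requests in flight at once: total time stays near a single latency.
    return await asyncio.gather(*(fetch_chunk(k) for k in keys))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


keys = [f"time/{i}" for i in range(50)]  # hypothetical chunk keys
seq_result, seq_time = timed(fetch_sequential(keys))
con_result, con_time = timed(fetch_concurrent(keys))
print(f"sequential: {seq_time:.2f} s, concurrent: {con_time:.2f} s")
```

Both variants return the same chunks in the same order; only the waiting overlaps. A real implementation would also need to cap the number of in-flight requests and handle failures, which this sketch leaves out.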
ipfsspec is read-only because Zarr needs to know the keys before it starts writing, i.e. we need to figure out a way to generate the CIDs before creating the DAG (Directed Acyclic Graph). I wanted to break down the issues with IPFS node + gateway as there are two layers.
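The dependency between content and key can be sketched with the standard library alone. The function below computes a CIDv1 for a single raw block (raw codec, sha2-256, base32), which is the simplest case. It is an illustration, not what ipfsspec or an IPFS node does internally: a plain `ipfs add` typically chunks files and wraps them in UnixFS nodes, so the CID it reports will generally differ. The chunk contents are hypothetical.

```python
import base64
import hashlib


def raw_cid_v1(data: bytes) -> str:
    """CIDv1 (raw codec, sha2-256, base32) for a single block of bytes."""
    digest = hashlib.sha256(data).digest()
    # Multihash: hash function code (0x12 = sha2-256), digest length, digest.
    multihash = bytes([0x12, len(digest)]) + digest
    # CID: version (0x01), content codec (0x55 = raw), multihash.
    cid_bytes = bytes([0x01, 0x55]) + multihash
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for lowercase base32


chunk_a = b"\x00" * 16  # hypothetical chunk contents
chunk_b = b"\x01" * 16
cid_a = raw_cid_v1(chunk_a)
cid_b = raw_cid_v1(chunk_b)
print(cid_a)
print(cid_a == cid_b)
```

Because the key is a function of the bytes, it cannot be chosen before the chunk exists, and every parent node in the DAG changes whenever a child changes.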
@sheriflouis-FF regarding the pinning: I think for the particular case of Zarr, the finest level accessed directly via CID (i.e. without any path) might well be the level of a Zarr array (i.e. a folder containing a file called If at some point in time we would be able to trace the chunk CIDs through computation (as briefly mentioned here), this might however change.
Thanks to @sheriflouis-FF, @jnthnvctr, and @d70-t, the NOAA OISST Zarr store is now on IPFS, and openable with xarray:
https://gist.github.com/cisaacstern/de5b5d0a17bc3dadb372997f43e79a42
A few notes:
This dataset was copied to IPFS from our Open Storage Network (OSN) S3 bucket. IIUC, Tobias's `ipfsspec` is currently read-only, so there is currently no direct path for writing to IPFS from `pangeo-forge-recipes`.

The opening time of 4+ minutes is obviously slow, and could be accelerated by:

- `ipfsspec` gives preference to a local gateway, and falls back to a remote gateway if a local one is not found. I am still unclear what the recommended simplest method is for running a local gateway.
- `time` dimension. The Zarr store from which this was copied was written to OSN prior to "Consolidate dimension coordinates" (pangeo-forge-recipes#210), which I believe resolves this issue.
- `ipfsspec` which would also help.

Now that we have a minimal working example on the read-only side, I'm opening this issue to solicit input from @pangeo-forge/dev-team as well as the IPFS crew (please tag others if I've missed someone!) regarding next steps with this project. What milestones should we focus on?
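The "prefer a local gateway, fall back to a remote one" behaviour mentioned in the notes above can be sketched roughly as follows. This is not ipfsspec's implementation. The gateway addresses are assumptions (a local node's usual default gateway port and one public gateway), `fetch_with_fallback` and `fake_fetch` are hypothetical helpers, and `<CID>` is a placeholder rather than the CID of the OISST store.

```python
from typing import Callable, Iterable

# Assumed gateway addresses; adjust to your own setup.
LOCAL_GATEWAY = "http://127.0.0.1:8080"
REMOTE_GATEWAY = "https://ipfs.io"


def gateway_url(gateway: str, cid: str, path: str = "") -> str:
    """Build a path-style gateway URL: <gateway>/ipfs/<cid>/<path>."""
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    return f"{url}/{path.lstrip('/')}" if path else url


def fetch_with_fallback(
    cid: str,
    path: str,
    fetch: Callable[[str], bytes],
    gateways: Iterable[str] = (LOCAL_GATEWAY, REMOTE_GATEWAY),
) -> bytes:
    """Try each gateway in order and return the first successful response."""
    errors = []
    for gateway in gateways:
        try:
            return fetch(gateway_url(gateway, cid, path))
        except OSError as exc:  # connection refused, timeouts, URL errors
            errors.append(exc)
    raise OSError(f"all gateways failed: {errors}")


def fake_fetch(url: str) -> bytes:
    """Stand-in for an HTTP GET that pretends no local node is running."""
    if url.startswith(LOCAL_GATEWAY):
        raise ConnectionRefusedError("no local node running")
    return b"payload from " + url.encode()


data = fetch_with_fallback("<CID>", ".zmetadata", fake_fetch)
print(data)
```

To try this against real gateways, `fetch` could be something like `lambda url: urllib.request.urlopen(url, timeout=10).read()`; its `URLError` is an `OSError` subclass, so the fallback loop would catch it.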