IPNI Reverse Index #1781

Open
bajtos opened this issue Aug 5, 2024 · 0 comments

Open Grant Proposal: IPNI Reverse Index

Project Name: IPNI Reverse Index

Proposal Category: Developer and data tooling

Individual or Entity Name:

  • Space Meridian
  • IPNI Cell
  • Curio

Proposer: @bajtos

Project Repo(s)

(Optional) Filecoin ecosystem affiliations:
The people who will implement these changes have nucleated out of Protocol Labs and now work at new companies.

(Optional) Technical Sponsor: @willscott

Do you agree to open source all work you do on behalf of this RFP under the MIT/Apache-2 dual-license?: Yes

Project Summary

In Filecoin, the main unit of user data storage is the Piece, identified by its Piece CID (see the Filecoin Spec). Data retrieval, on the other hand, operates at the payload level: the client requests data using a payload CID and receives the payload's IPLD DAG in return.

To improve the retrievability of Filecoin content, we need to measure the quality of the retrieval service provided by Storage Providers. The on-chain state, events, and history only provide PieceCID information about stored data, while retrieval probes need to map a PieceCID to PayloadCIDs to check whether the content can be retrieved. There is currently no straightforward way to perform this mapping.

This project aims to enable retrieval probes to query IPNI to obtain a sample of Payload CIDs advertised by a given Storage Provider for a given deal (PieceCID).

The project will require changes to the IPNI indexer implementation (storetheindex, cid.contact) and to the index provider implemented by Curio.

See the following document for more information: https://docs.google.com/document/d/1jhvP48ccUltmCr4xmquTnbwfTSD7LbO1i1OVil04T2w

Impact

The on-chain state, events, and history only provide the PieceCID information about stored data. Retrieval probes need to map PieceCID to PayloadCIDs to probe for retrievability. There is no straightforward solution for such mapping right now.

If we get this right, we will empower developers to build alternative retrieval-probing networks, new reputation systems, and an array of diagnostic tooling.

If we don’t get this right, or make no improvements, building a retrieval probe will remain a technical challenge that requires deep knowledge of Filecoin actors, on-chain state, and the IPNI advertisement protocol, and alternative retrieval-probing networks are unlikely to emerge.

When this project succeeds, Spark (the retrieval-probing network powered by Filecoin Stations) will be able to test the retrieval of deals made via the recently introduced Direct Data Onboarding (DDO). Such probing will drive further improvements in the retrieval success rate of FIL+ deals. If the project is very successful, at least one other retrieval-probing network will use this reverse index, creating healthy diversity & competition in the retrieval-probing ecosystem.

UPDATE 2024-10-02

After submitting the application, I learned that Boost will soon be deprecated and replaced by Curio. Curio will not support Graphsync and StorageMarket deals; it will only support Trustless HTTP GW retrievals and DDO deals. This will make it virtually impossible for third parties (e.g. retrieval checkers like Spark) to find which payload CIDs are stored in FIL+ deals made with Curio.

Outcomes

  1. When Curio advertises data to IPNI, it does so in a way that enables third parties like Spark to link Filecoin deals to payload blocks.
  2. IPNI at cid.contact provides a new REST API endpoint for sampling payload blocks linked to a Filecoin deal.
  3. A specification or documentation allowing alternative provider implementations like Venus to implement the same mechanism.

Important

  • The reverse index for existing IPNI records is out of scope for this proposal and grant; the new service will be offered only for advertisements announced after the reverse index implementation is deployed. (We may apply for another grant if we find that we need to support historical deals too.)
  • Modifying Boost is out of scope. The changes in how SPs advertise records to IPNI will be implemented only in the new Curio project.

The desired end-to-end workflow from the user’s perspective:

  1. A Piece is added to a Curio instance running a publicly released version.
  2. Curio announces payload blocks included in that Piece to IPNI. (This happens automatically in the background.)
  3. A client queries cid.contact to obtain a sample of the payload blocks from the Piece.
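Step 3 of this workflow can be sketched from the client side as follows. This is a minimal illustration under stated assumptions: the endpoint path `/sample/{provider_id}/{context_id}`, the URL-safe base64 encoding of the ContextID, and the `{"Samples": [...]}` response shape are all hypothetical placeholders, since the real REST API is to be defined in the Milestone 1 spec. The client is assumed to have already derived the ContextID from the deal's PieceCID and PieceSize.

```python
import base64
import json
from urllib.parse import quote

# HYPOTHETICAL endpoint shape; the real path and response format are to be
# defined by the design spec (see the xedni proposal).
SAMPLE_PATH = "/sample/{provider_id}/{context_id}"


def sample_url(base_url: str, provider_id: str, context_id: bytes) -> str:
    """Build the URL a retrieval checker would GET to sample payload CIDs."""
    # Assumption: the binary ContextID travels as unpadded URL-safe base64.
    ctx = base64.urlsafe_b64encode(context_id).decode("ascii").rstrip("=")
    path = SAMPLE_PATH.format(
        provider_id=quote(provider_id, safe=""),
        context_id=ctx,
    )
    return base_url.rstrip("/") + path


def parse_samples(body: str) -> list[str]:
    """Extract payload CIDs from a hypothetical {"Samples": [...]} JSON body."""
    data = json.loads(body)
    samples = data.get("Samples", [])
    if not all(isinstance(s, str) for s in samples):
        raise ValueError("unexpected sample format")
    return samples
```

A caller could fetch the URL with `urllib.request.urlopen` and pass the decoded body to `parse_samples`; each returned CID is then a candidate for a retrieval probe against the Storage Provider.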

Please refer to the design doc for more details:
https://docs.google.com/document/d/1jhvP48ccUltmCr4xmquTnbwfTSD7LbO1i1OVil04T2w/

How to measure the success:

  • The end goal is to have all Storage Providers correctly advertise all payload blocks from new (FIL+) deals to IPNI so that Spark can query IPNI’s reverse index to map storage deals to payload block CIDs.
  • Spark will track this metric in a public dashboard alongside other metrics like the retrieval success rate. Building such a dashboard is out of the scope & deliverables of this proposal and grant, though.
  • As a partial/proxy metric, we can measure how many Storage Providers are running the Curio/Venus version that includes the features from this project. Building such a dashboard is out of the scope of this proposal and grant, though.

Adoption, Reach, and Growth Strategies

At a high level, our target audience consists of all Storage Providers. We want them to adopt the latest Curio version and configure it to advertise correctly to IPNI.

From another perspective, our target audience is the builders community that may want to build an alternative retrieval-probing network, new reputation systems, new diagnostic tooling, or perhaps use the new reverse index for use cases we cannot imagine yet.

To streamline the adoption, we are including documentation updates as part of this project.

Development Roadmap

Milestone 1: Design Spec

Deliverables:

  • Specify how the new reverse index will work from the client's perspective and how it will be implemented in IPNI.
  • Important: the reverse index will be built only for advertisements announced after the reverse index implementation is deployed. Support for historical advertisements is out of scope.
  • Specify how Curio (and other provider implementations like Venus) must map DDO deal metadata like PieceCID & PieceSize to ContextID for the IPNI advertisement.
  • Specify the new REST API for sampling payload blocks (querying the reverse index). (See e.g. the xedni proposal.)
  • The specification must include enough information to serve as documentation for the following two use cases:
    • How provider software like Venus should implement the index provider to enable reverse index lookups.
    • How users can query the reverse index.
  • Get alignment on the proposal with all stakeholders - IPNI, Curio, Spark/Space Meridian.
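The key requirement of the ContextID mapping in these deliverables is that both the provider (Curio, Venus) and the client (Spark) compute the same bytes from on-chain deal data. The sketch below illustrates one possible deterministic, reversible encoding. The byte layout (an 8-byte big-endian PieceSize followed by the raw binary PieceCID) and the function names are hypothetical assumptions, not the encoding the spec will choose.

```python
from __future__ import annotations

import struct

# HYPOTHETICAL layout: 8-byte big-endian PieceSize || raw binary PieceCID.
# It only shows that the mapping must be deterministic and reversible so that
# third parties can rebuild the same ContextID from PieceCID and PieceSize.
_SIZE_FMT = ">Q"
_SIZE_LEN = struct.calcsize(_SIZE_FMT)  # 8 bytes


def context_id_from_deal(piece_cid: bytes, piece_size: int) -> bytes:
    """Build a ContextID from a deal's binary PieceCID and padded PieceSize."""
    if not piece_cid:
        raise ValueError("piece_cid must not be empty")
    if piece_size <= 0 or piece_size & (piece_size - 1):
        # Padded Filecoin piece sizes are powers of two.
        raise ValueError("piece_size must be a positive power of two")
    return struct.pack(_SIZE_FMT, piece_size) + piece_cid


def deal_from_context_id(context_id: bytes) -> tuple[bytes, int]:
    """Inverse of context_id_from_deal: recover (piece_cid, piece_size)."""
    if len(context_id) <= _SIZE_LEN:
        raise ValueError("context_id too short")
    (piece_size,) = struct.unpack(_SIZE_FMT, context_id[:_SIZE_LEN])
    return context_id[_SIZE_LEN:], piece_size
```

Making the encoding reversible is a design choice, not a requirement stated in this proposal; it would let IPNI or diagnostic tools recover the deal identity from a ContextID without extra lookups.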

Out of scope:

  • A migration path for existing Boost deployments that will add historical deals to the reverse index.
  • An upgrade path for IPNI records that were ingested before the reverse index was implemented & deployed.

Planning:

  • 1-3 people (IPNI developer to lead the work, Curio developer & Spark developer to provide domain-specific expertise & feedback)
  • Estimated effort: 4-5 weeks

Milestone 2: IPNI Implementation

Deliverables:

  • At the end of this milestone, the design specified in Milestone 1 is implemented in IPNI and deployed to cid.contact.
  • IPNI builds the reverse index using the ContextID as the key.
  • IPNI and cid.contact provide a new API endpoint allowing clients to sample the payload block CIDs associated with a given (ProviderID, ContextID).
  • The reverse index is deployed in at least two geographic locations.
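The core data structure behind these deliverables can be pictured as a map from (ProviderID, ContextID) to the set of payload multihashes advertised under that key, plus a sampling query. The following in-memory sketch is a hypothetical illustration only; the class and method names are invented, and a production indexer such as storetheindex would persist the index in a datastore and process chunked advertisement entries.

```python
from __future__ import annotations

import random
from collections import defaultdict


class ReverseIndex:
    """HYPOTHETICAL in-memory (ProviderID, ContextID) -> multihashes index."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, bytes], set[bytes]] = defaultdict(set)

    def ingest(self, provider_id: str, context_id: bytes,
               multihashes: list[bytes], is_rm: bool = False) -> None:
        """Apply one advertisement: add its entries, or drop the key on removal."""
        key = (provider_id, context_id)
        if is_rm:
            self._entries.pop(key, None)
        else:
            self._entries[key].update(multihashes)

    def sample(self, provider_id: str, context_id: bytes, n: int,
               rng: random.Random | None = None) -> list[bytes]:
        """Return up to n distinct payload multihashes chosen uniformly at random."""
        found = self._entries.get((provider_id, context_id))  # no auto-insert
        if not found:
            return []
        rng = rng or random.Random()
        pool = sorted(found)  # stable order so a seeded rng is reproducible
        return rng.sample(pool, min(n, len(pool)))
```

Returning a random sample rather than every entry keeps responses small for large pieces while still giving probes an unpredictable set of blocks to test.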

Budget:

  • 1 person (IPNI developer)
  • Estimated effort: 2 to 2.5 months

Milestone 3: Curio Implementation

Deliverables:

  • At the end of this milestone, all changes required to enable the reverse index are included in a GA release of Curio, ready to be picked up by Storage Providers.
  • When a new DDO deal is activated via Curio, Curio builds ContextID from PieceCID & PieceSize and advertises deal payload blocks under this ContextID to IPNI.
  • Documentation for Curio operators explaining how to configure their deployment to advertise new deals so that IPNI includes them in the reverse index and clients like Spark can build the ContextID from PieceCID and PieceSize.
  • A migration path from Boost to Curio that would un-announce all existing DDO deals and re-announce them using the new ContextID.
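The migration deliverable amounts to emitting two advertisements per existing DDO deal: a removal for the legacy ContextID, followed by a fresh announcement of the deal's payload blocks under the ContextID derived from PieceCID and PieceSize. The sketch below is a simplified, hypothetical model: the `Advertisement` and `Deal` types are invented for illustration, keep only a subset of IPNI advertisement fields (ContextID, entries, IsRm), and omit signing and chaining to the previous advertisement.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Advertisement:
    """Simplified IPNI advertisement with only the fields this sketch needs."""
    provider_id: str
    context_id: bytes
    entries: tuple[bytes, ...] = ()  # payload block multihashes
    is_rm: bool = False              # True retracts everything under context_id


@dataclass
class Deal:
    """HYPOTHETICAL record of a DDO deal previously announced by Boost."""
    piece_cid: bytes
    piece_size: int
    legacy_context_id: bytes
    payload_multihashes: list[bytes] = field(default_factory=list)


def migrate(provider_id: str, deals: list[Deal],
            new_context_id: Callable[[bytes, int], bytes]) -> list[Advertisement]:
    """Un-announce each legacy ContextID, then re-announce the payload blocks
    under the ContextID derived from (PieceCID, PieceSize)."""
    ads: list[Advertisement] = []
    for deal in deals:
        ads.append(Advertisement(provider_id, deal.legacy_context_id, is_rm=True))
        ads.append(Advertisement(
            provider_id,
            new_context_id(deal.piece_cid, deal.piece_size),
            entries=tuple(deal.payload_multihashes),
        ))
    return ads
```

Passing the ContextID derivation in as a function keeps the migration logic independent of whichever encoding the spec settles on.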

Budget:

  • 1 person (Curio developer)
  • Estimated effort: 3 weeks

Total Budget Requested

Will send to [email protected]

Maintenance and Upgrade Plans

We expect the IPNI and Curio maintainers to maintain these new features as part of their existing maintenance work arrangements.

Team

Team Members

  • @masih - Masih Derkani (IPNI)
  • @LexLuthr - Lex Luthr and other members of the Curio team
  • @bajtos - Miroslav Bajtos (Spark/Space Meridian)

Team Member LinkedIn Profiles

Team Website

https://filspark.com/

Relevant Experience

  • We (Spark/Space Meridian) want to fund the IPNI and Boost developers to develop these features for Spark and the wider community.
  • Masih is the author and lead contributor to IPNI.
  • Lex is the most active contributor to Boost & Curio.

Team code repositories

n/a

Additional Information

You can find us all in the Filecoin Slack workspace.

The best email address for discussing the next steps: [email protected]
