
[ipfs/go-bitswap] Integration between Graphsync and IPFS #85

Open
petar opened this issue Jun 24, 2021 · 6 comments
Labels
need/triage Needs initial labeling and prioritization

@petar
Contributor

petar commented Jun 24, 2021

The goal is to enable bitswap to support different methods of fetching a block,
so that it can access non-bitswap sources like filecoin nodes which may use
graphsync (via https://github.com/filecoin-project/go-data-transfer) and eventually
other payment-based methods.

Fundamentally, Bitswap brokers information about which peers have a cid.
This is captured in the form (cid, peer_id). It is implied that the method of
fetching is the bitswap transfer protocol.

To generalize Bitswap, we need to change the information that is associated with a cid.
For each cid, we would like to keep track of multiple "routing expressions" each of which
describes a different method to fetch the block.

Routing expressions are written in the routing language syntax; per the existing Routing Language Spec,
each one is a valid description of a method for fetching a block.

For instance,

     fetch(
          cid=link("Qm15"),
          proto=bitswap,
          providers=[multiaddr("/ip4/8.1.1.9:44")],
     )

or

     fetch(
          cid=link("Qm15"),
          proto=graphsync,
          graphsync_voucher=0x12ef78cd,
          providers=[multiaddr("/ip4/8.1.1.9:44")],
     )

In essence, the routing information brokered should be of the form (cid, list of routing expressions).
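A minimal Go sketch of this (cid, list of routing expressions) shape, which also shows the "have"-to-routing-expression conversion mentioned as the second entry point in the list that follows. All names (`RoutingExpr`, `RoutingTable`, `FromHave`) are hypothetical, not go-bitswap API; CIDs are plain strings instead of go-cid values, and the provider address is written as a well-formed multiaddr on the assumption that port 44 is TCP.

```go
package main

import "fmt"

// Proto names the transfer protocol a routing expression refers to (hypothetical).
type Proto string

const (
	ProtoBitswap   Proto = "bitswap"
	ProtoGraphsync Proto = "graphsync"
)

// RoutingExpr is a hypothetical Go rendering of a fetch(...) expression.
type RoutingExpr struct {
	CID       string            // content identifier, a plain string for brevity
	Proto     Proto             // transfer protocol to use
	Providers []string          // multiaddrs of the sources
	Params    map[string]string // protocol-specific arguments, e.g. graphsync_voucher
}

// RoutingTable maps each CID to every known method of fetching it.
type RoutingTable map[string][]RoutingExpr

// Add records one more method of fetching e.CID.
func (t RoutingTable) Add(e RoutingExpr) {
	t[e.CID] = append(t[e.CID], e)
}

// FromHave converts a bitswap "have" received from a peer into a routing
// expression, so it can be handled downstream like any other routing record.
func FromHave(cid, peerAddr string) RoutingExpr {
	return RoutingExpr{CID: cid, Proto: ProtoBitswap, Providers: []string{peerAddr}}
}

func main() {
	t := RoutingTable{}
	t.Add(FromHave("Qm15", "/ip4/8.1.1.9/tcp/44"))
	t.Add(RoutingExpr{
		CID:       "Qm15",
		Proto:     ProtoGraphsync,
		Providers: []string{"/ip4/8.1.1.9/tcp/44"},
		Params:    map[string]string{"graphsync_voucher": "0x12ef78cd"},
	})
	for _, e := range t["Qm15"] {
		fmt.Println(e.Proto, e.Providers[0])
	}
	// Output:
	// bitswap /ip4/8.1.1.9/tcp/44
	// graphsync /ip4/8.1.1.9/tcp/44
}
```

Keeping a list per CID, rather than a single record per CID, lets heterogeneous sources coexist without a shared schema.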

This entails changes to every part of bitswap that touches routing information (for cids):

  • The first of the two entry points for new routing information is access to the DHT,
    which is abstracted behind the ProviderFinder interface. This interface has to be
    generalized accordingly, essentially to match the generic composable routing interface.
    It should also be moved to the go-composable-routing repo (it does not belong in bitswap).

    • ProviderFinder is implemented by ProviderQueryManager, which acts as middleware
      between bitswap and the DHT, adding throttling, deduplication, and batching to
      routing calls. ProviderQueryManager must be:
      • generalized to use the composable routing interface (to make it officially middleware)
      • ideally broken down into independent middleware blocks (batching, throttling, dedup) that are chained together
      • moved to the go-composable-routing repo
  • The second entry point for new routing information is the reception of "have" messages
    from the bitswap gossip protocol. On reception, the "have" information must be converted into
    a routing expression, so that it can be treated in the same manner as other routing information
    downstream.

  • The logic that reacts to new routing information must also be updated.
    At the moment the only routing information that enters bitswap is "have" information,
    and it is acted on immediately by firing/queuing respective "want" requests.
    Going forward, routing information that corresponds to "have" messages can be treated as before.
    However, we need to decide how to schedule fetching from non-bitswap sources (like filecoin/graphsync)
    and generally how to prioritize/parallelize fetching from different sources (bitswap and non-bitswap).

Remarks
This is an absolute minimum plan to enable the integration. Going forward, a lot of additions can be made to improve the scale and speed of the routing process in bitswap. E.g. the "have" messages can be generalized to communicate multiple sources for a block, so that peers can share with each other knowledge about where else the block can be downloaded. E.g. "I have the block, but I also know that this filecoin miner has the block you want too, and they also have the entire directory where the block lives."

Related
IPFS / Filecoin interop plan: https://hackmd.io/JoZiAAtnTpqAKuQaEUra4g

PRs comprising the resolution of this issue
Step 1: ipfs/go-bitswap#512

Follow-up tasks
After this issue is resolved, the following (smaller) issues must be addressed before IPFS is fully ready to talk to the Golden Path product: ipfs/go-bitswap#509, https://github.com/ipfs/go-bitswap/issues/510

@petar petar added the need/triage Needs initial labeling and prioritization label Jun 24, 2021
@petar petar self-assigned this Jun 24, 2021
@welcome

welcome bot commented Jun 24, 2021

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@Stebalien
Member

This statement doesn't make a lot of sense. I assume you're referring to some form of meta exchange that can use both the bitswap protocol and graphsync?

> To generalize Bitswap, we need to change the information that is associated with a cid. For each cid, we would like to keep track of multiple "routing expressions" each of which describes a different method to fetch the block.

This could use a lot of motivation. I'd expect the flow to be:

  1. I find out who has what. This is a mapping of CID -> PID.
  2. I connect to peers, then request content via whatever protocol they support.

Of course, I might want additional information before I bother to make a connection. For example:

  • Supported protocols.
  • Pricing.
  • Maybe vouchers? Really, more like arbitrary "tokens". Effectively "curried" arguments.

But then I'd expect the record to look more like:

{
    provider: PeerID,
    content: cid,
    protocols: {
        "/ipfs/bitswap/1.1.0": {...},
        "/ipfs/graphsync/1.0.0": {"token": ..., "price": ....}, // needs to specify across payment systems.
    }
}

Eventually, "queries" could be extended to select things like "supports graphsync but charges less than X".
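A Go sketch of the record above together with a "supports graphsync but charges less than X" query. Field names follow the record; representing the price as a single unitless integer in `ProtocolInfo` is an assumption, since the record's own comment notes that pricing must work across payment systems. `SupportsCheaperThan` is a hypothetical helper.

```go
package main

import "fmt"

// ProtocolInfo holds per-protocol metadata from a provider record.
type ProtocolInfo struct {
	Token string // opaque "curried" argument, e.g. a voucher
	Price uint64 // assumption: one unitless number; 0 means free
}

// ProviderRecord mirrors the record sketched above.
type ProviderRecord struct {
	Provider  string                  // PeerID
	Content   string                  // CID
	Protocols map[string]ProtocolInfo // keyed by protocol ID
}

// SupportsCheaperThan keeps records whose provider speaks proto at a price below limit.
func SupportsCheaperThan(recs []ProviderRecord, proto string, limit uint64) []ProviderRecord {
	var out []ProviderRecord
	for _, r := range recs {
		if info, ok := r.Protocols[proto]; ok && info.Price < limit {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	recs := []ProviderRecord{
		{Provider: "peerA", Content: "Qm15", Protocols: map[string]ProtocolInfo{
			"/ipfs/bitswap/1.1.0": {},
		}},
		{Provider: "peerB", Content: "Qm15", Protocols: map[string]ProtocolInfo{
			"/ipfs/bitswap/1.1.0":   {},
			"/ipfs/graphsync/1.0.0": {Token: "t1", Price: 5},
		}},
		{Provider: "peerC", Content: "Qm15", Protocols: map[string]ProtocolInfo{
			"/ipfs/graphsync/1.0.0": {Token: "t2", Price: 50},
		}},
	}
	for _, r := range SupportsCheaperThan(recs, "/ipfs/graphsync/1.0.0", 10) {
		fmt.Println(r.Provider) // prints "peerB"
	}
}
```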

@petar
Contributor Author

petar commented Jul 8, 2021

@Stebalien:

> I find out who has what. This is a mapping of CID -> PID.

This is how things work today. We'd like to generalize this significantly. A source for a CID's content need not be a peer at all. For instance, it could be a legacy FTP service at a given IP, or a BitTorrent link (which doesn't even refer to a specific host). A routing expression can describe any such method.

> {
>     provider: PeerID,
>     content: cid,
>     protocols: {
>         "/ipfs/bitswap/1.1.0": {...},
>         "/ipfs/graphsync/1.0.0": {"token": ..., "price": ....}, // needs to specify across payment systems.
>     }
> }

Since discoverable sources for a CID may be heterogeneous (e.g. a peer using bitswap, a Filecoin miner using graphsync, a GitHub repo at a given commit, etc.), each CID is associated with a list of routing expressions, each of which describes an individual source. This is in contrast to having a single CID record (like the one above) that tries to describe all sources.

@Stebalien
Member

Ah, I see. Yeah, that makes a lot of sense. So we'd have an engine on top of bitswap handling the generalized content routing records, passing information into each protocol.

> Since discoverable sources for a CID may be heterogeneous (e.g. a peer using bitswap, a Filecoin miner using graphsync, a GitHub repo at a given commit, etc.), each CID is associated with a list of routing expressions, each of which describes an individual source. This is in contrast to having a single CID record (like the one above) that tries to describe all sources.

Makes sense.

@hannahhoward
Contributor

hannahhoward commented Jul 9, 2021

My recommendation is to do ipfs/go-bitswap#512 to abstract the content routing source, add the ability to talk to indexers once they exist, and stop till we understand the direction we're heading.

As I see it, there are two paths to Golden Path in IPFS:

  • Just lean into bitswap, minimal changes -- get miners to turn on bitswap in their markets process, backed by a blockstore that reads from unsealed pieces using the miner index, serve only free data, do the minimum amount of work to get Bitswap to talk to indexers as well as the DHT, call it a day. Once global indexes and miner indexes exist, that's a pretty short project -- maybe 2-3 months. Gets you to Golden Path free retrievals in go-ipfs, with any miner that will actually leave bitswap on (I'm not sure how many there are of these). I always call this the @alanshaw solution because he originally proposed it.

  • Actually get go-ipfs to switch protocols between Bitswap and graphsync, speak data transfer, do free and paid retrievals, etc. And I have some strong opinions on it:

    • The go-bitswap library should get SMALLER, not larger -- it's already a beast. IMHO, go-bitswap should become a Bitswap protocol implementation, not a generalized content-fetching implementation. The longer we keep saying "let's just throw more stuff in go-bitswap" for things that have nothing to do with the bitswap protocol (even speaking other protocols like Graphsync), the more confusion we create about what a "bitswap" is. (Keep in mind that JS bitswap doesn't even have sessions.) I think sessions, content routing, etc. need to move up into some kind of meta library.
    • Actually implementing this protocol mixing and routing mixing needs a bunch of people's input. I've worked on all these libraries for years and authored the first implementations of some of them. I still don't think I know the absolute best way to do it. There are so many questions:
      • What's the unit of transfer above a block -- is it a data transfer? Is it a DAG? Is it analogous to a Session (which can be lots of DAG/Block requests, related simply by programmer choice)?
      • Also, what's the hierarchy of moving pieces in terms of libraries? I have thought at times go-data-transfer is the "meta" library above bitswap and graphsync that we're talking about, but currently it has no routing. Maybe it is something else.
      • What I really LIKE in this proposal is the idea of a universal routing stream. It's actually pretty great IMHO. One thing I've thought about a lot is that we should add the equivalent of WANT-HAVE to graphsync, which would ask "do you have this DAG?" The response could be Yes/No, and a Yes could also include the CIDs without the blocks. I outlined this proposal in "RFC|BB|L2: Speed up Bitswap with GraphSync CID only query" (protocol/beyond-bitswap#25) and in "Flesh out Protocol with Peer and Metadata Requests, Specify block deduplication logic" (ipld/specs#355).
      • I really like @raulk's idea of a universal per-request event bus that all layers communicate on. I think there's a kernel of an idea there for how all the things can come together.
      • Long and short, it's a big, hard problem, and we ought to have a team of us thinking about and researching an approach together if we go down this path.
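To make the graphsync WANT-HAVE idea concrete, here is a hypothetical Go sketch of a request/response pair and a responder that walks a local DAG. The message names and the toy `Store` (which keeps only child links, not block data) are assumptions for illustration, not the formats proposed in the linked RFC or specs PR.

```go
package main

import "fmt"

// DAGHaveRequest is a hypothetical graphsync analogue of bitswap's WANT-HAVE:
// "do you have the whole DAG rooted at Root?"
type DAGHaveRequest struct {
	Root string
}

// DAGHaveResponse answers yes/no; a yes may carry the DAG's CIDs without the blocks.
type DAGHaveResponse struct {
	Have bool
	CIDs []string
}

// Store is a toy local blockstore mapping each CID to its child links only.
type Store map[string][]string

// HandleDAGHave walks the DAG depth-first and answers Have=false as soon as
// a block is missing; otherwise it returns every CID it visited.
func (s Store) HandleDAGHave(req DAGHaveRequest) DAGHaveResponse {
	seen := map[string]bool{}
	stack := []string{req.Root}
	var cids []string
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[c] {
			continue
		}
		seen[c] = true
		links, ok := s[c]
		if !ok {
			return DAGHaveResponse{Have: false}
		}
		cids = append(cids, c)
		stack = append(stack, links...)
	}
	return DAGHaveResponse{Have: true, CIDs: cids}
}

func main() {
	full := Store{"root": {"a", "b"}, "a": {"c"}, "b": {"c"}, "c": nil}
	fmt.Println(full.HandleDAGHave(DAGHaveRequest{Root: "root"}).Have) // true

	partial := Store{"root": {"a", "b"}, "a": nil} // "b" is missing
	fmt.Println(partial.HandleDAGHave(DAGHaveRequest{Root: "root"}).Have) // false
}
```

Returning the CIDs alongside a "yes" would let a requester plan which blocks to fetch from which source before any block data moves.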

So my take is: do the part that is needed for either approach and stop. There's no progress to be made for real until miner indexes actually exist anyway. Especially if we do the great Web3 future data transfer stack refactor, we need a wide set of folks working on it. If we want to do something further, I would allocate a team of folks with deep experience in our data transfer protocols and content routing to do planning for how to actually refactor our libraries top to bottom to deliver on the needs for mixing filecoin and IPFS. This would at least help us determine how much work we're actually talking about, and when we could realistically deliver it.

@Stebalien Stebalien changed the title Integration between Bitswap and IPFS Integration between Graphsync and IPFS Jul 30, 2021
@Jorropo Jorropo changed the title Integration between Graphsync and IPFS [ipfs/go-bitswap] Integration between Graphsync and IPFS Jan 27, 2023
@Jorropo Jorropo transferred this issue from ipfs/go-bitswap Jan 27, 2023


4 participants