Discussion: is one graphsync.ResponseProgress message per Node in a graph reasonable? #326
Comments
> In terms of user API expectation, I could imagine one of these two scenarios:
Yea. In short, I agree... and I'm not sure where we should head with this to start.

The original thought was: Graphsync is a protocol that works with IPLD graphs, so it should return IPLD data in the new format that supports selectors, aka go-ipld-prime nodes. The problem is that under the hood Graphsync is basically about moving blocks around, and it's not entirely clear how useful the "over the hood" part of actual IPLD data turns out to be. Plus, the distinction between Explore/Matching wasn't well defined when we first created go-graphsync and, as you point out, we don't respect Matching when limiting the results we return.

My initial inclination is to just support Matching properly and respect it when returning results, because it's pretty easy to do and doesn't change the API. I feel like "match only on block boundary" is best done in the world of IPLD / Selectors? I'm not sure. Plus, it's trivial to write a filter and we've done that in the past, so maybe that's the easiest.

My main goal would be: don't change the return types. If we can make that work, we should be good.
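As a rough illustration of the "write a filter" option mentioned above, the sketch below wraps the `ResponseProgress` channel and only forwards an entry when the last-loaded block changes, effectively collapsing the per-Node stream into a per-Block stream. It assumes `ResponseProgress` exposes a `LastBlock.Link` field (as the go-graphsync type appears to); field names and import paths should be double-checked against the actual release.

```go
package progressfilter

import (
	graphsync "github.com/ipfs/go-graphsync"
	ipld "github.com/ipld/go-ipld-prime"
)

// filterBlockBoundaries collapses the per-Node ResponseProgress stream down to
// roughly one entry per loaded block, by only forwarding a progress event when
// the link of the last loaded block changes.
func filterBlockBoundaries(in <-chan graphsync.ResponseProgress) <-chan graphsync.ResponseProgress {
	out := make(chan graphsync.ResponseProgress)
	go func() {
		defer close(out)
		var lastLink ipld.Link
		for progress := range in {
			// LastBlock.Link identifies the block the current node was read from
			// (assumed field layout; see the note above).
			if progress.LastBlock.Link != nil && progress.LastBlock.Link != lastLink {
				lastLink = progress.LastBlock.Link
				out <- progress
			}
		}
	}()
	return out
}
```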
Yeah, this is the next step I think. I've been suggesting that ipld-prime's traversal should have better affordances for blocks, since that's what the majority of current use cases use it for, and for now we're forced to deal with the loader for most of that, which is a bit clunky in terms of API clarity.
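For context on "dealing with the loader": block boundaries are only visible at the link-loading layer today. Below is a minimal sketch of observing block loads, written against go-ipld-prime's newer `LinkSystem` API (older releases used a plain `Loader` function instead), with a hypothetical in-memory `blocks` map standing in for a real blockstore.

```go
package linkobserver

import (
	"bytes"
	"fmt"
	"io"

	ipld "github.com/ipld/go-ipld-prime"
	cidlink "github.com/ipld/go-ipld-prime/linking/cid"
)

// newObservedLinkSystem returns a LinkSystem whose read opener reports every
// block load, which is currently the only place block boundaries show up.
func newObservedLinkSystem(blocks map[string][]byte) ipld.LinkSystem {
	lsys := cidlink.DefaultLinkSystem()
	lsys.StorageReadOpener = func(_ ipld.LinkContext, lnk ipld.Link) (io.Reader, error) {
		fmt.Println("loading block:", lnk.String()) // block boundary observed here
		data, ok := blocks[lnk.Binary()]
		if !ok {
			return nil, fmt.Errorf("block not found: %s", lnk)
		}
		return bytes.NewReader(data), nil
	}
	return lsys
}
```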
Related: avoid an unnecessary serialization roundtrip when sending requests for selectors
A `Request()` returns a channel that gives you all of the Nodes of a DAG traversal, rather than the Blocks of that traversal. In datatransfer, I see that it's just doing a `for range req.responseChan {}`, so they're not used; I don't know where else they might be consumed, though.

But for a DAG with very complex blocks (like the filecoin chain, for example), this means that if someone uses a matching selector, they could trigger the creation of a `graphsync.ResponseProgress` for every single point of data (Node) within each of the blocks.

So, for example, take the plain blockchain used in the test cases: 99 blocks, each of which is just `{Parents:[CID],Messages:[bytes,...]}`. Even with this very simple block format and a selector that limits itself to exploring `Parents`, each block generates two progress messages: one for visiting `Parents` and one for visiting the first element of the list. So a 99-block chain, with this super-simple structure and a selector limiting itself to just `Parents`, generates 198 messages.

Since we use `WalkAdv`, it doesn't even matter whether it's a matching or an explore selector; it'll trigger this for every single data element within a block.

Which goes to some of the concerns I've been raising recently about how we use selectors practically, and the distinction between Nodes and Blocks getting in the way: what would the user of this API expect from a `Request()`? Is it reasonable that they get an exhaustive list of messages for each Node, or would they more logically expect one for each Block? Or maybe neither, since we're really just using it for a block store. And in terms of performance, what are the implications of having so many messages per block? Even for DAG-PB blocks, we're going to be triggering over 100 messages for a single block in many cases (e.g. MFS or just dense directory blocks).
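To make the behaviour described above concrete, here is a rough sketch of issuing such a request and counting the per-Node progress messages. It follows the go-graphsync `GraphExchange.Request` signature and the go-ipld-prime selector-builder API as I understand them; import paths, the recursion limit, and the peer/root values are placeholders that vary by version, so treat this as illustrative rather than authoritative.

```go
package requestcount

import (
	"context"
	"fmt"

	graphsync "github.com/ipfs/go-graphsync"
	ipld "github.com/ipld/go-ipld-prime"
	"github.com/ipld/go-ipld-prime/node/basicnode"
	"github.com/ipld/go-ipld-prime/traversal/selector"
	"github.com/ipld/go-ipld-prime/traversal/selector/builder"
	"github.com/libp2p/go-libp2p-core/peer"
)

// parentsOnlySelector recursively follows only the "Parents" field of each
// block, which is the shape of selector described in the issue text.
func parentsOnlySelector() ipld.Node {
	ssb := builder.NewSelectorSpecBuilder(basicnode.Prototype.Any)
	return ssb.ExploreRecursive(selector.RecursionLimitDepth(100), // placeholder depth
		ssb.ExploreFields(func(efsb builder.ExploreFieldsSpecBuilder) {
			efsb.Insert("Parents", ssb.ExploreAll(ssb.ExploreRecursiveEdge()))
		})).Node()
}

// countProgress issues a request and counts ResponseProgress messages: one per
// visited Node, not one per Block, which is the crux of this issue.
func countProgress(ctx context.Context, gs graphsync.GraphExchange, p peer.ID, root ipld.Link) int {
	progressChan, errChan := gs.Request(ctx, p, root, parentsOnlySelector())
	count := 0
	for range progressChan {
		count++
	}
	for err := range errChan {
		fmt.Println("request error:", err)
	}
	return count
}
```

For the 99-block test chain described above, a counter like this would come out at roughly 198, matching the two progress messages per block observed in the issue.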