Graphsync Simplification Idea: Unblocking the ResponseManager #251
Labels:
- effort/days: Estimated to take multiple days, but less than a week
- exp/expert: Having worked on the specific codebase is important
- kind/architecture: Core architecture of project
- kind/discussion: Topical discussion; usually not changes to codebase
- needs definition: This story needs further data before it can be estimated
- P2: Medium; good to have, but can wait until someone steps up
This is a meta "for discussion" issue
The basic architecture of the response manager is as follows: all public methods put messages into a buffered channel that is processed sequentially by an internal thread (see https://github.com/ipfs/go-graphsync/blob/main/docs/architecture.md#actor-pattern-in-requestmanager-and-responsemanager for an explanation).
We've now seen a couple of deadlocks where the message queue fills up and the internal thread then blocks, because processing the next message somehow triggers sending a new message to the full queue. Usually it's a pretty indirect connection involving intermediate mutexes.
Simultaneously, we've seen race conditions that have arisen from trying to do things outside this thread to keep it unblocked.
I think it's time to reexamine whether this is the best architecture.
My suggestion is that we move from a global queue to per-request queues -- each ongoing request gets its own message queue with its own goroutine processing messages.
The ResponseManager would then just manage a mutex-locked list of requests that the goroutines would communicate with when they needed to modify the list (i.e. to close a request).
This would solve a number of problems and make the code easier to understand -- currently, the state of a request is spread across the ResponseManager, where it's tracked, and the internals of the query executor, where it actually runs.
The request goroutine would still have to communicate with the peertaskqueue to block its execution on the SimultaneousIncomingRequestLimit, but overall I think it would make things much simpler to reason about.
We still need to address buffering (see #249 for addressing it globally), and honestly, I think we should just make the per-request message queues unbuffered.
To do: