Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graphsync Simplifcation Idea: Unblocking the ResponseManager #251

Open
3 tasks
hannahhoward opened this issue Oct 21, 2021 · 0 comments
Open
3 tasks

Graphsync Simplifcation Idea: Unblocking the ResponseManager #251

hannahhoward opened this issue Oct 21, 2021 · 0 comments
Assignees
Labels
effort/days Estimated to take multiple days, but less than a week exp/expert Having worked on the specific codebase is important kind/architecture Core architecture of project kind/discussion Topical discussion; usually not changes to codebase needs definition means this story needs further data before it can be estimated P2 Medium: Good to have, but can wait until someone steps up

Comments

@hannahhoward
Copy link
Collaborator

hannahhoward commented Oct 21, 2021

This is a meta "for discussion" issue

The basic architecture of the response manager is as follows: all public methods put messages into a buffered channel that is processed sequentially by an internal thread. (see https://github.com/ipfs/go-graphsync/blob/main/docs/architecture.md#actor-pattern-in-requestmanager-and-responsemanager for explanation)

We've now seen a couple deadlocks that the message queue gets filled, then the internal thread blocks because processing the next message somehow triggers sending a new message to the queue and gets blocked by that. Usually, it's a pretty indirect connection involving intermediate mutexes.

Simultaneously, we've seen race conditions that have arose from trying to do things outside this thread to keep things unblocked.

I think it's time to reexamine whether this is the best architecture.

My suggestion is we move from a global queue to per request queues -- so each ongoing request has its own message queue with its own go-routine processing messages.

The ResponseManager would then just manage a mutex locked list of requests that the go-routines would communicate with when they needed to modify the list (i.e. to close a request).

This would solve a number of problems and make code easier to understand -- currently, the state of a request is spread out across the response manager where it's tracked and the internals of the query executor where it actually runs.

The request go-routine would still have to communicate with the peertaskqueue to block its execution on the SImultaneousIncomingRequestLimit, but overall I think it would make things much simpler to reason about.

We still need to address buffering (see issue to address it globally -- #249), and honestly, I think we should just make the request message queues unbuffered.

To do:

  • Spike on implementation
  • Discuss and evaluate next steps
  • Propose a final architecture
@hannahhoward hannahhoward added need/triage Needs initial labeling and prioritization kind/discussion Topical discussion; usually not changes to codebase exp/expert Having worked on the specific codebase is important kind/architecture Core architecture of project needs definition means this story needs further data before it can be estimated effort/days Estimated to take multiple days, but less than a week P2 Medium: Good to have, but can wait until someone steps up and removed need/triage Needs initial labeling and prioritization labels Oct 21, 2021
@hannahhoward hannahhoward changed the title Unblocking the ResponseManager Graphsync Simplifcation Idea: Unblocking the ResponseManager Oct 26, 2021
@hannahhoward hannahhoward self-assigned this Dec 2, 2021
@hannahhoward hannahhoward moved this from Icebox to In Progress in Project Thunder (Interop) Dec 2, 2021
@hannahhoward hannahhoward moved this from In Progress to Icebox in Project Thunder (Interop) Mar 8, 2022
marten-seemann pushed a commit that referenced this issue Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/days Estimated to take multiple days, but less than a week exp/expert Having worked on the specific codebase is important kind/architecture Core architecture of project kind/discussion Topical discussion; usually not changes to codebase needs definition means this story needs further data before it can be estimated P2 Medium: Good to have, but can wait until someone steps up
Projects
None yet
Development

No branches or pull requests

1 participant