Keeping the Scheduler's Single Worker Awake #106
Comments
My strong preference is to rely on

The original version of the coordinator drove the state machines via an event channel and goroutine, which is very similar to proposal 1. This was removed to align more with the scheduler, on the assumption that something else would be driving the scheduler.

The scheduler model is complicated by the use of future planned actions. This means we have to implement a cron with cancellation rather than just rely on a simple select loop. I think that would look a lot like the simulator code, and it would send actions on a channel when the appropriate time is reached. However, note that this leads to non-deterministic behaviour: if two channels are ready in a select statement, then Go makes a pseudorandom choice between them.

Wherever we can, we should try to eliminate work that relies on timed behaviour. The query Pool state machine achieves this by treating a timeout as a mechanism to free up capacity. If the pool is under capacity then timeouts won't be used (I copied this idea from rust-libp2p). However, it does require the state machine to be polled, but this is low cost when compared with writing an efficient cron scheduler (a later version of the coordinator included a heartbeat: https://github.com/plprobelab/go-kademlia/blob/7dde002254c492179d2e728ec30f57df27f5aab7/coord/coordinator.go#L103-L106).

Note that in the rust-libp2p model

In reality I think we need more than one "action queue", otherwise it's easy for one component to impact the performance of another. Think of making a request to send a message where the response is scheduled once it is received. Another component can fill the action queue with thousands of actions in the meantime, delaying the processing of the response, perhaps even exceeding a pre-planned timeout. We should also be careful with fan-out behaviour, for example the result of a find closer nodes action producing 20 new actions that attempt to suggest the 20 new nodes for the routing table.
I have an experimental dht/coordinator that separates IO from internally generated actions. Externally injected actions are also separate in this experiment because I think it's important that we can prioritize work and provide backpressure. For example, we should prioritize completion of queries that are making progress over starting new queries. The execution model needs to be able to support this.
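To make the prioritization point concrete, here is a minimal sketch, assuming two bounded channels as the queues. The names `Action`, `nextAction` and `drain` are made up for illustration and are not taken from go-kademlia. A non-blocking receive on the high-priority queue runs first, so the outcome does not depend on Go's pseudorandom choice between ready channels.

```go
package main

import "fmt"

// Action is a hypothetical unit of work; the name is illustrative only.
type Action struct {
	Name string
}

// nextAction returns the next action to run, preferring the inProgress
// queue over the newWork queue. It reports false when both are empty.
func nextAction(inProgress, newWork <-chan Action) (Action, bool) {
	// Non-blocking attempt on the high-priority queue first.
	select {
	case a := <-inProgress:
		return a, true
	default:
	}
	// Otherwise take whatever is available, still without blocking.
	select {
	case a := <-inProgress:
		return a, true
	case a := <-newWork:
		return a, true
	default:
		return Action{}, false
	}
}

// drain calls nextAction until both queues are empty and returns the
// names of the actions in the order they were taken.
func drain(inProgress, newWork <-chan Action) []string {
	var order []string
	for {
		a, ok := nextAction(inProgress, newWork)
		if !ok {
			return order
		}
		order = append(order, a.Name)
	}
}

func main() {
	// Bounded buffers: a producer blocks once its queue is full,
	// which is one simple form of backpressure.
	inProgress := make(chan Action, 4)
	newWork := make(chan Action, 4)

	newWork <- Action{Name: "start query B"}
	newWork <- Action{Name: "start query C"}
	inProgress <- Action{Name: "response for query A"}

	fmt.Println(drain(inProgress, newWork))
}
```

The response for the running query is taken before the two new queries even though it was enqueued last. A real coordinator would likely need more than two priority levels and some protection against starving the low-priority queue.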
A side note on the existing

If we were to adopt bare channels rather than wrap them in an interface, then we could use queues in select statements and rely on Go's handling of the synchronization.
@iand I agree with everything you wrote!
👍🏻
That would be awesome!
This sounds great!
Agreed!
The Scheduler's Single Worker model is meant to enforce sequential execution, allowing for stable testing and reproducible protocol simulation. In a single-threaded simulation, the Single Worker is the only one to add and consume elements from the scheduler. When there are no more actions to be run at a given time, the Single Worker advances the scheduler's fake clock to the time of the next scheduled action and continues from there. Hence, the Single Worker is able to complete the simulation sequentially without ever sleeping.
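A rough sketch of that simulation loop, assuming a fake clock expressed as elapsed virtual time. The type `simScheduler` and its methods are hypothetical and only illustrate the idea; they are not the actual scheduler API.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// planned is a hypothetical scheduled action: a name and the virtual
// time at which it becomes due.
type planned struct {
	at   time.Duration
	name string
}

// simScheduler is an illustrative stand-in for the scheduler: a fake
// clock plus a list of planned actions kept sorted by due time.
type simScheduler struct {
	now   time.Duration // fake clock: elapsed virtual time
	queue []planned
}

// schedule inserts an action; the stable sort keeps insertion order
// for actions that are due at the same time.
func (s *simScheduler) schedule(at time.Duration, name string) {
	s.queue = append(s.queue, planned{at: at, name: name})
	sort.SliceStable(s.queue, func(i, j int) bool {
		return s.queue[i].at < s.queue[j].at
	})
}

// run plays the role of the Single Worker. It never sleeps: when the
// next action is in the future, it moves the fake clock forward.
func (s *simScheduler) run() []string {
	var order []string
	for len(s.queue) > 0 {
		next := s.queue[0]
		s.queue = s.queue[1:]
		if next.at > s.now {
			s.now = next.at // advance the fake clock instead of sleeping
		}
		order = append(order, next.name)
	}
	return order
}

func main() {
	s := &simScheduler{}
	s.schedule(10*time.Second, "refresh bucket")
	s.schedule(2*time.Second, "query timeout")
	s.schedule(2*time.Second, "send request")
	fmt.Println(s.run(), s.now)
}
```

Because only one goroutine touches the queue and the clock, the execution order is the same on every run, which is what makes the simulation reproducible.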
However, in a real-world scenario, multiple goroutines need to add elements to the scheduler, and only the Single Worker consumes them, one by one. This means that once the action queue is empty, the Single Worker cannot simply sleep until the next scheduled action, because another goroutine may add a task to the queue, and the Single Worker is expected to handle it ASAP. There are multiple ways to make sure that a Single Worker is awake and ready to take a task when necessary:
1. The Single Worker blocks on a `select`; it is then unblocked by the first of the following events: (1) the next scheduled action from the Scheduler becomes due, (2) another action is added to the queue and a write is made to the Alarm (Wakeup?) channel, or (3) the context is cancelled.
2. When a task is added and no Single Worker is on duty, a new one is spawned (`go createSingleWorker()`) to tackle this new task (and possibly the ones that arrive while the Single Worker is on duty). If there is already a Single Worker on duty, it will run the newly added task. This solution requires an additional goroutine to keep track of scheduled events and to spawn a Single Worker when a scheduled event is due.
3. The application, e.g. `Kubo` for the `go-libp2p-kad-dht` implementation, is responsible for periodically polling the queue and running any actions that have been added by other goroutines. This approach is inspired by rust-libp2p. During a poll, all actions from the event queue are run until the queue is empty. Then control is returned to the caller (the application). Other goroutines may add events to the queue, or scheduled events may become overdue; these events are handled during the next poll. This approach seems unfit for `Kubo`, since `Kubo` doesn't implement a hierarchical state machine architecture.

It is possible to implement multiple schedulers / coordinators, one for each of the described behaviors. An application will only use a single behavior, but different applications may have different needs. However, I suggest we focus on implementing only the behavior required by `Kubo` / `go-libp2p-kad-dht` for now.

@iand @dennis-tra WDYT?
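For discussion, the `select`-based option could be sketched roughly as follows. Everything here is an assumption made for illustration: the `queue` type, the `wakeup` channel name, and a worker that only knows about a single planned action. It is not the existing scheduler code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// queue is a hypothetical action queue shared between goroutines.
type queue struct {
	mu      sync.Mutex
	actions []func()
	wakeup  chan struct{} // capacity 1: at most one pending wakeup
}

func newQueue() *queue { return &queue{wakeup: make(chan struct{}, 1)} }

// add appends an action and signals the worker without ever blocking.
func (q *queue) add(a func()) {
	q.mu.Lock()
	q.actions = append(q.actions, a)
	q.mu.Unlock()
	select {
	case q.wakeup <- struct{}{}:
	default: // a wakeup is already pending, nothing to do
	}
}

// pop removes and returns the oldest action, or nil if the queue is empty.
func (q *queue) pop() func() {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.actions) == 0 {
		return nil
	}
	a := q.actions[0]
	q.actions = q.actions[1:]
	return a
}

// runWorker drains the queue, then blocks until (1) the planned action
// is due, (2) a wakeup is signalled, or (3) the context is cancelled.
func runWorker(ctx context.Context, q *queue, due time.Duration, planned func()) {
	timer := time.NewTimer(due)
	defer timer.Stop()
	timerC := timer.C
	for {
		for a := q.pop(); a != nil; a = q.pop() {
			a()
		}
		select {
		case <-timerC:
			planned()
			timerC = nil // nothing else planned; a nil channel never fires
		case <-q.wakeup:
		case <-ctx.Done():
			return
		}
	}
}

// demo runs one added action and one planned action, then stops the worker.
func demo() (added, planned bool) {
	ctx, cancel := context.WithCancel(context.Background())
	q := newQueue()
	results := make(chan string, 2)
	done := make(chan struct{})
	go func() {
		defer close(done)
		runWorker(ctx, q, 20*time.Millisecond, func() { results <- "planned" })
	}()
	q.add(func() { results <- "added" })
	for i := 0; i < 2; i++ {
		switch <-results {
		case "added":
			added = true
		case "planned":
			planned = true
		}
	}
	cancel()
	<-done
	return added, planned
}

func main() {
	fmt.Println(demo())
}
```

The worker drains the queue before every `select`, so an action added just before the worker blocks is not lost: either the drain picks it up or the buffered wakeup signal unblocks the `select`. A real implementation would reset the timer to the next due time from the scheduler rather than handle a single planned action.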