-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nodegroup messages are too low a priority #3674
Comments
A nuance of intent: Do we want to think of nodegroup messages as a within-node load balancing mechanism also? If used for that purpose, it should be low priority. "Finish all your local work assigned to you and then pay attention to nodegroup messages" makes sense for this scenario. However we do have task queue for that purpose. And nodegroup messages have another purpose: to quickly process data arriving from outside the node. (Basically saying: any pe, please execute this, it is urgent so I don't want to wait for any particular PE to become available). The common use case appears to be the latter, so its worth fixing. |
I have tested a fix in my copy, which works, but can sometimes slow things down. (I think because checking the nodeq, which is accessed via lock/fence, has a cost.. But also because changing order in which things happen can have unexpected consequences). But mostly it does what we expect. I will work on making a good branch with that fix and push a PR. (The idea is to check nodegroup messages with highest priority once in every K iterations of the scheduler.) |
If you want a nodegroup entry method to be executed urgently upon arrival, isn't that what [immediate] is for? |
Immediate will execute on commthread, which we don't want in this case. (comm thread will become a bottleneck). Also, we do want to parallelize/distribute the (not insignificant) work of handling incoming messages to all PEs... not for each message, but across them. |
I noted this in a tram-replacement experiment, with histogram code from bale suite. Nodegroup messages carrying boatload of data were arriving on the node but not being processed as all PEs were busy cycling through there own locally generated work. (Although the work generation method was broken down using CThYield() every x iterations). @nilsdeppe also seconded this request on slack.
What is happening: scheduler's function for finding next message always processes non-local (i.e. messages from others but meant for my PE) and local (from my pe) messages, so nodeGroup messages, although they can be processed by any PE, get delayed.
The request is to allow nodegroup messages to execute earlier.. somehow.
The text was updated successfully, but these errors were encountered: