TL/UCP: transition to barrier for sync for onesided a2a #1096
base: master
Can one of the admins verify this patch?
@wfaderhold21 didn't we say we were also going to change the test to reflect oshmem behavior?
@janjust This is correct. To ensure that writes to the remote processes have completed, we need to issue a flush.
Does the flush become a no-op (or just unnecessary) if RC is used? I'm just wondering how the transport changes this requirement (if at all).
I believe ordering should be maintained when using RC, so a flush is not strictly required for ordering: future PUTs, sends, and AMOs should complete after the PUT. However, UCP returns success on a PUT as soon as the source buffer is ready for reuse. There's no guarantee that the PUT has completed at the remote target (e.g., with a buffered copy).
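To illustrate the distinction between local and remote completion described above, here is a minimal toy model in Python. It is not the UCX API: `ToyEndpoint`, `put`, and `flush` are hypothetical names that only stand in for a buffered PUT and an endpoint flush, and the model assumes the worst case, where nothing reaches the target until the flush.

```python
class ToyEndpoint:
    """Hypothetical stand-in for an endpoint that buffers PUTs (not UCX)."""

    def __init__(self, remote_memory: bytearray):
        self.remote_memory = remote_memory
        self._pending = []  # staged (offset, data) copies not yet delivered

    def put(self, src: bytearray, offset: int) -> None:
        # Local completion only: the payload is copied into a staging
        # buffer, so the caller may reuse `src` as soon as this returns.
        self._pending.append((offset, bytes(src)))

    def flush(self) -> None:
        # Remote completion: every staged PUT is applied to the target.
        for offset, data in self._pending:
            self.remote_memory[offset:offset + len(data)] = data
        self._pending.clear()


remote = bytearray(4)
ep = ToyEndpoint(remote)

src = bytearray(b"\x01\x02\x03\x04")
ep.put(src, 0)
src[:] = b"\x00\x00\x00\x00"          # safe: the PUT already returned

visible_before_flush = bytes(remote)  # target has not been updated yet
ep.flush()
visible_after_flush = bytes(remote)   # target now holds the PUT payload

print(visible_before_flush, visible_after_flush)
```

A real transport may deliver the data earlier than the flush; the point of the sketch is only that a successful return from the PUT does not by itself say anything about the state of the target memory.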
What
Switch from using the pSync array with atomic increments to the TL/UCP barrier for synchronization.
Why?
There are multiple reasons for this switch: the knomial barrier scales better and performs better than atomic increments (see below), and, once PR #1070 is merged, this allows the algorithm to be used with memory handles.
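As a rough sketch of the scaling argument, the following Python snippet compares per-rank synchronization work under two simplifying assumptions that are not taken from the PR: with the pSync scheme each rank issues one atomic increment to every other rank, and a knomial barrier with radix k needs the smallest number of rounds r such that k^r >= N, with at most k - 1 sends per rank per round. The radix of 4 and the function names are hypothetical, chosen only for illustration.

```python
def knomial_rounds(nranks: int, radix: int) -> int:
    """Smallest r such that radix**r >= nranks."""
    if nranks < 1 or radix < 2:
        raise ValueError("need nranks >= 1 and radix >= 2")
    rounds, span = 0, 1
    while span < nranks:
        span *= radix
        rounds += 1
    return rounds


def sync_ops_per_rank(nranks: int, radix: int = 4) -> dict:
    """Per-rank operation counts under the assumptions stated above."""
    rounds = knomial_rounds(nranks, radix)
    return {
        "atomic_inc": nranks - 1,                         # one per peer
        "knomial_rounds": rounds,
        "knomial_sends_upper_bound": (radix - 1) * rounds,
    }


# Team sizes matching the two tested configurations (nodes x PPN).
for nodes, ppn in ((32, 1), (32, 32)):
    nranks = nodes * ppn
    print(nranks, sync_ops_per_rank(nranks))
```

Under these assumptions the atomic-increment count grows linearly with the team size while the barrier's per-rank work grows logarithmically, which is consistent with the motivation given above. It is a counting model only and says nothing about measured bandwidth.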
Node Bandwidth
Tested on Thor with 32 nodes, 1 PPN
Tested on Thor with 32 nodes, 32 PPN