[XLA:GPU] Support partially pipelined async send recv ops #17446
This is needed for pipeline parallelism on GPU where the send/recv operations
are issued in one loop iteration and completed in the next. The same buffer
must be alive throughout the process and no copies can be inserted.
Avoid copies for these partially pipelined async send/recv ops. Instead, insert
the required copies and control-flow constraints on the send/recv ops separately,
so that the live times of the buffers do not overlap.
Send: For send, a copy is inserted on the operand, starting a new live range.
By enforcing this copy after the corresponding send-done, buffer live times are
disjoint.
Recv: For recv, a copy is inserted after recv-done, ending the live time of the
buffer. By enforcing the copy to be before the corresponding recv, buffer live
times are disjoint.
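
The ordering argument above can be checked with a toy model. The sketch below is not XLA code and does not use any XLA API. It assumes a loop body is a flat list of `(op, offset)` pairs, where `offset` is `0` for the buffer of the current iteration and `-1` for the buffer carried over from the previous one. The helper names `live_ranges` and `disjoint` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple

# (op name, iteration offset of the buffer the op touches); a toy encoding,
# not an XLA data structure.
Op = Tuple[str, int]


def live_ranges(body: List[Op], start_op: str, end_op: str,
                iterations: int) -> List[Tuple[int, int]]:
    """Unroll `body` and return (start, end) schedule positions per buffer."""
    starts: Dict[int, int] = {}
    ends: Dict[int, int] = {}
    pos = 0
    for it in range(iterations):
        for name, offset in body:
            generation = it + offset  # which iteration's buffer is touched
            if name == start_op:
                starts[generation] = pos
            elif name == end_op:
                ends[generation] = pos
            pos += 1
    # Keep only buffers whose whole lifetime falls inside the unrolled window.
    return [(starts[g], ends[g]) for g in sorted(starts) if g in ends]


def disjoint(ranges: List[Tuple[int, int]]) -> bool:
    """True if each live range ends before the next one starts."""
    ordered = sorted(ranges)
    return all(prev_end < next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))


# Send: the operand copy starts the live range and send-done ends it.
# The copy is scheduled after the send-done of the previous iteration.
send_ok = [("send-done", -1), ("copy", 0), ("send", 0)]
# Same ops, but the copy runs before the previous send-done.
send_bad = [("copy", 0), ("send-done", -1), ("send", 0)]

# Recv: recv starts the live range and the copy after recv-done ends it.
# The copy is scheduled before the recv of the current iteration.
recv_ok = [("recv-done", -1), ("copy", -1), ("recv", 0)]
# Same ops, but the next recv is issued before the copy.
recv_bad = [("recv-done", -1), ("recv", 0), ("copy", -1)]

print(disjoint(live_ranges(send_ok, "copy", "send-done", 4)))
print(disjoint(live_ranges(send_bad, "copy", "send-done", 4)))
print(disjoint(live_ranges(recv_ok, "recv", "copy", 4)))
print(disjoint(live_ranges(recv_bad, "recv", "copy", 4)))
```

In this model the two orderings described above give disjoint live ranges, while swapping the copy past the neighbouring send-done or recv makes consecutive buffers overlap.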