rdma: Poll the control cq if no match
If no control message matching the current send has been found yet,
poll the control CQ once and retry the lookup. While this extends the
current send() call, it can shorten the time until data transfer
starts.

Signed-off-by: Brian Barrett <[email protected]>
Signed-off-by: Raghu Raja <[email protected]>
rajachan committed Aug 26, 2024
1 parent b7a686a commit 6898e99
Showing 1 changed file with 10 additions and 0 deletions.
src/nccl_ofi_rdma.c (10 additions, 0 deletions)
@@ -4643,6 +4643,7 @@ static int send(nccl_net_ofi_send_comm_t *send_comm, void *data, int size, int t
 	 * props->maxRecvs > 1.
 	 */
 
+	bool polled_cq = false;
 	bool have_ctrl = false;
 	uint16_t msg_seq_num = s_comm->next_msg_seq_num;
 
@@ -4651,6 +4652,7 @@ static int send(nccl_net_ofi_send_comm_t *send_comm, void *data, int size, int t
 	nccl_ofi_msgbuff_status_t msg_stat;
 	nccl_ofi_msgbuff_result_t mb_res;
 
+retry:
 	/* Retrive entry from message buffer for msg_seq_num index */
 	mb_res = nccl_ofi_msgbuff_retrieve(s_comm->msgbuff, msg_seq_num, &elem,
 					    &type, &msg_stat);
@@ -4687,6 +4689,14 @@ static int send(nccl_net_ofi_send_comm_t *send_comm, void *data, int size, int t
 		goto error;
 	}
 
+	/* look for control messages and then retry the message search
+	   to avoid unnecessary polling / queueing. */
+	if (OFI_UNLIKELY(!polled_cq && !have_ctrl)) {
+		ofi_process_cq_rail(ep, &ep->control_rail);
+		polled_cq = true;
+		goto retry;
+	}
+
 	/* Determine if this should be sent eagerly. */
 	bool eager = false;
 	if ((!have_ctrl && size <= eager_max_size) ||
