UCT/CUDA_IPC: Use active-queues to track outstanding work #9654
base: master
/azp run UCX PR
Azure Pipelines successfully started running 1 pipeline(s).
/azp run
Azure Pipelines successfully started running 4 pipeline(s).
@@ -63,6 +63,7 @@ int uct_cuda_ipc_ep_is_connected(const uct_ep_h tl_ep,
    return ep->remote_pid == *(pid_t*)params->iface_addr;
}
+
pls remove
Will remove.
done
event_q = &q_desc->event_queue;
stream = &q_desc->stream;
cuda_ipc_event = ucs_mpool_get(&iface->event_desc);
Suggested change (indentation only):
event_q = &q_desc->event_queue;
stream = &q_desc->stream;
cuda_ipc_event = ucs_mpool_get(&iface->event_desc);
Will fix indentation
done
    ucs_status_t status;
    CUdeviceptr dst, src;
-   CUstream stream;
+   CUstream *stream;
why can't we use just stream, not a pointer?
Is there an issue in using pointer to stream here?
Fixed in #10538
@@ -16,23 +17,18 @@
#include "cuda_ipc_cache.h"

#define UCT_CUDA_IPC_MAX_PEERS 16
KHASH_MAP_INIT_INT(cuda_ipc_queue_desc, uct_cuda_queue_desc_t*);
Suggested change:
- KHASH_MAP_INIT_INT(cuda_ipc_queue_desc, uct_cuda_queue_desc_t*);
+ KHASH_MAP_INIT_INT(cuda_ipc_queue_desc, uct_cuda_queue_desc_t);
Then you do not need to malloc/free the hash values.
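The point of this suggestion can be illustrated with a minimal sketch that is not UCX code and does not use khash. It assumes a tiny fixed-size linear-probing table; every name in it (`sim_desc_t`, `sim_map_t`, `sim_map_get_or_put`, `SIM_MAP_SLOTS`) is hypothetical. Because the descriptor lives inside the table slot, inserting a key needs no separate allocation and tearing the table down needs no per-value free.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAP_SLOTS 8 /* hypothetical fixed capacity, must be a power of two */

/* Hypothetical stand-in for a queue descriptor */
typedef struct {
    int pending;
} sim_desc_t;

/* The value is embedded in the slot, so no malloc/free per entry */
typedef struct {
    int        used;
    int        key;
    sim_desc_t value;
} sim_slot_t;

typedef struct {
    sim_slot_t slots[SIM_MAP_SLOTS];
} sim_map_t;

/* Return the descriptor for 'key', creating it in place if it is missing.
 * Returns NULL only when the table is full. */
sim_desc_t *sim_map_get_or_put(sim_map_t *map, int key)
{
    unsigned i, idx;

    for (i = 0; i < SIM_MAP_SLOTS; i++) {
        idx = ((unsigned)key + i) & (SIM_MAP_SLOTS - 1);
        if (!map->slots[idx].used) {
            /* Empty slot: the key is not present, so claim this slot */
            map->slots[idx].used          = 1;
            map->slots[idx].key           = key;
            map->slots[idx].value.pending = 0;
            return &map->slots[idx].value;
        }
        if (map->slots[idx].key == key) {
            return &map->slots[idx].value;
        }
    }

    return NULL;
}
```

One trade-off to keep in mind: this sketch never resizes, so the returned pointers stay valid. A khash table keeps its values in an array that can be reallocated when the table grows, so with by-value storage it is probably unsafe to hold a pointer to a stored descriptor across later insertions.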
Will consider making this change.
Fixed in #10538
static ucs_status_t
uct_cuda_ipc_iface_flush(uct_iface_h tl_iface, unsigned flags,
                         uct_completion_t *comp)
+
pls remove
will remove excess line
ok
uct_cuda_ipc_iface_t *iface = ucs_derived_of(tl_iface, uct_cuda_ipc_iface_t);
unsigned max_events = iface->config.max_poll;
unsigned count = 0;
Suggested change (indentation only):
uct_cuda_ipc_iface_t *iface = ucs_derived_of(tl_iface, uct_cuda_ipc_iface_t);
unsigned max_events = iface->config.max_poll;
unsigned count = 0;
will fix indentation
fixed
    return count;
}
+
extra line
will remove
removed
ucs_status_t status;
CUstream *stream;
maybe use stream? (not a pointer)
Again, I don't see an issue in using pointer to stream here.
Fixed in #10538
ucs_queue_for_each_extract(cuda_event, queue_head, queue,
                           cuEventQuery(cuda_event->event) ==
                           CUDA_SUCCESS) {
    ucs_queue_remove(queue_head, &cuda_event->queue);
Why is the remove needed? (extract is already used)
We want the queue to hold only active events. When there are no events in the queue, we can ignore that event queue and minimize polls.
For some reason I cannot push changes to this PR, so I created a new one: #10538
What/Why?
Currently, the CUDA_IPC transport uses an integer stream_count to track outstanding work. In preparation for multi-device support, this PR moves to active_queue usage, similar to the cuda_copy transport. This will eventually also help unify more of the common code shared between cuda_ipc and cuda_copy when it comes to stream/event usage. This PR also removes the max peer limitation.
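As a rough illustration of the active-queue idea, here is a minimal sketch in plain C with no CUDA or UCX dependency. It is not the code from this PR: all names (`sim_event_t`, `sim_queue_desc_t`, `sim_iface_t`, `sim_post`, `sim_progress`, `sim_active_count`) are hypothetical, and the `done` flag stands in for `cuEventQuery()` returning `CUDA_SUCCESS`. A descriptor is linked into the active queue when its first event is posted and unlinked once its event queue drains, so progress only visits descriptors that have outstanding work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CUDA event; 'done' mimics a successful query */
typedef struct sim_event {
    struct sim_event *next;
    int               done;
} sim_event_t;

/* Hypothetical per-stream descriptor with its own FIFO of pending events */
typedef struct sim_queue_desc {
    struct sim_queue_desc *next_active;
    sim_event_t           *head;
    sim_event_t           *tail;
    int                    is_active;
} sim_queue_desc_t;

/* Hypothetical interface: holds only descriptors that have pending events */
typedef struct {
    sim_queue_desc_t *active_head;
    sim_queue_desc_t *active_tail;
} sim_iface_t;

/* Append an event; activate the descriptor if it was idle */
void sim_post(sim_iface_t *iface, sim_queue_desc_t *q, sim_event_t *ev)
{
    ev->next = NULL;
    ev->done = 0;
    if (q->tail != NULL) {
        q->tail->next = ev;
    } else {
        q->head = ev;
    }
    q->tail = ev;

    if (!q->is_active) {
        q->is_active   = 1;
        q->next_active = NULL;
        if (iface->active_tail != NULL) {
            iface->active_tail->next_active = q;
        } else {
            iface->active_head = q;
        }
        iface->active_tail = q;
    }
}

/* Reap completed events in order; drop descriptors that become empty */
unsigned sim_progress(sim_iface_t *iface, unsigned max_events)
{
    sim_queue_desc_t **link = &iface->active_head;
    sim_queue_desc_t *prev  = NULL;
    sim_queue_desc_t *q;
    unsigned count          = 0;

    while ((*link != NULL) && (count < max_events)) {
        q = *link;
        while ((q->head != NULL) && q->head->done && (count < max_events)) {
            q->head = q->head->next;
            count++;
        }
        if (q->head == NULL) {
            /* Drained: unlink so later polls skip this descriptor */
            q->tail      = NULL;
            q->is_active = 0;
            *link        = q->next_active;
            q->next_active = NULL;
            if (iface->active_tail == q) {
                iface->active_tail = prev;
            }
        } else {
            prev = q;
            link = &q->next_active;
        }
    }
    return count;
}

/* Number of descriptors that still have outstanding work */
unsigned sim_active_count(const sim_iface_t *iface)
{
    const sim_queue_desc_t *q;
    unsigned n = 0;

    for (q = iface->active_head; q != NULL; q = q->next_active) {
        n++;
    }
    return n;
}
```

Under these assumptions, a flush-style check reduces to asking whether the active queue is empty, which plays the role that a zero counter value would play in a counter-based scheme.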