Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/CUDA_IPC: Use active-queues to track outstanding work #9654

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Akshay-Venkatesh
Copy link
Contributor

What/Why ?

Currently CUDA_IPC transport uses integer stream_count to track outstanding work but in preparation for multi-device support, this PR moves to active_queue usage similar to cuda_copy transport. This will eventually also help unify more common code shared between cuda_ipc and cuda_copy when it comes to stream/event usage. This PR also removes max peer limitations.

@yosefe
Copy link
Contributor

yosefe commented Feb 4, 2024

/azp run UCX PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@openucx openucx deleted a comment from azure-pipelines bot Feb 4, 2024
@yosefe
Copy link
Contributor

yosefe commented Feb 7, 2024

/azp run

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@@ -63,6 +63,7 @@ int uct_cuda_ipc_ep_is_connected(const uct_ep_h tl_ep,
return ep->remote_pid == *(pid_t*)params->iface_addr;
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +112 to 114
event_q = &q_desc->event_queue;
stream = &q_desc->stream;
cuda_ipc_event = ucs_mpool_get(&iface->event_desc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
event_q = &q_desc->event_queue;
stream = &q_desc->stream;
cuda_ipc_event = ucs_mpool_get(&iface->event_desc);
event_q = &q_desc->event_queue;
stream = &q_desc->stream;
cuda_ipc_event = ucs_mpool_get(&iface->event_desc);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix indentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ucs_status_t status;
CUdeviceptr dst, src;
CUstream stream;
CUstream *stream;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we use just stream, not a pointer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue in using pointer to stream here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #10538

@@ -16,23 +17,18 @@
#include "cuda_ipc_cache.h"


#define UCT_CUDA_IPC_MAX_PEERS 16
KHASH_MAP_INIT_INT(cuda_ipc_queue_desc, uct_cuda_queue_desc_t*);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
KHASH_MAP_INIT_INT(cuda_ipc_queue_desc, uct_cuda_queue_desc_t*);
KHASH_MAP_INIT_INT(cuda_ipc_queue_desc, uct_cuda_queue_desc_t);

then you do not need to malloc/free hash values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will consider making this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #10538

static ucs_status_t
uct_cuda_ipc_iface_flush(uct_iface_h tl_iface, unsigned flags,
uct_completion_t *comp)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove excess line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 298 to +300
uct_cuda_ipc_iface_t *iface = ucs_derived_of(tl_iface, uct_cuda_ipc_iface_t);
unsigned max_events = iface->config.max_poll;
unsigned count = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uct_cuda_ipc_iface_t *iface = ucs_derived_of(tl_iface, uct_cuda_ipc_iface_t);
unsigned max_events = iface->config.max_poll;
unsigned count = 0;
uct_cuda_ipc_iface_t *iface = ucs_derived_of(tl_iface, uct_cuda_ipc_iface_t);
unsigned max_events = iface->config.max_poll;
unsigned count = 0;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix indentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


return count;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

ucs_status_t status;
CUstream *stream;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use stream? (not a pointer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't see an issue in using pointer to stream here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #10538

ucs_queue_for_each_extract(cuda_event, queue_head, queue,
cuEventQuery(cuda_event->event) ==
CUDA_SUCCESS) {
ucs_queue_remove(queue_head, &cuda_event->queue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need to remove? (extract is used)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the queue to hold only active events. When there are no events in the queue, we can ignore that event queue and minimize polls.

@iyastreb
Copy link
Contributor

iyastreb commented Mar 7, 2025

For some reason I cannot push changes to this PR, so I created a new one: #10538
I merged it with latest master and addressed PR comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants