-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCT/CUDA: Use sys_dev to allocate CUDA memory #10553
base: master
Are you sure you want to change the base?
Conversation
return UCS_OK; | ||
} | ||
|
||
status = UCT_CUDADRV_FUNC(cuDevicePrimaryCtxRetain(&primary_ctx, device), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to check the primary device context using cuDevicePrimaryCtxGetState
. The main use case that should be covered when user allocates memory on the device I
, then switch to device J
, and we should allocate staging buffer on device I
. Thus, if cuDevicePrimaryCtxGetState
returns inactive state for the given device, then we should report an error instead of retaining and activating primary context on the device without active primary context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added function, and log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rakhmets why is it considered an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i assume this code will be modified to use common func for context retainig, so no need to review it now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the two function above and using uct_cuda_primary_ctx_retain()
src/uct/cuda/base/cuda_md.c
Outdated
@@ -61,6 +61,34 @@ void uct_cuda_base_get_sys_dev(CUdevice cuda_device, | |||
*sys_dev_p = UCS_SYS_DEVICE_ID_UNKNOWN; | |||
} | |||
|
|||
static CUdevice sys_dev_to_device[UCS_SYS_DEVICE_ID_MAX] = { | |||
[0 ... UCS_SYS_DEVICE_ID_MAX - 1] = CU_DEVICE_INVALID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it gcc extension, but not a C99?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
class test_switch_cuda_device : public test_md { | ||
public: | ||
protected: | ||
int num_devices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor:
int num_devices; | |
int m_num_devices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
}; | ||
|
||
template<class T> | ||
void test_switch_cuda_device::detect_mem_type(ucs_memory_type_t mem_type) const | ||
{ | ||
int num_devices; | ||
ASSERT_EQ(cudaGetDeviceCount(&num_devices), cudaSuccess); | ||
int current_device; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to move
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is now done as part of init(), unless I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i meant no need to move initialization of int current_device;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/uct/cuda/base/cuda_md.c
Outdated
@@ -90,6 +118,8 @@ uct_cuda_base_query_md_resources(uct_component_t *component, | |||
continue; | |||
} | |||
|
|||
sys_dev_to_device[sys_dev] = cuda_device; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we populate this array in uct_cuda_base_get_sys_dev
instead?
it is not expected that user will query md tl resources for every cudaDevice visible for the thread/process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user does not need to query all devices, but indeed, does need to query at least one device to trigger this cache population.
Moved to uct_cuda_base_get_sys_dev()
for now but since it adds if(){} check, maybe we could move it to ctor function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to have it here then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/uct/api/uct.h
Outdated
@@ -1712,6 +1712,7 @@ typedef struct uct_allocated_memory { | |||
ucs_memory_type_t mem_type; /**< type of allocated memory */ | |||
uct_md_h md; /**< if method==MD: MD used to allocate the memory */ | |||
uct_mem_h memh; /**< if method==MD: MD memory handle */ | |||
ucs_sys_device_t sys_dev; /**< System device for allocated memory */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, seems like we use sys_device
more often in the APIs, maybe better to use it here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/uct/cuda/base/cuda_md.c
Outdated
@@ -90,6 +118,8 @@ uct_cuda_base_query_md_resources(uct_component_t *component, | |||
continue; | |||
} | |||
|
|||
sys_dev_to_device[sys_dev] = cuda_device; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to have it here then
}; | ||
|
||
template<class T> | ||
void test_switch_cuda_device::detect_mem_type(ucs_memory_type_t mem_type) const | ||
{ | ||
int num_devices; | ||
ASSERT_EQ(cudaGetDeviceCount(&num_devices), cudaSuccess); | ||
int current_device; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i meant no need to move initialization of int current_device;
Repro failures with container |
return UCS_OK; | ||
} | ||
|
||
status = UCT_CUDADRV_FUNC(cuDevicePrimaryCtxRetain(&primary_ctx, device), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rakhmets why is it considered an error?
all comments addressed, skip inactive disabled for now to see if any failure. |
src/uct/base/uct_mem.c
Outdated
if ((mem_type != UCS_MEMORY_TYPE_HOST) || | ||
(sys_dev != UCS_SYS_DEVICE_ID_UNKNOWN)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add helper inline function, smth like
uct_mem_is_host_alloc(params)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
status = UCT_CUDADRV_FUNC(cuDevicePrimaryCtxGetState(device, &flags, | ||
&active), | ||
log_level); | ||
if (status != UCS_OK) { | ||
return status; | ||
} else if (!active) { | ||
} else if (!active && md->config.skip_inactive_primary_ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to rebase on Raul's PR for VMM detection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
happening because of retaining a non-active apparently, will investigate. |
src/ucs/sys/topo/base/topo.c
Outdated
return UCS_OK; | ||
} | ||
|
||
ucs_status_t ucs_topo_sys_dev_set_user_value(ucs_sys_device_t sys_dev, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
src/ucs/sys/topo/base/topo.c
Outdated
size_t length = ucs_static_array_size(ucs_topo_global_ctx.devices); | ||
ucs_topo_sys_device_info_t *device; | ||
|
||
ucs_carray_for_each(device, ucs_topo_global_ctx.devices, length) { | ||
device->user_value = UINTPTR_MAX; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why needed? we do not pre-initialize other fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
src/ucs/sys/topo/base/topo.h
Outdated
* Find system device by pci bus id. | ||
* | ||
* @param [in] bus_id pointer to bus id of the device of interest. | ||
* @param [in] user_value user_value to add along |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @param [in] user_value user_value to add along | |
* @param [in] user_value user_value to add along. |
for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
return (mem_type == UCS_MEMORY_TYPE_HOST) && | ||
(sys_dev == UCS_SYS_DEVICE_ID_UNKNOWN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need it? I'd just ignore sys_dev if mem_type is host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems it's not strictly needed. @yosefe, i assume this is for future proof?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, IMO better keep it for strict behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is strict behavior? With this change we do not treat host memory as host anymore if sys_device is set. Then how should we handle such case? Imo, better to ignore sys_dev when memory type is host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if sys_device is set, we cannot do a host memory allocation because it would not match the system device requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we can return invalid param then. What will be the behavior with this change?
src/uct/cuda/base/cuda_md.c
Outdated
@@ -74,7 +94,7 @@ uct_cuda_base_query_md_resources(uct_component_t *component, | |||
int i, num_gpus; | |||
|
|||
status = UCT_CUDADRV_FUNC(cuDeviceGetCount(&num_gpus), UCS_LOG_LEVEL_DIAG); | |||
if ((status != UCS_OK) || (num_gpus == 0)) { | |||
if (status != UCS_OK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to remove check for num_gpus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed thanks!
return (mem_type == UCS_MEMORY_TYPE_HOST) && | ||
(sys_dev == UCS_SYS_DEVICE_ID_UNKNOWN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, IMO better keep it for strict behavior
"Do not use CUDA primary context if not active when allocating memory.\n", | ||
ucs_offsetof(uct_cuda_copy_md_config_t, skip_inactive_primary_ctx), | ||
{"RETAIN_INACTIVE_PRIMARY_CTX", "y", | ||
"Also use inactive CUDA primary context for memory allocation\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enable retaining and using an inactive CUDA primary context for memory allocation
remote trailing \n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
|
||
return UCS_OK; | ||
if (*cu_device != CU_DEVICE_INVALID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment before:
/* Use the active cuda device */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added line before if(){}
for (auto device = 0; device < m_num_devices; ++device) { | ||
ASSERT_EQ(cudaSetDevice(device), cudaSuccess); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you pls add a comment to explain it?
addressed comments and merged with common retain API, will push here when #10545 is in. |
@@ -79,7 +79,8 @@ static ucs_config_field_t uct_cuda_copy_md_config_table[] = { | |||
UCS_CONFIG_TYPE_ENUM(ucs_memory_type_names)}, | |||
|
|||
{"RETAIN_INACTIVE_PRIMARY_CTX", "y", | |||
"Also use inactive CUDA primary context for memory allocation\n", | |||
"Enable retaining and using an inactive CUDA primary context for memory" | |||
" allocation\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove trailing \n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (to review merge fix)
@@ -78,6 +78,12 @@ static ucs_config_field_t uct_cuda_copy_md_config_table[] = { | |||
ucs_offsetof(uct_cuda_copy_md_config_t, cuda_async_mem_type), | |||
UCS_CONFIG_TYPE_ENUM(ucs_memory_type_names)}, | |||
|
|||
{"RETAIN_INACTIVE_PRIMARY_CTX", "n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set to no to see CI pass for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename this variable to
RETAIN_INACTIVE_PRIMARY_CTX
retain_primary_ctx
to make code more compact
@rakhmets WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest renaming to retain_primary_ctx
src/ucs/sys/topo/base/topo.h
Outdated
* | ||
* @return UCS_OK or error in case device cannot be found. | ||
*/ | ||
ucs_status_t ucs_topo_find_device_by_bus_id_user_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe introduce void ucs_topo_sys_dev_set_user_value(ucs_sys_device_t sys_dev, uintptr_t user_value)
, and set it to UINTPTR_MAX
in ucs_topo_find_device_by_bus_id
.
Instead of having two similar methods ucs_topo_find_device_by_bus_id
and ucs_topo_find_device_by_bus_id_user_value
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started as you suggested and changed to address code reviews. @yosefe what do we do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
} | ||
|
||
*device = user_value; | ||
if (*device == CU_DEVICE_INVALID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an unexpected case.
user_value is set only if all previous methods related to the device have been successfully completed.
Maybe assert is better for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
"Enable retaining and using an inactive CUDA primary context for memory" | ||
" allocation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enable
seems to be redundant.
Retaining and use an inactive CUDA primary context for memory allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so: "Retain and use an inactive CUDA primary context for memory allocation"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -405,7 +522,9 @@ uct_cuda_copy_mem_alloc(uct_md_h uct_md, size_t *length_p, void **address_p, | |||
*memh_p = alloc_handle; | |||
*address_p = (void*)alloc_handle->ptr; | |||
*length_p = alloc_handle->length; | |||
return UCS_OK; | |||
out: | |||
uct_cuda_copy_pop_ctx(cu_device, alloc_cu_device, UCS_LOG_LEVEL_ERROR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WARN should be ok here. Since UCX can continue running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
status = UCT_CUDADRV_FUNC(cuCtxPopCurrent(&ctx), log_level); | ||
if (status != UCS_OK) { | ||
return; | ||
} | ||
|
||
status = UCT_CUDADRV_FUNC(cuDevicePrimaryCtxRelease(device), log_level); | ||
if (status != UCS_OK) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status = UCT_CUDADRV_FUNC(cuCtxPopCurrent(&ctx), log_level); | |
if (status != UCS_OK) { | |
return; | |
} | |
status = UCT_CUDADRV_FUNC(cuDevicePrimaryCtxRelease(device), log_level); | |
if (status != UCS_OK) { | |
return; | |
} | |
UCT_CUDADRV_FUNC(cuCtxPopCurrent(NULL), log_level); | |
UCT_CUDADRV_FUNC(cuDevicePrimaryCtxRelease(device), log_level); |
No need to exit if pop failed. We may warn user about the error, and it is better to try release the primary context on the device anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
ucs_status_t status; | ||
int dev, num_devices; | ||
|
||
status = UCT_CUDADRV_FUNC(cuCtxGetDevice(cu_device_p), UCS_LOG_LEVEL_DEBUG); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status = UCT_CUDADRV_FUNC(cuCtxGetDevice(cu_device_p), UCS_LOG_LEVEL_DEBUG); | |
status = UCT_CUDADRV_FUNC_LOG_DEBUG(cuCtxGetDevice(cu_device_p)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
status = UCT_CUDADRV_FUNC(cuDeviceGetCount(&num_devices), | ||
UCS_LOG_LEVEL_DIAG); | ||
if (status != UCS_OK) { | ||
return UCS_ERR_INVALID_PARAM; | ||
} | ||
|
||
/* Use the first active cuda device for allocation */ | ||
for (dev = 0; dev < num_devices; dev++) { | ||
if (UCT_CUDADRV_FUNC_LOG_DEBUG(cuDeviceGet(alloc_cu_device_p, dev)) != | ||
UCS_OK) { | ||
continue; | ||
} | ||
|
||
status = uct_cuda_copy_push_ctx(md, *alloc_cu_device_p, log_level); | ||
if (status == UCS_OK) { | ||
break; | ||
} | ||
} | ||
|
||
if (status != UCS_OK) { | ||
ucs_log(log_level, | ||
"no active cuda primary context for memory allocation"); | ||
} | ||
|
||
return status; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this combination of methods from #10564 work for you?
static ucs_status_t
uct_cuda_primary_ctx_push_first_active(CUdevice *cuda_device_p,
CUcontext *cuda_ctx_p)
{
int num_devices, device_num;
ucs_status_t status;
CUdevice device;
CUcontext ctx;
status = UCT_CUDADRV_FUNC_LOG_ERR(cuDeviceGetCount(&num_devices));
if (status != UCS_OK) {
return status;
}
for (device_num = 0; device_num < num_devices; ++device_num) {
status = UCT_CUDADRV_FUNC_LOG_ERR(cuDeviceGet(&device, device_num));
if (status != UCS_OK) {
return status;
}
status = uct_cuda_primary_ctx_retain(device, 0, &ctx);
if (status == UCS_OK) {
break;
} else if (status == UCS_ERR_NO_DEVICE) {
continue;
} else {
return status;
}
}
if (device_num == num_devices) {
return UCS_ERR_NO_DEVICE;
}
status = UCT_CUDADRV_FUNC_LOG_ERR(cuCtxPushCurrent(ctx));
if (status != UCS_OK) {
UCT_CUDADRV_FUNC_LOG_WARN(cuDevicePrimaryCtxRelease(device));
return status;
}
*cuda_device_p = device;
*cuda_ctx_p = ctx;
return UCS_OK;
}
static UCS_F_ALWAYS_INLINE void
uct_cuda_primary_ctx_pop_and_release(CUdevice cuda_device)
{
if (ucs_likely(cuda_device == CU_DEVICE_INVALID)) {
return;
}
UCT_CUDADRV_FUNC_LOG_WARN(cuCtxPopCurrent(NULL));
UCT_CUDADRV_FUNC_LOG_WARN(cuDevicePrimaryCtxRelease(cuda_device));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes agree, maybe we can merge this one and #10564 could extract the block? I understand this one might block other PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep both independent and after merge we unite the code
return (mem_type == UCS_MEMORY_TYPE_HOST) && | ||
(sys_dev == UCS_SYS_DEVICE_ID_UNKNOWN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if sys_device is set, we cannot do a host memory allocation because it would not match the system device requirement.
"Enable retaining and using an inactive CUDA primary context for memory" | ||
" allocation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so: "Retain and use an inactive CUDA primary context for memory allocation"
UCS_TEST_P(test_mem_alloc_device, different_device_cuda_fabric, | ||
"CUDA_COPY_ENABLE_FABRIC=y") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add CUDA_COPY_ENABLE_FABRIC=y as a test variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i did not find existing infra in uct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see
ucx/test/gtest/uct/ib/test_dc.cc
Line 519 in eb1c7f3
enum_resources(const std::string &tl_name) |
(void)UCT_CUDADRV_FUNC(cuCtxPopCurrent(&ctx), UCS_LOG_LEVEL_WARN); | ||
(void)UCT_CUDADRV_FUNC(cuDevicePrimaryCtxRelease(alloc_cu_device), | ||
UCS_LOG_LEVEL_WARN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it was better when these 2 calls were in a helper function uct_cuda_copy_pop_ctx because it is the opposite of uct_cuda_copy_push_ctx
then reuse uct_cuda_copy_pop_ctx also in uct_cuda_copy_md_sync_memops_get_address_range for consistency
if (uct_cuda_copy_push_ctx(cuda_device, 0, UCS_LOG_LEVEL_ERROR) != | ||
UCS_OK) { | ||
return; | ||
} | ||
} else { | ||
cuda_mem_ctx = cuda_ctx; | ||
} | ||
|
||
if (UCT_CUDADRV_FUNC_LOG_ERR(cuCtxPushCurrent(cuda_mem_ctx)) != UCS_OK) { | ||
goto out_primary_ctx_release; | ||
} else if (UCT_CUDADRV_FUNC_LOG_ERR(cuCtxPushCurrent(cuda_ctx)) != | ||
UCS_OK) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (cuda_ctx == NULL) {
status = uct_cuda_copy_push_ctx(cuda_device, 0, UCS_LOG_LEVEL_ERROR);
} else {
status = UCT_CUDADRV_FUNC_LOG_ERR(cuCtxPushCurrent(cuda_ctx));
}
if (status !=UCS_OK) {
return;
}
UCS_TEST_P(test_mem_alloc_device, different_device_cuda_fabric, | ||
"CUDA_COPY_ENABLE_FABRIC=y") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see
ucx/test/gtest/uct/ib/test_dc.cc
Line 519 in eb1c7f3
enum_resources(const std::string &tl_name) |
@@ -78,6 +78,12 @@ static ucs_config_field_t uct_cuda_copy_md_config_table[] = { | |||
ucs_offsetof(uct_cuda_copy_md_config_t, cuda_async_mem_type), | |||
UCS_CONFIG_TYPE_ENUM(ucs_memory_type_names)}, | |||
|
|||
{"RETAIN_INACTIVE_PRIMARY_CTX", "n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest renaming to retain_primary_ctx
{ | ||
CUcontext ctx; | ||
|
||
(void)UCT_CUDADRV_FUNC(cuCtxPopCurrent(&ctx), UCS_LOG_LEVEL_WARN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor, can pass NULL instead of ctx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Co-authored-by: Thomas Vegas <[email protected]>
What?
Re-adds existing API PR #10448 with latest comments addressed, and UCT implementation.
Why?
Fragments used for pipeline need to use the same GPU as the user GPU.
How?
Set context if needed, if not available find first usable primary context. If explicit context was requested and not found, return an error.