typerep/yaksa: always pass info hint for yaksa pack/unpack #5005

hzhou · 2021-01-15T16:47:23Z

Pull Request Description

Querying GPU pointer attributes can be expensive. MPICH has already done the query, so we should avoid repeating it in yaksa by passing the result in as an info hint.

This PR depends on a new yaksa feature, pmodels/yaksa#169. It passes in the "nogpu" hint when no device memory is involved in either inbuf or outbuf, and creates a new yaksa hint when GPU memory is involved.

Fixes #5002

Creating a yaksa info hint incurs overhead (malloc, memcpy, strcmp), so additional optimization may be needed. This is left as a TODO.
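To make the hint selection described above concrete, here is a minimal sketch. It does not use the real MPL or yaksa types: `ptr_attr_t`, `pack_hint_t`, `shared_nogpu_hint`, and `select_hint` are all hypothetical names invented for illustration. The idea it shows is that the host-only path reuses one prebuilt "nogpu" hint, so only the GPU path pays a per-call setup cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the MPL or yaksa types. */
typedef enum { PTR_HOST, PTR_DEVICE } ptr_kind_t;

typedef struct {
    ptr_kind_t kind;    /* result of the one pointer query done earlier */
    int device_id;      /* only meaningful when kind == PTR_DEVICE */
} ptr_attr_t;

typedef enum { HINT_NOGPU, HINT_GPU } hint_kind_t;

typedef struct {
    hint_kind_t kind;
    int in_device;      /* -1 when inbuf is host memory */
    int out_device;     /* -1 when outbuf is host memory */
} pack_hint_t;

/* Built once and shared, so the host-only path has no per-call setup. */
static const pack_hint_t shared_nogpu_hint = { HINT_NOGPU, -1, -1 };

/* Pick the hint for one pack/unpack call from attributes queried earlier.
 * Returns the shared "nogpu" hint when GPU support is off or both buffers
 * are host memory; otherwise fills *scratch and returns it. */
static const pack_hint_t *select_hint(bool gpu_enabled,
                                      const ptr_attr_t *in,
                                      const ptr_attr_t *out,
                                      pack_hint_t *scratch)
{
    assert(in != NULL && out != NULL && scratch != NULL);

    if (!gpu_enabled || (in->kind == PTR_HOST && out->kind == PTR_HOST))
        return &shared_nogpu_hint;

    scratch->kind = HINT_GPU;
    scratch->in_device = (in->kind == PTR_DEVICE) ? in->device_id : -1;
    scratch->out_device = (out->kind == PTR_DEVICE) ? out->device_id : -1;
    return scratch;
}
```

In this sketch the per-call hint is written into caller-provided storage, which sidesteps the malloc mentioned above. Whether the real info object can be handled that way is an open question here, not something this PR establishes.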

[skip warnings]

Expected Impact

Author Checklist

hzhou · 2021-01-15T17:30:12Z

The tests will fail until yaksa is updated with pmodels/yaksa#169

EDIT: updated by #5072

hzhou · 2021-01-23T15:27:58Z

test:mpich/ch4/ofi

hzhou · 2021-03-09T21:58:18Z

test:mpich/ch4/most
test:mpich/ch3/most

hzhou · 2021-03-27T03:37:33Z

test:mpich/ch4/most

raffenet

A couple of minor comments.

raffenet · 2021-03-29T14:11:04Z

src/mpid/ch4/shm/src/shm_init.c

-    ret = MPIDI_IPC_mpi_init_hook(rank, size, tag_bits);
-    MPIR_ERR_CHECK(ret);
+    if (MPIR_CVAR_ENABLE_GPU) {
+        ret = MPIDI_IPC_mpi_init_hook(rank, size, tag_bits);


This will disable XPMEM if there is no GPU, which I don't think should happen.

Good point. I should move the check to just where the GPU gets initialized.

Fixed. Now it is around MPIDI_GPU_mpi_init_hook.
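The shape of that fix can be sketched as follows. This is a self-contained mock, not the ch4 code: `init_trace_t`, `mock_ipc_init`, `mock_gpu_init`, and `shm_init_sketch` are hypothetical, and the thread does not show exactly where the GPU hook is called from. The point is only that the IPC init (which XPMEM relies on) stays unconditional while the GPU hook alone is gated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for the real init hooks. */
typedef struct {
    int ipc_inits;  /* IPC init, which also covers XPMEM */
    int gpu_inits;  /* GPU init, the expensive part */
} init_trace_t;

static int mock_ipc_init(init_trace_t *t) { t->ipc_inits++; return 0; }
static int mock_gpu_init(init_trace_t *t) { t->gpu_inits++; return 0; }

/* IPC init always runs; only the GPU hook is gated on the flag. */
static int shm_init_sketch(bool gpu_enabled, init_trace_t *t)
{
    assert(t != NULL);

    int ret = mock_ipc_init(t);
    if (ret != 0)
        return ret;

    if (gpu_enabled)
        ret = mock_gpu_init(t);

    return ret;
}
```

Calling `shm_init_sketch(false, &t)` on a zeroed trace leaves `gpu_inits` at 0 while `ipc_inits` becomes 1, which is the behavior the review comment asked for.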

raffenet · 2021-03-29T14:12:14Z

src/mpl/include/mpl_gpu.h

@@ -29,6 +29,7 @@ typedef enum {
 typedef struct {
    MPL_pointer_type_t type;
    MPL_gpu_device_handle_t device;
+    MPL_gpu_device_attr attr;


My only issue is the name of the struct member. Maybe device_attr or dev_attr?

Sounds good.

An unrelated side note: it appears that at least CUDA already does its own internal query caching/optimization. We need to measure the overhead of yaksa pack/unpack with and without passing the info. This optimization may not be necessary.

Fixed. Renamed to device_attr.
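For readers following the rename, a rough sketch of why the member lives in the pointer-attribute struct: the upper layer queries once, and the lower layer reads the cached `device_attr` instead of querying again. Everything prefixed `sk_` is hypothetical, including the mock query that treats addresses inside a static array as "device" memory; the real driver attribute is opaque and looks nothing like `memory_kind`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SK_PTR_HOST, SK_PTR_DEVICE } sk_pointer_type_t;

/* Hypothetical attribute payload; opaque driver data in practice. */
typedef struct { int memory_kind; } sk_device_attr_t;

typedef struct {
    sk_pointer_type_t type;
    int device;
    sk_device_attr_t device_attr;   /* cached result of the query */
} sk_pointer_attr_t;

static int sk_query_count = 0;          /* counts mock driver queries */
static char sk_fake_device_mem[64];     /* pretend device allocation */

/* Mock driver query: addresses inside sk_fake_device_mem count as device. */
static void sk_query(const void *ptr, sk_pointer_attr_t *attr)
{
    uintptr_t p = (uintptr_t) ptr;
    uintptr_t base = (uintptr_t) sk_fake_device_mem;

    assert(attr != NULL);
    sk_query_count++;
    if (p >= base && p < base + sizeof(sk_fake_device_mem)) {
        attr->type = SK_PTR_DEVICE;
        attr->device = 0;
        attr->device_attr.memory_kind = 1;
    } else {
        attr->type = SK_PTR_HOST;
        attr->device = -1;
        attr->device_attr.memory_kind = 0;
    }
}

/* Lower layer: consumes the cached attribute and never calls sk_query. */
static int sk_lower_layer_is_device(const sk_pointer_attr_t *attr)
{
    assert(attr != NULL);
    return attr->device_attr.memory_kind == 1;
}
```

With this layout the query count equals the number of buffers, no matter how many layers inspect the attribute afterwards. Whether that saving matters in practice is exactly what the measurement suggested above would tell us.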

raffenet · 2021-03-29T14:12:38Z

src/include/mpir_gpu.h

+MPL_STATIC_INLINE_PREFIX int MPIR_gpu_register_host(const void *ptr, size_t size)
+{
+    if (ENABLE_GPU) {
+        return MPL_gpu_register_host(ptr, size);
+    }
+    return MPI_SUCCESS;
+}
+
+MPL_STATIC_INLINE_PREFIX int MPIR_gpu_unregister_host(const void *ptr)
+{
+    if (ENABLE_GPU) {
+        return MPL_gpu_unregister_host(ptr);
+    }
+    return MPI_SUCCESS;
+}
+
+MPL_STATIC_INLINE_PREFIX int MPIR_gpu_malloc_host(void **ptr, size_t size)
+{
+    if (ENABLE_GPU) {
+        return MPL_gpu_malloc_host(ptr, size);
+    } else {
+        *ptr = MPL_malloc(size, MPL_MEM_BUFFER);
+        return MPI_SUCCESS;
+    }
+}
+
+MPL_STATIC_INLINE_PREFIX int MPIR_gpu_free_host(void *ptr)
+{
+    if (ENABLE_GPU) {
+        return MPL_gpu_free_host(ptr);
+    } else {
+        MPL_free(ptr);
+        return MPI_SUCCESS;
+    }
+}


We need to check the MPIR_CVAR_ENABLE_GPU variable consistently in order to consistently skip GPU access. Typical GPU drivers such as CUDA have a huge initialization cost, so we have to skip every access in order to avoid the init latency.
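A small mock illustrates why every entry point has to be gated, not just the init call. It assumes a driver that initializes lazily on first use, so a single ungated call is enough to pay the full cost. The `sk_` names are hypothetical and the "driver" is just a counter around `malloc`/`free`; only the wrapper shape mirrors the MPIR_gpu_* helpers in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static bool sk_gpu_enabled = false;     /* stand-in for the CVAR */
static bool sk_driver_ready = false;
static int sk_driver_init_count = 0;    /* times the costly init ran */

/* Mock driver: any entry point triggers lazy init on first use. */
static void sk_driver_touch(void)
{
    if (!sk_driver_ready) {
        sk_driver_ready = true;
        sk_driver_init_count++;
    }
}

static int sk_driver_malloc_host(void **ptr, size_t size)
{
    sk_driver_touch();
    *ptr = malloc(size);
    return (*ptr != NULL) ? 0 : 1;
}

static int sk_driver_free_host(void *ptr)
{
    sk_driver_touch();
    free(ptr);
    return 0;
}

/* Gated wrappers: with the flag off, the driver is never touched. */
static int sk_malloc_host(void **ptr, size_t size)
{
    assert(ptr != NULL);
    if (sk_gpu_enabled)
        return sk_driver_malloc_host(ptr, size);
    *ptr = malloc(size);
    return (*ptr != NULL) ? 0 : 1;
}

static int sk_free_host(void *ptr)
{
    if (sk_gpu_enabled)
        return sk_driver_free_host(ptr);
    free(ptr);
    return 0;
}
```

With `sk_gpu_enabled` false, an allocate/free pair leaves `sk_driver_init_count` at 0. Removing the check from either wrapper would make that same pair trigger the init.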

hzhou · 2021-03-29T16:29:06Z

test:mpich/ch4/ofi

hzhou force-pushed the 2101_yaksa_hint branch 4 times, most recently from af8f2f7 to 974d679 Compare January 15, 2021 17:29

hzhou force-pushed the 2101_yaksa_hint branch 2 times, most recently from 759be31 to 5f92b68 Compare January 23, 2021 15:24

hzhou mentioned this pull request Jan 23, 2021

testing: enhance gpu testing #5000

Merged

10 tasks

hzhou force-pushed the 2101_yaksa_hint branch from 5f92b68 to 219a91c Compare March 9, 2021 21:57

hzhou force-pushed the 2101_yaksa_hint branch from 219a91c to dad045e Compare March 26, 2021 21:43

typerep/yaksa: add MPIR_yaksa_info_nogpu

631337f

This is the key to use when MPIR_CVAR_ENABLE_GPU is off or when we know both in_buf and out_buf are host memory.

hzhou force-pushed the 2101_yaksa_hint branch from dad045e to c382e5d Compare March 26, 2021 21:48

raffenet reviewed Mar 29, 2021

hzhou added 6 commits March 29, 2021 11:26

mpl/gpu: add device attr inside MPL_pointer_attr_t

ad89a55

To avoid querying the device pointer attribute multiple times, for example in both MPICH and yaksa, we need to cache the queried attribute or pass it along in a hash or info hint. The attr needs to be inside MPL_pointer_attr_t to make that possible.

typerep/yaksa: always pass info with yaksa pack/unpack

7f2e85e

Since we have already queried the pointer's device attribute, we should always pass the info hint to yaksa to avoid an additional pointer query inside yaksa. The code is restructured locally to always get info hints from the attributes.

typerep/yaksa: remove nolonger used fastpath_memcpy

1179c6b

Pointer attributes and datatype are now checked in the main code.

typerep/yaksa: pass nogpu infohint to yaksa_init

db35482

When MPIR_CVAR_ENABLE_GPU is off, pass the nogpu info hint to yaksa_init to prevent it from querying devices. This commit depends on yaksa PR pmodels#172.

ipc: skip ipc gpu init if MPIR_CVAR_ENABLE_GPU is disabled

02c3865

hzhou force-pushed the 2101_yaksa_hint branch from c382e5d to 02c3865 Compare March 29, 2021 16:27

raffenet approved these changes Mar 29, 2021

hzhou merged commit 303f205 into pmodels:main Mar 29, 2021

hzhou deleted the 2101_yaksa_hint branch March 29, 2021 18:25

hzhou mentioned this pull request Mar 29, 2021

gpu: misc fixes #5176

Merged

4 tasks
