typerep/yaksa: always pass info hint for yaksa pack/unpack #5005

Merged
merged 7 commits into from
Mar 29, 2021

Conversation

Contributor

@hzhou hzhou commented Jan 15, 2021

Pull Request Description

A GPU pointer attribute query can be expensive. Since MPICH has already done the query, we should avoid doing it again in yaksa, so we need to pass in the info hint.

This PR depends on a new yaksa feature, pmodels/yaksa#169, and passes in the "nogpu" hint when no device memory is involved in either inbuf or outbuf. It creates a new yaksa hint when GPU memory is involved.

Fixes #5002

Creating a yaksa info hint incurs overhead (malloc, memcpy, strcmp), so additional optimization may be needed. This is left as a TODO.
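
One possible way to reduce the hint-creation overhead mentioned above is to build the "nogpu" hint once and reuse it, allocating a fresh hint only when device memory is involved. The following is a minimal sketch of that idea, not the code in this PR. `hint_t`, `hint_create`, `select_hint`, and the `"gpu"` key are hypothetical stand-ins and are not part of the yaksa or MPICH API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are not yaksa or MPICH types. */
typedef enum { PTR_HOST, PTR_DEVICE } ptr_type_t;

typedef struct {
    char key[16];       /* "nogpu" or "gpu" in this sketch */
    int device_id;      /* meaningful only for the "gpu" hint */
} hint_t;

hint_t *cached_nogpu_hint = NULL;   /* built once, then reused */
int hint_alloc_count = 0;           /* counts allocations, for illustration */

/* Allocate and fill a hint; this is the cost we want to avoid repeating. */
hint_t *hint_create(const char *key, int device_id)
{
    hint_t *h = malloc(sizeof(*h));
    assert(h != NULL);
    strncpy(h->key, key, sizeof(h->key) - 1);
    h->key[sizeof(h->key) - 1] = '\0';
    h->device_id = device_id;
    hint_alloc_count++;
    return h;
}

/* Pick a hint from already-known pointer types.
 * *owned is set to true when the caller must free the returned hint. */
hint_t *select_hint(ptr_type_t in_type, ptr_type_t out_type,
                    int device_id, bool *owned)
{
    if (in_type == PTR_HOST && out_type == PTR_HOST) {
        if (cached_nogpu_hint == NULL)
            cached_nogpu_hint = hint_create("nogpu", -1);
        *owned = false;     /* shared, never freed by the caller */
        return cached_nogpu_hint;
    }
    *owned = true;          /* per-call hint for the device case */
    return hint_create("gpu", device_id);
}
```

With this layout the host-only path pays for one allocation in total, while the device path still allocates per call and would need a separate optimization.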

[skip warnings]

Expected Impact

Author Checklist

  • Reference appropriate issues (with "Fixes" or "See" as appropriate)
  • Remove xfail from the test suite when fixing a test
  • Commits are self-contained and do not do two things at once
  • Commit message is of the form: module: short description and follows good practice
  • Passes whitespace checkers
  • Passes warning tests
  • Passes all tests
  • Add comments such that someone without knowledge of the code could understand
  • You or your company has a signed contributor's agreement on file with Argonne
  • For non-Argonne authors, request an explicit comment from your companies PR approval manager

@hzhou hzhou force-pushed the 2101_yaksa_hint branch 4 times, most recently from af8f2f7 to 974d679 Compare January 15, 2021 17:29
Contributor Author

hzhou commented Jan 15, 2021

The tests will fail until yaksa is updated with pmodels/yaksa#169

EDIT: updated by #5072

@hzhou hzhou force-pushed the 2101_yaksa_hint branch 2 times, most recently from 759be31 to 5f92b68 Compare January 23, 2021 15:24
@hzhou hzhou mentioned this pull request Jan 23, 2021
10 tasks
Contributor Author

hzhou commented Jan 23, 2021

test:mpich/ch4/ofi

@hzhou hzhou force-pushed the 2101_yaksa_hint branch from 5f92b68 to 219a91c Compare March 9, 2021 21:57
Contributor Author

hzhou commented Mar 9, 2021

test:mpich/ch4/most
test:mpich/ch3/most

@hzhou hzhou force-pushed the 2101_yaksa_hint branch from 219a91c to dad045e Compare March 26, 2021 21:43
This is the key to use when MPIR_CVAR_ENABLE_GPU is off or when we know
both in_buf and out_buf are host memory.
@hzhou hzhou force-pushed the 2101_yaksa_hint branch from dad045e to c382e5d Compare March 26, 2021 21:48
Contributor Author

hzhou commented Mar 27, 2021

test:mpich/ch4/most

Contributor

@raffenet raffenet left a comment

A couple of minor comments.

ret = MPIDI_IPC_mpi_init_hook(rank, size, tag_bits);
MPIR_ERR_CHECK(ret);
if (MPIR_CVAR_ENABLE_GPU) {
ret = MPIDI_IPC_mpi_init_hook(rank, size, tag_bits);
Contributor

This will disable XPMEM if there is no GPU, which I don't think should happen.

Contributor Author

Good point. I should move it to just where the gpu gets initialized.

Contributor Author

Fixed. Now it is around MPIDI_GPU_mpi_init_hook.
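
The point of this thread can be illustrated with a minimal sketch: the IPC init runs unconditionally so that non-GPU mechanisms such as XPMEM stay enabled, and only the GPU init is gated on the enable flag. The function names and the flat call structure below are hypothetical. The thread does not show where the real hooks are called from.

```c
#include <stdbool.h>

/* Call counters, for illustration only. */
int ipc_init_calls = 0;
int gpu_init_calls = 0;

/* Hypothetical stand-in for the IPC init hook (covers XPMEM and similar). */
int ipc_init_hook(void)
{
    ipc_init_calls++;
    return 0;
}

/* Hypothetical stand-in for the GPU init hook (the expensive part). */
int gpu_init_hook(void)
{
    gpu_init_calls++;
    return 0;
}

/* enable_gpu plays the role of MPIR_CVAR_ENABLE_GPU in this sketch. */
int shm_init(bool enable_gpu)
{
    int ret = ipc_init_hook();      /* always runs, with or without a GPU */
    if (ret != 0)
        return ret;

    if (enable_gpu) {               /* gate only the GPU-specific init */
        ret = gpu_init_hook();
        if (ret != 0)
            return ret;
    }
    return 0;
}
```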

@@ -29,6 +29,7 @@ typedef enum {
typedef struct {
MPL_pointer_type_t type;
MPL_gpu_device_handle_t device;
MPL_gpu_device_attr attr;
Contributor

My only issue is the name of the struct member. Maybe device_attr or dev_attr?

Contributor Author

Sounds good.

An unrelated side note: it appears that at least CUDA already does its own internal query caching/optimization. We need to measure the overhead of yaksa pack/unpack with and without passing the info hint. This optimization may not be necessary.

Contributor Author

Fixed. Renamed to device_attr.

Comment on lines +77 to +111
MPL_STATIC_INLINE_PREFIX int MPIR_gpu_register_host(const void *ptr, size_t size)
{
    if (ENABLE_GPU) {
        return MPL_gpu_register_host(ptr, size);
    }
    return MPI_SUCCESS;
}

MPL_STATIC_INLINE_PREFIX int MPIR_gpu_unregister_host(const void *ptr)
{
    if (ENABLE_GPU) {
        return MPL_gpu_unregister_host(ptr);
    }
    return MPI_SUCCESS;
}

MPL_STATIC_INLINE_PREFIX int MPIR_gpu_malloc_host(void **ptr, size_t size)
{
    if (ENABLE_GPU) {
        return MPL_gpu_malloc_host(ptr, size);
    } else {
        *ptr = MPL_malloc(size, MPL_MEM_BUFFER);
        return MPI_SUCCESS;
    }
}

MPL_STATIC_INLINE_PREFIX int MPIR_gpu_free_host(void *ptr)
{
    if (ENABLE_GPU) {
        return MPL_gpu_free_host(ptr);
    } else {
        MPL_free(ptr);
        return MPI_SUCCESS;
    }
}
Contributor

👍

hzhou added 6 commits March 29, 2021 11:26
To avoid querying the device pointer attribute multiple times, for example,
once in mpich and again in yaksa, we need to cache the queried attribute or
pass it along in a hash or info hint. To be able to do that, we need to have
the attr inside MPL_pointer_attr_t.

Since we have already queried the pointer device attribute, we should always pass
the info hint to yaksa to avoid an additional pointer query inside yaksa.

The code is locally restructured to always get info hints from
attributes.

The checking of pointer attributes and datatype is now done in the main
code.

When MPIR_CVAR_ENABLE_GPU is off, pass the nogpu info hint to yaksa_init to
prevent it from querying devices. This commit depends on yaksa PR pmodels#172.

We need to check the MPIR_CVAR_ENABLE_GPU variable consistently in order to
consistently skip GPU access. Typical GPU drivers such as CUDA have a huge
initialization cost, so we have to make sure to skip every access in
order to skip the init latency.
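
The "query once, pass it down" idea in the commit messages above can be sketched as follows. The upper layer performs the single attribute query, stores the result in a struct that also carries the device attribute, and hands it to the lower layer, which queries only when the caller supplies nothing. All types and functions here are hypothetical simplifications. They are not the MPL or yaksa definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified pointer attribute record. */
typedef enum { PTR_UNREGISTERED_HOST, PTR_DEVICE } ptr_type_t;

typedef struct {
    ptr_type_t type;
    int device;         /* device handle stand-in */
    int device_attr;    /* cached driver attribute stand-in */
} pointer_attr_t;

int query_count = 0;    /* how many "expensive" queries have run */

/* Simulates the expensive driver query; always reports host memory here. */
void query_pointer_attr(const void *ptr, pointer_attr_t *attr)
{
    (void) ptr;
    query_count++;
    attr->type = PTR_UNREGISTERED_HOST;
    attr->device = -1;
    attr->device_attr = 0;
}

/* Lower layer: reuse the caller's attribute if given, otherwise query.
 * Returns 1 when the device path would be taken, 0 for the host path. */
int pack(const void *inbuf, const pointer_attr_t *known_attr)
{
    pointer_attr_t local;
    if (known_attr == NULL) {
        query_pointer_attr(inbuf, &local);
        known_attr = &local;
    }
    return known_attr->type == PTR_DEVICE ? 1 : 0;
}

/* Upper layer: the only query happens here and is passed down. */
int send_buffer(const void *buf)
{
    pointer_attr_t attr;
    query_pointer_attr(buf, &attr);
    return pack(buf, &attr);
}
```

Calling `send_buffer` triggers one query, whereas calling `pack` with a `NULL` attribute triggers a query of its own. That second query is the duplicate the info hint is meant to remove.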
@hzhou hzhou force-pushed the 2101_yaksa_hint branch from c382e5d to 02c3865 Compare March 29, 2021 16:27
Contributor Author

hzhou commented Mar 29, 2021

test:mpich/ch4/ofi

@hzhou hzhou merged commit 303f205 into pmodels:main Mar 29, 2021
@hzhou hzhou deleted the 2101_yaksa_hint branch March 29, 2021 18:25
@hzhou hzhou mentioned this pull request Mar 29, 2021
4 tasks
Successfully merging this pull request may close these issues.

gpu: add info hints to disable gpu buffers
2 participants