Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fabric: Add fi_hmem_attr to fi_info #10400

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jiaxiyan
Copy link
Contributor

FI_HMEM is only an on/off capability bit but there are more specific capabilities. Add fi_hmem_attr to fi_info to provide better control over:

  • whether device-specific API calls are permitted in libfabric, except for FI_HMEM initialization.
  • whether peer to peer transfers should be used.
  • whether optimized memcpy for device memory should be used.

FI_HMEM is only an on/off capability bit but there are more
specific capabilities. Add fi_hmem_attr to fi_info to provide
better control over:
- whether device API calls are permitted in libfabric, except for
FI_HMEM initialization.
- whether peer to peer transfers should be used.
- whether optimized memcpy for device memory should be used.

Signed-off-by: Jessie Yang <[email protected]>
Duplicate and free the fi_hmem_attr data if present.

Signed-off-by: Jessie Yang <[email protected]>
Print fi_hmem_attr when passing fi_into to fi_tostr.

Signed-off-by: Jessie Yang <[email protected]>
Check FI_HMEM is set in caps when using fi_hmem_attr, and user do
not request attributes against what prov can offer.
Add alter fi_hmem_attr.

Signed-off-by: Jessie Yang <[email protected]>
Update man pages for fi_hmem_attr.

Signed-off-by: Jessie Yang <[email protected]>
@@ -303,6 +311,7 @@ struct fi_info_1_7 {
struct fi_domain_attr_1_7 *domain_attr;
struct fi_fabric_attr_1_7 *fabric_attr;
struct fid_nic_1_7 *nic;
struct fi_hmem_attr_1_7 *hmem_attr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to this file need to be dropped. These are definitions for the prior ABI structures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

enum fi_hmem_attr_opt use_p2p;
enum fi_hmem_attr_opt use_dev_reg_copy;
struct fi_hmem_attr *next;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are more along the lines of a configuration than an attribute. The app is wanting to configure the provider behavior here, versus discovering provider capabilities.

I want to avoid an embedded linked list. It adds complexity, and it's unlikely that a system will realistically have more than 1 type of hmem installed anyway.

My first thought is this should somehow be linked to memory registration, since that's ultimately where the provider is making decisions on how to perform a data transfer. And the above configuration settings are basically letting the provider know what sort of optimizations it can perform when a data buffer is located in hmem.

Maybe there should be a more involved set of APIs to query and configure memory registration related functionality. I can see the MR cache being part of this.

Copy link
Contributor

@shijin-aws shijin-aws Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid an embedded linked list. It adds complexity, and it's unlikely that a system will realistically have more than 1 type of hmem installed anyway.

We were debating internally on that. If we are confident there wouldn't be > 1 hmem type, we can get rid of the linked list structure. We start from such implementation for flexibility of multiple hmem iface, and we are open to feedback on that.

Copy link
Contributor Author

@jiaxiyan jiaxiyan Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer to add it to fi_info for the following reasons:

  1. the user can see if these configurations are being used by running fi_info.
  2. The api_permitted field is not limited to memory registration. It is also used to prevent the application and libfabric from touching the same resources and negatively impacting the performance.
  3. Adding use_p2p to fi_info can help filter out SHM provider early and allow applications to specify preference for P2P mode early.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These settings are configurations (restrictions) specified by the application, not the provider. They differ from the fi_info attributes in that regard.

@@ -465,6 +481,14 @@ struct fi_fabric_attr {
uint32_t api_version;
};

struct fi_hmem_attr {
enum fi_hmem_iface iface;
enum fi_hmem_attr_opt api_permitted;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is acting as a bool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields will be both input and output. Since a boolean can represent either false or an unspecified value, FI_HMEM_ATTR_UNSPEC is added to differentiate them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either the provider can call the GPU API or not. There's not a third state here.

These settings are dictating to the provider how it must implement data transfers to/from GPU buffers. For some providers, it means that the provider cannot support GPU buffers at all. (There is no PCI peer to peer support for TCP or shmem.) There' a significant difference between these variables and other attribute values.

FI_HMEM_SYNAPSEAI,
};

enum fi_hmem_attr_opt {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use the work 'opt' here because there are setopt calls to change some of these values. But these hit me more as configuration settings, than attributes. See below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants