Fix Libfabric MR caching issues #13327

Merged
merged 4 commits into from
Jul 14, 2025

bwbarrett
Member

Fix a set of bugs in both the OFI BTL and the OFI MTL around caching MRs. The Libfabric EFA provider used to (erroneously) cache explicitly created MRs and will stop doing so in Libfabric 2.2. This caused a performance regression over EFA in both the OFI MTL (with HMEM) and the OFI BTL (always), because OMPI had come to assume that the provider caches MRs. So we stop disabling the BTL rcache for EFA and add an rcache for HMEM MRs in the OFI MTL. The OFI MTL only works with providers that do not require FI_MR_LOCAL, so we don't need to worry about caching general MRs there.

While fixing that, I noticed two other issues in the OFI code and cleaned them up. First, we should use the state of FI_MR_HMEM instead of a provider name to decide whether to skip creating HMEM MRs in the OFI MTL. Second, when the OFI BTL is used without the OFI MTL, we were not properly coupling the OMPI memory monitor with the Libfabric memory monitor.

bwbarrett added 4 commits July 9, 2025 04:15
This was an optimization around a bug in the EFA provider.  The EFA
provider shouldn't be caching explicit registrations anyway, so
avoiding the double cache is silly (and breaks when EFA fixes the
explicit registration cache bug).

Signed-off-by: Brian Barrett <[email protected]>
The OFI MTL exports a memory monitor to Libfabric (so that OMPI's
patcher wins), but in cases where OB1 is directly selected, that
code won't run.  So make sure to also configure Libfabric so that
it won't try to use a suboptimal memory monitor in the case that
only the OFI BTL is used.

Signed-off-by: Brian Barrett <[email protected]>
Rather than use the CXI provider name to disable explicit hmem
registration, use the FI_MR_HMEM flag.

Signed-off-by: Brian Barrett <[email protected]>
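
To illustrate the commit above, here is a minimal sketch of the difference between keying the decision on a provider name and keying it on a capability bit. Everything in it is a stand-in: `demo_domain_info_t`, `DEMO_MR_HMEM`, and both helper functions are hypothetical names for illustration, not the OMPI or Libfabric code. In the real code path the bit would be Libfabric's `FI_MR_HMEM` in the domain's MR mode.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Libfabric's FI_MR_HMEM mr_mode bit.
 * The value here is illustrative only. */
#define DEMO_MR_HMEM (1u << 10)

/* Hypothetical, trimmed-down view of what a provider reports. */
typedef struct {
    const char *prov_name; /* e.g. "cxi", "efa" */
    uint32_t mr_mode;      /* bitmask of MR mode requirements */
} demo_domain_info_t;

/* Old approach (sketch): special-case one provider by name.
 * Any other provider that does not need explicit HMEM MRs is missed. */
static bool needs_hmem_reg_by_name(const demo_domain_info_t *info)
{
    return strcmp(info->prov_name, "cxi") != 0;
}

/* New approach (sketch): register device buffers explicitly only
 * when the provider says it requires that. */
static bool needs_hmem_reg_by_flag(const demo_domain_info_t *info)
{
    return (info->mr_mode & DEMO_MR_HMEM) != 0;
}
```

The flag-based check gives the right answer for any provider that does not set the bit, without the caller having to know provider names.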
The OFI MTL was creating a registration for every operation that used
HMEM when FI_MR_HMEM is required.  This is inefficient, since
creating registrations is expensive.  So put an rcache in front of
the registrations.

Signed-off-by: Brian Barrett <[email protected]>
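
The idea of putting a cache in front of registrations can be sketched as follows. This is a toy model under stated assumptions, not the OMPI rcache: `demo_reg_t`, `demo_register`, and `demo_reg_lookup` are hypothetical names, the cache is a small fixed array with round-robin replacement, and the "expensive" registration is simulated by a counter rather than a real Libfabric call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_SLOTS 8

/* Hypothetical cached registration: an address range plus its key. */
typedef struct {
    uintptr_t base;
    size_t len;
    uint64_t key;
    int valid;
} demo_reg_t;

static demo_reg_t demo_cache[DEMO_CACHE_SLOTS];
static int demo_next_slot = 0;
static int expensive_calls = 0; /* counts simulated registrations */

/* Stand-in for the expensive provider registration call. */
static uint64_t demo_register(uintptr_t base, size_t len)
{
    (void) base;
    (void) len;
    return (uint64_t) ++expensive_calls;
}

/* Return a key for a registration covering [buf, buf + len).
 * On a hit, no registration is created; on a miss, register once
 * and remember the result (overwriting the oldest slot). */
static uint64_t demo_reg_lookup(const void *buf, size_t len)
{
    uintptr_t start = (uintptr_t) buf;
    assert(len > 0);

    for (int i = 0; i < DEMO_CACHE_SLOTS; i++) {
        const demo_reg_t *r = &demo_cache[i];
        if (r->valid && start >= r->base &&
            start + len <= r->base + r->len) {
            return r->key; /* cache hit */
        }
    }

    demo_reg_t *slot = &demo_cache[demo_next_slot];
    demo_next_slot = (demo_next_slot + 1) % DEMO_CACHE_SLOTS;
    slot->base = start;
    slot->len = len;
    slot->key = demo_register(start, len);
    slot->valid = 1;
    return slot->key;
}
```

Repeated operations on the same buffer, or on a sub-range of it, then pay the registration cost once. A real cache also has to invalidate entries when the underlying memory is freed or unmapped, which is why the memory monitor coupling mentioned in this PR matters; this sketch leaves that out.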
@bwbarrett
Member Author

@hppritcha can you give this a whirl on a CXI system with --mca mtl_base_verbose 100 and make sure you see a line like:

Support for device buffers enabled with implicit registration

@bwbarrett bwbarrett requested a review from sunkuamzn July 9, 2025 19:29
Member

@hppritcha hppritcha left a comment

This PR doesn't seem to change the behavior of the CXI provider on CUDA systems. There is some problem with one-sided, but it's present on main as well.

@bwbarrett bwbarrett merged commit 8f3c171 into open-mpi:main Jul 14, 2025
15 checks passed
@bwbarrett bwbarrett deleted the ofi-hmem branch July 14, 2025 19:41
3 participants