prov/efa: failure to enumerate unconditionally when FI_PROV_ATTR_ONLY is present. #10757

Open
aws-nslick opened this issue Feb 2, 2025 · 0 comments

Describe the bug

The fi_getinfo(3) documentation states the following:

FLAGS

FI_PROV_ATTR_ONLY

Indicates that the caller is only querying for what providers are potentially available. All providers will return exactly one fi_info struct, regardless of whether that provider is usable on the current platform or not. The returned fi_info struct will contain default values for all members, with the exception of fabric_attr. The fabric_attr member will have the prov_name and prov_version values filled in.

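The EFA provider does not appear in the results unconditionally. It is listed only on hosts where an EFA device is present, and it is omitted entirely elsewhere, which contradicts the documented behavior.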
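The documented contract can be phrased as a pass/fail check: every provider compiled into libfabric should yield exactly one entry under FI_PROV_ATTR_ONLY, whether or not it is usable on the current host. The following is a minimal, libfabric-free sketch of such a check. The helper name check_prov_attr_only and the ContractReport type are hypothetical, and the list of expected providers is an assumption that the caller has to supply (for example, the providers known to be built into the installed library).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical result type: which expected providers were absent, and which
// providers showed up more than once in the FI_PROV_ATTR_ONLY results.
struct ContractReport {
  std::vector<std::string> missing;     // expected but not returned
  std::vector<std::string> duplicated;  // returned more than once
};

// Hypothetical helper: `returned` holds the fabric_attr->prov_name values
// collected from fi_getinfo(..., FI_PROV_ATTR_ONLY, ...); `expected` is the
// caller-supplied (assumed) list of providers built into libfabric.
ContractReport check_prov_attr_only(const std::vector<std::string> &returned,
                                    const std::vector<std::string> &expected) {
  // Count how many fi_info entries each provider contributed.
  std::map<std::string, int> counts;
  for (const auto &name : returned) ++counts[name];

  ContractReport report;
  // The docs promise one entry per provider, usable or not.
  for (const auto &name : expected)
    if (counts.find(name) == counts.end()) report.missing.push_back(name);
  // More than one entry per provider would also violate the contract.
  for (const auto &[name, n] : counts)
    if (n > 1) report.duplicated.push_back(name);
  return report;
}
```

On a host without EFA devices, feeding the names from the output below together with an expected list containing "efa" would report "efa" as missing, which is the behavior this issue describes.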

To Reproduce

Test program:

// g++ -std=c++2a -DFMT_HEADER_ONLY=1 -L/opt/amazon/efa/lib -isystem/opt/amazon/efa/include test.cc -o asdftest -lfabric
#include <cstdint>
#include <fmt/core.h>
#include <fmt/format.h>
#include <rdma/fabric.h>
#include <stdexcept>
#include <string>
#include <unordered_map>

int main() {
  static constexpr auto get = [](std::uint32_t major_version = FI_MAJOR_VERSION,
                                 std::uint32_t minor_version =
                                     FI_MINOR_VERSION) {
    fi_info *provider_list{nullptr};
    int ret = fi_getinfo(FI_VERSION(major_version, minor_version), nullptr,
                         nullptr, FI_PROV_ATTR_ONLY, nullptr, &provider_list);

    if (ret < 0 || !provider_list) {
      throw std::runtime_error("fi_getinfo(FI_PROV_ATTR_ONLY) failed");
    }
    struct version_pair {
      std::uint32_t major_version{};
      std::uint32_t minor_version{};
    };
    std::unordered_map<std::string, version_pair> providers;
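    // Expect one entry per provider. prov_version is decoded assuming that
    // FI_MAJOR() holds major * 100 + minor (e.g. 122 for 1.22).
    for (auto *current = provider_list; current != nullptr;
         current = current->next)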
      providers.emplace(
          current->fabric_attr->prov_name,
          version_pair{FI_MAJOR(current->fabric_attr->prov_version) / 100,
                       FI_MAJOR(current->fabric_attr->prov_version) % 100});

    fi_freeinfo(provider_list);
    return providers;
  };

  for (auto &&[name, version] : get()) {
    auto &[maj, min] = version;
    fmt::print(stdout, "prov/{}: {}.{}\n", name, maj, min);
  }

  return 0;
}
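The program prints prov_version as major.minor by dividing FI_MAJOR() by 100. That relies on an assumption worth making explicit: providers built into libfabric appear to pack their version as FI_VERSION(major * 100 + minor, ...), so FI_MAJOR() yields 122 for 1.22, which is consistent with the output below. The sketch mirrors the FI_VERSION, FI_MAJOR and FI_MINOR macros from rdma/fabric.h as local constexpr functions so the arithmetic can be checked without libfabric installed. The names fi_version, fi_major, fi_minor and decode_prov_version are hypothetical.

```cpp
#include <cstdint>
#include <utility>

// Local mirrors of FI_VERSION / FI_MAJOR / FI_MINOR from <rdma/fabric.h>:
// the major number occupies the upper 16 bits, the minor the lower 16 bits.
constexpr std::uint32_t fi_version(std::uint32_t major, std::uint32_t minor) {
  return (major << 16) | minor;
}
constexpr std::uint32_t fi_major(std::uint32_t version) { return version >> 16; }
constexpr std::uint32_t fi_minor(std::uint32_t version) { return version & 0xFFFF; }

// Hypothetical helper matching the test program's decoding. Assumes the
// provider stored major * 100 + minor in the FI_MAJOR() field; the FI_MINOR()
// field is ignored.
constexpr std::pair<std::uint32_t, std::uint32_t>
decode_prov_version(std::uint32_t prov_version) {
  const std::uint32_t packed = fi_major(prov_version);
  return {packed / 100, packed % 100};
}

// A major field of 122 decodes to 1.22.
static_assert(decode_prov_version(fi_version(122, 0)) ==
              std::pair<std::uint32_t, std::uint32_t>{1, 22});
```

If a provider did not follow this packing convention, the printed version would be meaningless, so the decoding is only a convenience for reading the output and is not part of the bug being reported.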

Example output on a host where no EFA devices are present:

$ ./asdftest
prov/ofi_mrail: 1.22
prov/sm2: 1.22
prov/udp: 1.22
prov/ofi_rxm: 1.22
prov/ofi_hook_hmem: 1.22
prov/shm: 1.22
prov/tcp: 1.22
prov/ofi_hook_trace: 1.22
prov/ofi_hook_perf: 1.22
prov/ofi_hook_dmabuf_peer_mem: 1.22
prov/psm3: 3.7
prov/sockets: 1.22
prov/ofi_hook_debug: 1.22
prov/ofi_hook_noop: 1.22
prov/off_coll: 1.22
$ ( ./asdftest | grep efa ) || echo "EFA not in output!"
EFA not in output!

Example output on a host where EFA is available:

$ ssh host-with-efa-available '(./asdftest | grep "prov/efa") || echo "EFA not in output!"'
prov/efa: 1.22