Add initial hwloc support #1108
Are the upstream OPAL changes taken from a release or a particular commit of OMPI? Either way, perhaps we should leave a note on this PR in case it ends up being helpful later.
The changes to the oac_* files reflect the current status of the main branch (currently, that is commit …). The changes to the opal_* files align with the v5.0.1 tag of the upstream repo.
LGTM. We can explore other use-cases for hwloc later.
Side note: The UCX row with pmi-mpi looks slow in testing, but I think we can pretty safely ignore that...
Two quick questions @wrrobin @davidozog:
For 1, I would guess we should see the rpath, considering your changes in the PR. Are you using any other flags that might skip the configury code?
Ok, I've made the suggested changes: the rpath for hwloc is now set in configure.ac, and failed hwloc API calls now emit a warning message rather than an error.
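For readers following along, here is a minimal sketch of the "warn rather than error" behavior for an optional dependency. It does not use the real hwloc API: `stub_topology_call`, `WARN_MSG`, and `init_optional_topology` are hypothetical names made up for illustration, and the actual code in this PR may be structured differently.

```c
#include <stdio.h>

/* Hypothetical stand-in for an hwloc call: 0 on success, -1 on failure. */
static int stub_topology_call(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Hypothetical warning macro, loosely modeled on RAISE_WARN_MSG in the diff. */
#define WARN_MSG(...) fprintf(stderr, "WARNING: " __VA_ARGS__)

/* Returns 0 in both cases so that initialization continues.
 * *have_topology records whether the optional topology data is usable. */
static int init_optional_topology(int should_fail, int *have_topology)
{
    *have_topology = 0;

    if (stub_topology_call(should_fail) != 0) {
        /* A failure in the optional dependency is reported, not fatal. */
        WARN_MSG("topology query failed; continuing without hwloc data\n");
        return 0;
    }

    *have_topology = 1;
    return 0;
}
```

The caller checks `have_topology` before using any topology-dependent feature, so a broken hwloc installation degrades functionality instead of aborting startup.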
@philipmarshall21 - I ran into a couple minor things but pending those, I think this PR is ready to merge. All the RPATH stuff is working on my end, thanks for patching that.
#endif // HWLOC_ENFORCE_SINGLE_SOCKET || HWLOC_ENFORCE_SINGLE_NUMA_NODE
hwloc_issue_topology:
RAISE_WARN_MSG("Please verify your hwloc installation\n");
hwloc_exit:
It's a minor thing, but prob worth addressing if possible - we see this warning when building with hwloc but without enforcing single socket or numa node:
../../src/init.c:452:5: warning: label ‘hwloc_exit’ defined but not used [-Wunused-label]
452 | hwloc_exit:
One option might be to remove hwloc_issue_topology (it's not adding much value anyway), then we can remove hwloc_exit (and revive it if it's ever needed).
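A rough sketch of what the label-free version could look like, assuming the warning is emitted inline at the point of failure. The names here (`ENFORCE_BINDING`, `query_topology`, `init_topology`) are hypothetical placeholders rather than the actual identifiers in src/init.c, and `ENFORCE_BINDING` stands in for the two HWLOC_ENFORCE_* macros.

```c
#include <stdio.h>

/* Hypothetical macro standing in for HWLOC_ENFORCE_SINGLE_SOCKET or
 * HWLOC_ENFORCE_SINGLE_NUMA_NODE; intentionally left undefined here. */
/* #define ENFORCE_BINDING 1 */

/* Hypothetical stand-in for the topology query: 0 on success, -1 on failure. */
static int query_topology(int fail)
{
    return fail ? -1 : 0;
}

/* Returns the number of bindings applied (0 or 1). There are no labels, so
 * -Wunused-label cannot fire whether or not ENFORCE_BINDING is defined. */
static int init_topology(int fail)
{
    int bound = 0;

    if (query_topology(fail) != 0) {
        /* Warn at the failure site instead of jumping to a shared label. */
        fprintf(stderr, "WARNING: Please verify your hwloc installation\n");
        return bound;
    }

#ifdef ENFORCE_BINDING
    bound = 1; /* placeholder for the socket or NUMA-node binding logic */
#endif

    return bound;
}
```

Because the only jump targets were reachable solely from inside the conditional block, replacing them with an early return keeps the build warning-free in every configuration.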
[m4_set_remove([oac_var_scope_active_set], oac_var_scope_var)])dnl
oac_var_scope_pop oac_var_scope_stack
m4_popdef([oac_var_scope_stack])dnl
])dnl
Super minor, but looks like we lost a newline relative to upstream on this file.
This PR is a subset of the changes in PR #1107 and is intended to isolate the integration of hwloc as an optional dependency of SOS from the addition of multi-NIC functionality.
The changes to the oac_* files reflect the current status of the main branch (at the time of this PR, that is commit c1cfc910d92af43f8c27807a9a84c9c13f4fbc65), as the upstream repo does not have any tags. The changes to the opal_* files align with the v5.0.1 tag of the upstream repo.