[RFC] Added numa_support rfc #1535
base: master
Below is the example that demonstrates the use of these APIs to pin threads to different
arenas to each of the NUMA nodes available on a system, submit work across those `task_arena`
objects and into associated `task_group`` objects, and then wait for work again using both
the `task_arena` and `task_group` objects.
I think the code can be made simpler with `std::thread` per NUMA domain, instead of relying only on TBB. On the one hand, that also signals that TBB lacks high-level NUMA APIs. On the other hand, TBB, and task arenas specifically, were designed to work well with application-level threads where it makes sense. I think it is much better to assume, or suggest, that each NUMA-aware arena is used by a dedicated application thread than to add extra levels of complication with task groups.
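A minimal sketch of the thread-per-NUMA-domain structure suggested above. To stay compilable without oneTBB installed, it uses only the standard library: `run_on_each_node` is a hypothetical helper, not a oneTBB API, and the node ids are passed in as plain `int`s (in a real program they would come from `tbb::info::numa_nodes()`). The per-node callable is where a `tbb::task_arena` constrained to that node would be created and entered with `execute`.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of oneTBB): starts one application thread per
// NUMA node id, calls per_node_work(id) on that thread, then joins them all.
// In a oneTBB program, per_node_work would typically construct
// tbb::task_arena(tbb::task_arena::constraints(id)) and call arena.execute(...),
// so the application thread itself occupies the arena's reserved slot.
inline void run_on_each_node(const std::vector<int>& node_ids,
                             const std::function<void(int)>& per_node_work) {
    assert(per_node_work && "per_node_work must be callable");
    std::vector<std::thread> threads;
    threads.reserve(node_ids.size());
    for (int id : node_ids) {
        // Capture the id by value; the callable outlives the threads because
        // every thread is joined before this function returns.
        threads.emplace_back([id, &per_node_work] { per_node_work(id); });
    }
    for (std::thread& t : threads) {
        t.join();
    }
}
```

With this shape there is no `task_group` bookkeeping: waiting for a node's work is just joining that node's thread.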
This pattern of `task_arena`s and `task_group`s is what we show in our documentation: for example here. And, probably as a consequence, it is a pattern we see in applications that use NUMA constraints.
So the documentation shows a suboptimal pattern, then. In particular, it does not explicitly set the number of reserved slots to 0, which can lead to undersubscription. Why repeat the same mistake one more time? :)
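To make the undersubscription point concrete, here is a back-of-the-envelope model, not a oneTBB API. It assumes that slots reserved for application threads cannot be taken by worker threads, so any reserved slot without an application thread in it stays idle. The function name and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: counts slots that stay idle because they are reserved
// for application threads that are not currently inside the arena.
//   arena_count         number of per-NUMA-node arenas
//   reserved_per_arena  slots each arena reserves for application threads
//   app_threads_inside  application threads occupying reserved slots right now
inline int idle_reserved_slots(int arena_count,
                               int reserved_per_arena,
                               int app_threads_inside) {
    assert(arena_count >= 0 && reserved_per_arena >= 0 && app_threads_inside >= 0);
    const int reserved_total = arena_count * reserved_per_arena;
    return std::max(0, reserved_total - app_threads_inside);
}
```

Under this model, four arenas with one reserved slot each and a single main thread leave three slots idle; reserving zero slots, or running one application thread per arena, leaves none.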
A few typos to fix.
Access (NUMA) systems, we believe this support can be simplified and improved to provide
an improved user experience.

This early proposal recommends addressing for areas for improvement:
A typo:
- This early proposal recommends addressing for areas for improvement:
+ This early proposal recommends addressing four areas for improvement:
default does not pin threads to NUMA nodes. It is too easy to write code similar to the preceding
example and be unaware that a HWLOC installation error (or lack of HWLOC) has undone all your effort.

**Getting good performance using these tools requres notable manual coding effort by users.** As we
A typo:
- **Getting good performance using these tools requres notable manual coding effort by users.** As we
+ **Getting good performance using these tools requires notable manual coding effort by users.** As we
can see in the preceding example, if we want to spread work across the NUMA nodes in
a system we need to query the topology using functions in the `tbb::info` namespace, create
one `task_arena` per NUMA node, along with one `task_group` per NUMA node, and then add an
extra loop that iterates overs these `task_arena` and `task_group` objects to execute the
A typo:
- extra loop that iterates overs these `task_arena` and `task_group` objects to execute the
+ extra loop that iterates over these `task_arena` and `task_group` objects to execute the
APIs (or behaviors, such as first-touch) to allocator or place them on the appropriate NUMA nodes.

**The out-of-the-box performance of the generic TBB APIs on NUMA systems is not good enough.**
Should the oneTBB library do anything special be default if the system is a NUMA system? Or should
A typo:
- Should the oneTBB library do anything special be default if the system is a NUMA system? Or should
+ Should the oneTBB library do anything special by default if the system is a NUMA system? Or should
through user questions, can lead to unexpected performance from NUMA optimizations. When running
on a NUMA system, a developer that has not fully read the documentation may expect that `numa_nodes()`
will give a proper accounting of the NUMA nodes. When the code, without raising any alarm, returns only
a single, valid element due to the environmental configuation (such as lack of HWLOCK), it is too easy
A typo:
- a single, valid element due to the environmental configuation (such as lack of HWLOCK), it is too easy
+ a single, valid element due to the environmental configuation (such as lack of HWLOC), it is too easy
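As an illustration of how an application could notice the silent fallback described in this hunk, here is a small sketch. It is an assumption, based on my reading of the oneTBB documentation, that when topology parsing is unavailable `numa_nodes()` returns a single element equal to `tbb::task_arena::automatic`, and that this value is `-1`. The constant `kAutomatic` and the helper `is_numa_fallback` are hypothetical names, and the node list is taken as plain `int`s so the snippet compiles without oneTBB.

```cpp
#include <cassert>
#include <vector>

// Assumption: mirrors tbb::task_arena::automatic, which is -1 in oneTBB.
constexpr int kAutomatic = -1;

// Hypothetical helper (not part of oneTBB): returns true when the node list
// looks like the fallback result rather than a real topology, that is,
// exactly one entry equal to the "automatic" placeholder.
inline bool is_numa_fallback(const std::vector<int>& node_ids) {
    // This sketch expects the caller to pass a non-empty list.
    assert(!node_ids.empty() && "expected at least one node id");
    return node_ids.size() == 1 && node_ids.front() == kAutomatic;
}
```

A program could log a warning when this returns true, so that a missing HWLOC installation does not quietly undo the NUMA-specific code paths.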
@@ -0,0 +1,179 @@
# Simplified NUMA support in oneTBB
I would probably call it "Improved NUMA support".
Correspondingly, the RFC folder could be `numa_support_improvements`, meaning that NUMA support is a core feature and improvements are the gist of the proposal.
As far as I understand, the whole `numa_support_improvements` or `simplified_numa_support` directory will be moved to the `rfcs/supported` directory once these improvements are accepted. There may be another set of NUMA improvements in the future, which could result in another `numa_support_improvements` directory being created in the same place. And then, when this new set is accepted in turn, it moves to the same directory. I see a potential naming clash. It is not related to the naming of this directory, but to the naming approach in general. Surely, we could use `numa_support_improvements2` as the name of the new directory, but I believe we can do better from the very beginning.
I propose naming the directory after the feature itself, e.g., `numa_support`, without qualifiers such as `simplified` or `improvement`. This way we convey the idea that the documents inside directly affect the support of a particular feature. To resolve naming clashes, I propose naming each file as precisely as possible after what the proposal changes, avoiding general terms and adjectives such as `improved`, `increased`, etc. For example, for the sub-RFC that I wrote, I suggest naming the file something like `introduce_tbbbind_static_library` or `introduce_tbbbind_statically_linked_with_hwloc`; for NUMA-aware allocators, something like `introduce_numa-aware_allocator`; for task_group dependencies, something like `introduce_dependencies_for_tasks_in_task_group`; and so on. This way we avoid name clashes and can still group similar RFCs together in a dedicated folder such as `numa_support`. Otherwise, I am afraid that the feature is not elaborated enough to be proposed, since it sounds too generic in our minds.
Description
Adds RFC for simplified NUMA support
Fixes # - issue number(s) if exists
Type of change
Choose one or multiple, leave empty if none of the other choices apply
Add a respective label(s) to PR if you have permissions
Tests
Documentation
Breaks backward compatibility
Notify the following users
List users with @ to send notifications
Other information