[RFC] Added numa_support rfc #1535

Draft: wants to merge 1 commit into master

@vossmjp (Contributor) commented Oct 23, 2024

Description

Adds an RFC for simplified NUMA support.

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add the respective label(s) to the PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

Comment on lines +28 to +31
Below is an example that demonstrates the use of these APIs to pin the threads of different
arenas to each of the NUMA nodes available on a system, submit work across those `task_arena`
objects and into the associated `task_group` objects, and then wait for the work, again using both
the `task_arena` and `task_group` objects.
@akukanov (Contributor) commented Oct 23, 2024

I think the code can be made simpler by using one std::thread per NUMA domain instead of relying only on TBB. On the one hand, this also signals that TBB lacks high-level NUMA APIs. On the other hand, TBB, and task arenas specifically, were designed to work well with application-level threads where that makes sense. I think it is much better to assume, or suggest, that each NUMA-aware arena is used by a dedicated application thread than to add extra levels of complication with task groups.

Contributor Author

This pattern of `task_arena`s and `task_group`s is what we show in our documentation (for example, here), and, probably as a consequence, it is a pattern we see in applications that use NUMA constraints.

@akukanov (Contributor) commented Oct 29, 2024

So the documentation shows a suboptimal pattern, then. In particular, it does not explicitly set the number of reserved slots to 0, which can lead to undersubscription. Why repeat the same mistake one more time? :)

@vossmjp changed the title from "Added numa_support rfc" to "[RFC] Added numa_support rfc" on Oct 30, 2024
@aleksei-fedotov (Contributor) left a comment

A few typos to fix.

Access (NUMA) systems, we believe this support can be simplified and improved to provide
a better user experience.

This early proposal recommends addressing for areas for improvement:
Contributor

A typo:

Suggested change
This early proposal recommends addressing for areas for improvement:
This early proposal recommends addressing four areas for improvement:

default does not pin threads to NUMA nodes. It is too easy to write code similar to the preceding
example and be unaware that an HWLOC installation error (or the absence of HWLOC) has undone all your effort.

**Getting good performance using these tools requres notable manual coding effort by users.** As we
Contributor

A typo:

Suggested change
**Getting good performance using these tools requres notable manual coding effort by users.** As we
**Getting good performance using these tools requires notable manual coding effort by users.** As we

can see in the preceding example, if we want to spread work across the NUMA nodes in
a system we need to query the topology using functions in the `tbb::info` namespace, create
one `task_arena` per NUMA node, along with one `task_group` per NUMA node, and then add an
extra loop that iterates overs these `task_arena` and `task_group` objects to execute the
Contributor

A typo:

Suggested change
extra loop that iterates overs these `task_arena` and `task_group` objects to execute the
extra loop that iterates over these `task_arena` and `task_group` objects to execute the

APIs (or behaviors, such as first-touch) to allocate or place them on the appropriate NUMA nodes.

**The out-of-the-box performance of the generic TBB APIs on NUMA systems is not good enough.**
Should the oneTBB library do anything special be default if the system is a NUMA system? Or should
Contributor

A typo:

Suggested change
Should the oneTBB library do anything special be default if the system is a NUMA system? Or should
Should the oneTBB library do anything special by default if the system is a NUMA system? Or should

through user questions, can lead to unexpected performance from NUMA optimizations. When running
on a NUMA system, a developer who has not fully read the documentation may expect that `numa_nodes()`
will give a proper accounting of the NUMA nodes. When the code, without raising any alarm, returns only
a single, valid element due to the environmental configuation (such as lack of HWLOCK), it is too easy
Contributor

A typo:

Suggested change
a single, valid element due to the environmental configuation (such as lack of HWLOCK), it is too easy
a single, valid element due to the environmental configuration (such as lack of HWLOC), it is too easy

@@ -0,0 +1,179 @@
# Simplified NUMA support in oneTBB
Contributor

I would probably call it "Improved NUMA support".
Correspondingly, the RFC folder could be numa_support_improvements, meaning that NUMA support is a core feature and improvements are the gist of the proposal.

Contributor

As far as I understand, the whole numa_support_improvements (or simplified_numa_support) directory will be moved to the rfcs/supported directory once these improvements are accepted. In the future there may be another set of NUMA improvements, which could result in another numa_support_improvements directory being created in the same place; when that new set is accepted, it would move to the same directory. I see a potential naming clash here. The issue is not the name of this particular directory but the naming approach in general. We could, of course, name the new directory numa_support_improvements2, but I believe we can do better from the very beginning.

I propose naming the directory after the feature itself, e.g., numa_support, without qualifiers such as simplified or improvements. This conveys that the documents inside directly affect the support of a particular feature. To resolve naming clashes, I propose naming each file as precisely as possible after what the proposal changes, avoiding general terms and adjectives such as improved, increased, etc. For example, for the sub-RFC that I wrote, I suggest a file name like introduce_tbbbind_static_library or introduce_tbbbind_statically_linked_with_hwloc; for NUMA-aware allocators, something like introduce_numa-aware_allocator; for task_group dependencies, something like introduce_dependencies_for_tasks_in_task_group; and so on. This avoids name clashes while still allowing similar RFCs to be grouped together in a dedicated folder such as numa_support. Otherwise, I am afraid the feature will not seem elaborated enough to be proposed, since its name sounds too generic.

3 participants