This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
I'm wondering how much support Neural Speed has for NUMA systems. The Advanced Usage page suggests that all tensors should be allocated on the first NUMA node (`numactl -m 0 -C 0-<physic_cores-1>`). Is there any benefit to doing this?
I previously thought that this binds all memory allocations to the first NUMA node. However, that would increase inter-node traffic significantly, and each thread wouldn't be able to fully utilize the memory bandwidth when the topology gives different nodes different memory affinities. Is my understanding correct? Could you kindly explain a bit more why the memory allocations aren't interleaved instead?
Intel Xeon systems often have 2 sockets; `-m 0` binds the memory to the first socket.
There is communication overhead between the 2 sockets. If you want to reduce inter-node traffic, you can try our TP (tensor parallelism).
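For reference, the two memory policies being discussed can be expressed as `numactl` invocations like the following. This is a sketch, not from the Neural Speed docs: the binary name `./run_llm`, the model path, and the core range `0-55` (a hypothetical 56-core socket) are placeholders you would replace with your own.

```shell
# Policy suggested by the Advanced Usage page: bind all allocations to
# NUMA node 0 and pin threads to that socket's physical cores.
# Every thread then accesses local memory only (no cross-socket traffic),
# but total bandwidth is capped at one socket's memory controllers.
numactl -m 0 -C 0-55 ./run_llm -m model.bin

# Alternative discussed above: interleave pages round-robin across all
# nodes. Aggregate bandwidth improves, but roughly half of all accesses
# become remote, which is why it isn't the default recommendation here.
numactl --interleave=all ./run_llm -m model.bin
```

Whether binding or interleaving wins depends on whether the workload is bandwidth-bound across sockets or latency-sensitive on one; benchmarking both on your topology (`numactl --hardware` shows the node layout) is the safest way to decide.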