Once there is a sufficiently advanced prototype that allows realistic profiling, it might be worth thinking about the device memory layout and memory coalescing:
Right now, BlockData stores an array of structures. When all tracks access the same field, this results in strided memory accesses that the hardware cannot coalesce into a few wide memory transactions. Instead, data accessed for all tracks simultaneously could be stored in a structure of arrays, if memory bandwidth turns out to be an issue for one of the kernels and coalescing is measured to improve performance.
The new container SparseArray is the evolved version. It will allow both strided access (based on collected indices of alive tracks or other arbitrary criteria) and coalescing (compacting tracks into a new container). Both operations have overheads, and it remains to be seen which one is more efficient under which conditions in the simulation workflow. Coalescing only the needed data into dedicated SoA containers is a further evolution to be considered.