Deal with memory limitations on the device #65
Comments
Today we briefly discussed not producing the secondary and instead depositing its energy directly, perhaps counting how often this happens. A more elaborate scheme would avoid the situation before launching a kernel that could run into it, by making sure there is enough space for all processes to produce their maximum number of secondaries. For example, if every process produces at most one secondary (assuming we use the same buffers for electrons and gammas; to be checked), it is sufficient for the buffer to have at least twice as many slots as are currently used. If we have separate buffers per particle type, we have to check the available slots in the right buffer. To keep going as long as possible, we might schedule first those processes that do not produce secondaries or that even kill particles. Another option might be to prioritize particles with lower energy once a certain fraction of the buffer(s) is used, at the expense of reducing the amount of parallel work (so we need to make sure the buffers are large enough and that the threshold still allows decent efficiency). [ brain dump off ]
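A minimal sketch of that pre-launch capacity check, assuming a single shared buffer and a fixed upper bound on secondaries per track; the names (`TrackBuffer`, `canLaunchSafely`) are hypothetical, not the project's actual API:

```cuda
#include <cstddef>

struct TrackBuffer {
  std::size_t capacity; // total slots, allocated upfront
  std::size_t used;     // slots currently occupied by live tracks
};

// Hypothetical host-side check, run before each kernel launch: only launch
// if the worst case -- every live track producing its maximum number of
// secondaries -- still fits into the fixed-capacity buffer.
bool canLaunchSafely(const TrackBuffer &buf, std::size_t maxSecondariesPerTrack) {
  // With at most one secondary per track this reduces to the "at least
  // twice the currently used slots" rule from the comment above.
  return buf.used * (1 + maxSecondariesPerTrack) <= buf.capacity;
}
```

With separate buffers per particle type, the same check would have to run against the buffer each secondary would go into.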
About reserving the space needed per process up front, I think this is a good approach. A possible way to proceed is to partition the available track storage into smaller blocks (32K/64K tracks?), both for the input of processes and for the secondary tracks. For input this is needed in case we discover that splitting the work into multiple streams increases occupancy; for output, we can hand out the current block if the remaining space is deemed sufficient, or a fresh block if not (see the sketch below). We could even provide two output blocks for an input block of gammas, to store the e+ and the e- from the pair production process. In any case, we will likely need some …
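A sketch of the block-partitioned scheme, assuming a simple host-side pool; `TrackBlock`, `BlockPool`, and `kBlockCapacity` are made-up names for illustration:

```cuda
#include <cstddef>
#include <vector>

constexpr std::size_t kBlockCapacity = 32 * 1024; // e.g. 32K tracks per block

struct TrackBlock {
  std::size_t used = 0; // slots already occupied in this block
};

class BlockPool {
  std::vector<TrackBlock> blocks_;
  std::size_t next_ = 0; // index of the next block not yet handed out
 public:
  explicit BlockPool(std::size_t nBlocks) : blocks_(nBlocks) {}

  // Hand out the current block if it can still hold `needed` secondaries,
  // otherwise a fresh block; nullptr signals that the pool is exhausted.
  TrackBlock *outputBlock(TrackBlock *current, std::size_t needed) {
    if (current != nullptr && current->used + needed <= kBlockCapacity)
      return current;
    if (next_ < blocks_.size())
      return &blocks_[next_++];
    return nullptr;
  }
};
```

For the pair-production case, a gamma input block would simply request two output blocks, one for the e- and one for the e+.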
Still relevant, or even more relevant now that we're not reusing memory slots. However, I don't think it makes sense to work on this until we know that GPU simulation is faster than Geant4 and beneficial: any management overhead can only decrease performance.
GPUs have limited global memory (compared to current host systems, which can also be extended with swap), and it's best to avoid dynamic memory allocations while executing a region. For that reason, we allocate buffers of a fixed capacity upfront and reuse them as much as possible during simulation. This can lead to situations where there isn't enough space to store a newly produced secondary particle. The current approach is to terminate the simulation in such cases (see #64), but it would be better to handle this more gracefully.
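To make the failure mode concrete, here is a rough sketch of how the out-of-space condition typically surfaces when a kernel reserves a slot for a secondary; the names and the trap-on-overflow behaviour are illustrative assumptions, not the project's actual code:

```cuda
#include <cstdio>

struct SecondaryBuffer {
  int capacity; // fixed when the buffer is allocated, before any launch
  int *counter; // device-side count of occupied slots
};

// Reserve one slot for a new secondary from device code.
__device__ int reserveSlot(SecondaryBuffer buf) {
  int slot = atomicAdd(buf.counter, 1);
  if (slot >= buf.capacity) {
    // Out of space: the current behaviour (#64) amounts to aborting the
    // simulation. A graceful alternative is to deposit the secondary's
    // energy on the spot and count how often this happened instead.
    printf("secondary buffer exhausted\n");
    asm("trap;");
  }
  return slot;
}
```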