
Core: cuda backend improvements #145

Closed · GPMueller opened this issue Mar 10, 2017 · 6 comments

GPMueller (Member) commented Mar 10, 2017

The speed on Maxwell and Pascal is roughly equal, which should not be the case. It seems that on Pascal some quite trivial kernels take far longer than they should.
A possible cause is that on Pascal a managed (unified) memory object is not migrated to the device as a whole when it is first accessed, but faulted in page by page. This can cause a significant slowdown for incremental accesses, i.e. in such cases arrays should be pre-fetched by calling cudaMemPrefetchAsync, though cudaMemAdvise hints might already suffice (see the sketch after the list below). Kernels where this seems to happen:

  • cu_project_tangential
  • cu_sum
  • cu_set_c_a2
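
A minimal sketch of what such a prefetch could look like, assuming the arrays are allocated with cudaMallocManaged; the kernel body, its signature and the helper name `set_c_a2_prefetched` are placeholders, not the actual code:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for something like cu_set_c_a2:
// out[i] = c * a[i] for every spin.
__global__ void cu_set_c_a2(double c, const double3 *a, double3 *out, size_t n)
{
    size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
    if( idx < n )
        out[idx] = { c * a[idx].x, c * a[idx].y, c * a[idx].z };
}

void set_c_a2_prefetched(double c, const double3 *a, double3 *out, size_t n)
{
    int device = 0;
    cudaGetDevice(&device);

    // Tell the driver where the data will mostly live ...
    cudaMemAdvise(a,   n * sizeof(double3), cudaMemAdviseSetPreferredLocation, device);
    cudaMemAdvise(out, n * sizeof(double3), cudaMemAdviseSetPreferredLocation, device);

    // ... and migrate it up front instead of relying on per-page faults.
    cudaMemPrefetchAsync(a,   n * sizeof(double3), device);
    cudaMemPrefetchAsync(out, n * sizeof(double3), device);

    unsigned int blocks = static_cast<unsigned int>((n + 1023) / 1024);
    cu_set_c_a2<<<blocks, 1024>>>(c, a, out, n);
}
```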
GPMueller (Member Author):

It seems that cudaMemPrefetchAsync has a negative impact on performance... Strangely, it seems to significantly decrease the number of iterations after which the IPS (iterations per second) drops.

Another idea is the following: cudaMallocManaged is called 10 times per iteration, which may indicate an unnecessary copy (a missing reference? an `=` where it doesn't belong?); see the sketch below.
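
Purely to illustrate the suspicion (none of these names are taken from the actual code): if the spin fields are std::vectors backed by a cudaMallocManaged allocator, then a single pass-by-value is enough to trigger a fresh managed allocation on every iteration.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

struct Vector3 { double x, y, z; };

// Minimal managed allocator: every copy of a vectorfield goes through here.
template <typename T>
struct managed_allocator
{
    using value_type = T;
    T *allocate(std::size_t n)
    {
        void *ptr = nullptr;
        cudaMallocManaged(&ptr, n * sizeof(T)); // one of these per unintended copy
        return static_cast<T *>(ptr);
    }
    void deallocate(T *p, std::size_t) { cudaFree(p); }
    bool operator==(const managed_allocator &) const { return true; }
    bool operator!=(const managed_allocator &) const { return false; }
};

using vectorfield = std::vector<Vector3, managed_allocator<Vector3>>;

// Pass-by-value: the whole field is copied, i.e. cudaMallocManaged runs again
// on every call.
void gradient_by_value(vectorfield spins) { /* ... */ }

// Pass-by-reference: the existing managed allocation is reused.
void gradient_by_reference(const vectorfield &spins) { /* ... */ }
```

The same thing would happen with an assignment `a = b` between two such fields where a reference was intended.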

GPMueller (Member Author):

It seems that the removal of several cudaMallocManaged calls had a significant impact: 23cd093

GPMueller (Member Author):

The removal of systems[0]->UpdateEnergy(); from Method_LLG::Hook_Post_Iteration() improves performance by another factor of ~1.7, but it is unclear to me why.

GPMueller (Member Author) commented Mar 15, 2017

The answer seems to be that the gradient and energy calculations for Exchange and DMI are all very costly due to the use of atomics.

A new scheme for the Hamiltonian is needed to make it better suited to this kind of parallelisation (see the sketch below).

Related to #101 and #146
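
For illustration only, with assumed names and a much-simplified exchange term: a pair-based "scatter" kernel must use atomicAdd because two threads may update the same gradient entry, whereas a "gather" scheme with one spin per thread needs no atomics at all.

```cuda
#include <cuda_runtime.h>

struct Pair { int i, j; };

// Scatter over pairs: colliding updates must be serialized via atomicAdd.
// (atomicAdd on double also requires compute capability 6.0+, i.e. Pascal.)
__global__ void exchange_gradient_scatter(const Pair *pairs, int n_pairs,
                                          double J, const double3 *spins,
                                          double3 *gradient)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if( p >= n_pairs ) return;
    int i = pairs[p].i, j = pairs[p].j;
    atomicAdd(&gradient[i].x, -J * spins[j].x);
    atomicAdd(&gradient[i].y, -J * spins[j].y);
    atomicAdd(&gradient[i].z, -J * spins[j].z);
    atomicAdd(&gradient[j].x, -J * spins[i].x);
    atomicAdd(&gradient[j].y, -J * spins[i].y);
    atomicAdd(&gradient[j].z, -J * spins[i].z);
}

// Gather per spin: each thread owns gradient[i], so plain writes suffice.
__global__ void exchange_gradient_gather(const int *neighbours, int n_neigh,
                                         int n_spins, double J,
                                         const double3 *spins, double3 *gradient)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if( i >= n_spins ) return;
    double3 g = { 0.0, 0.0, 0.0 };
    for( int k = 0; k < n_neigh; ++k )
    {
        int j = neighbours[i * n_neigh + k];
        g.x -= J * spins[j].x;
        g.y -= J * spins[j].y;
        g.z -= J * spins[j].z;
    }
    gradient[i] = g;
}
```

The gather version visits each pair twice, but it avoids the serialization of colliding atomics, which is presumably the kind of trade-off a redesigned Hamiltonian scheme can exploit.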

GPMueller (Member Author):

New schemes for the Hamiltonians have been implemented (8569ae5). #222 now tracks the implementation of the CUDA versions of these.

GPMueller (Member Author):

Further performance improvements will, at my level of expertise, have to be algorithmic (see e.g. #311) rather than come from better CUDA code. I am therefore closing this issue.
