Don't keep box muller transform state between kernel launches #649

neworderofjamie · 2025-01-13T10:34:53Z

This builds on the new hybrid HIP/CUDA backend from #647 so review that first!

As described in #648, beyond its (arguably excessive) 192 bits of state, the curand XORWOW RNG used to provide randomness for neuron and custom connectivity updates also stores 160 bits of Box-Muller transform state in the curandState struct. Box-Muller (BM) draws two uniform random numbers and produces two normally distributed values, so this state is used to cache one of those results for subsequent calls to curand_normal.
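To illustrate why a normal sampler carries this extra state, here is a minimal host-side sketch of a textbook Box-Muller transform with a one-value cache. It is not curand's implementation; `NormalCache`, `sampleNormal` and the `nextUniform` callable are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Textbook Box-Muller: two uniforms in (0, 1] give two independent standard normals
inline std::pair<float, float> boxMuller(float u1, float u2)
{
    assert(u1 > 0.0f);  // log(0) is undefined
    const float twoPi = 6.2831853f;
    const float r = std::sqrt(-2.0f * std::log(u1));
    return {r * std::cos(twoPi * u2), r * std::sin(twoPi * u2)};
}

// Hypothetical cache playing the role of the Box-Muller fields in curandState
struct NormalCache
{
    bool hasExtra = false;
    float extra = 0.0f;
};

// Returns one normal sample per call. nextUniform is any callable
// returning a float in (0, 1] (an assumption of this sketch)
template<typename UniformFn>
float sampleNormal(NormalCache &cache, UniformFn nextUniform)
{
    // Every second call is served from the cache without drawing new uniforms
    if (cache.hasExtra) {
        cache.hasExtra = false;
        return cache.extra;
    }

    // Otherwise draw two uniforms, return one normal and cache the other
    const float u1 = nextUniform();
    const float u2 = nextUniform();
    const auto z = boxMuller(u1, u2);
    cache.extra = z.second;
    cache.hasExtra = true;
    return z.first;
}
```

The cache only saves work when the sampler is called more than once while the state is alive, which is the case within a single kernel but is not worth paying memory bandwidth for across launches.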

In this PR, when using CUDA and HIP backends, we create our own XORWowStateInternal struct in definitions.h without the BM state and store this in memory. At the start of the neuron and custom connectivity update kernels, we copy the fields from the XORWowStateInternal struct into a local curandState and, at the end, we copy them back.
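The copy-in/copy-out pattern described above can be sketched on the host as follows. This is an illustration under stated assumptions, not the generated code: `FullState` is a stand-in for curandState whose field names and layout are assumed from the bit counts quoted in this PR, the two-field layout given to `XORWowStateInternal` is likewise assumed, and `loadRNG`/`storeRNG` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for curandState (XORWOW). Layout is an assumption for illustration:
// 192 bits of generator state followed by 160 bits of Box-Muller state
struct FullState
{
    std::uint32_t d;
    std::uint32_t v[5];
    std::int32_t boxmullerFlag;
    std::int32_t boxmullerFlagDouble;
    float boxmullerExtra;
    double boxmullerExtraDouble;
};

// Compact state kept in memory between kernel launches (assumed layout)
struct XORWowStateInternal
{
    std::uint32_t d;
    std::uint32_t v[5];
};

static_assert(sizeof(XORWowStateInternal) == 24, "expected 192 bits of state");
static_assert(sizeof(XORWowStateInternal) < sizeof(FullState),
              "compact state should be smaller than the full state");

// Start of kernel: build a local full state from the compact one.
// Value-initialisation zeroes the Box-Muller fields, so nothing is cached
inline FullState loadRNG(const XORWowStateInternal &s)
{
    FullState rng{};
    rng.d = s.d;
    for (std::size_t i = 0; i < 5; i++) {
        rng.v[i] = s.v[i];
    }
    return rng;
}

// End of kernel: write back only the generator state.
// Any cached Box-Muller value is deliberately discarded
inline void storeRNG(const FullState &rng, XORWowStateInternal &s)
{
    s.d = rng.d;
    for (std::size_t i = 0; i < 5; i++) {
        s.v[i] = rng.v[i];
    }
}
```

With this layout, each thread reads and writes 24 bytes of RNG state per launch rather than the whole padded struct. A side effect is that a normal sample still cached at the end of a kernel is dropped rather than reused in the next launch.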

Excitingly, because we are very memory bandwidth bound, this makes the neuron kernel on the cortical microcircuit model about 60% faster. On my A5000 (running for 1 second):

| | Before [seconds] | After [seconds] |
|---|---|---|
| neuron update | 0.18 | 0.11 |
| presynaptic update | 0.23 | 0.24 |

codecov · 2025-01-13T10:42:07Z

Codecov Report

Attention: Patch coverage is 98.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.97%. Comparing base (54101ca) to head (beb19d4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| include/genn/genn/code_generator/environment.h | 96.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##              hip     #649      +/-   ##
==========================================
+ Coverage   88.91%   88.97%   +0.05%     
==========================================
  Files         108      108              
  Lines       14717    14768      +51     
==========================================
+ Hits        13086    13140      +54     
+ Misses       1631     1628       -3


…IP backend
* Write struct to definitions
* Use new class for allocating memory and struct fields
* Reimplemented population RNG preamble and postamble in ``BackendSIMT::getPopulationRNG`` using new destructor mechanism to copy from and to internal struct

neworderofjamie added enhancement CUDA backend HIP backend labels Jan 13, 2025

neworderofjamie added this to the GeNN 5.2.0 milestone Jan 13, 2025

neworderofjamie requested a review from tnowotny January 13, 2025 10:37

neworderofjamie force-pushed the xorwow_mem_reduce branch 3 times, most recently from e61dc4b to 1294128 Compare January 15, 2025 12:55

neworderofjamie added 6 commits January 15, 2025 14:57

easier creation of value types without host thing you can sizeof

750bdbb

Added destructors (think about name) to ``EnvironmentExternalDynamicB…

90a90e2

…ase`` entries as well as initialisers for cleanup code

improve name and extend to custom connectivity updates

90cd27e

add extra layer of environments so RNG initialiser and destructor are…

2a657d5

… generated in correct scope

finaliser is much better name than destructor

beb19d4

neworderofjamie force-pushed the xorwow_mem_reduce branch from 1294128 to beb19d4 Compare January 15, 2025 14:58
