Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't keep box muller transform state between kernel launches #649

Open
wants to merge 6 commits into
base: hip
Choose a base branch
from

Conversation

neworderofjamie
Copy link
Contributor

@neworderofjamie neworderofjamie commented Jan 13, 2025

This builds on the new hybrid HIP/CUDA backend from #647 so review that first!

As described in #648, beyond its (arguably excessive) 192 bit of state, the curand XORWOW RNG used to provide randomness for neuron and custom connectivity updates also stores 160 bits of box muller transform state in the curandState struct (BM draws two numbers and produces two normally distributed values so this state is used to cache one of those results for subsequent calls to curand_normal).

In this PR, when using CUDA and HIP backends, we create our own XORWowStateInternal struct in definitions.h without the BM state and store this in memory. At the start of the neuron and custom connectivity update kernels, we copy the fields from the XORWowStateInternal struct into a local curandState and, at the end, we copy them back.

Excitingly, because we are very memory bandwidth bound, this makes the neuron kernel on the cortical microcircuit model about 60% faster. On my A5000 (running for 1 second):

Before[seconds] After [seconds]
neuron update 0.18 0.11
presynaptic update 0.23 0.24

Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 98.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.97%. Comparing base (54101ca) to head (beb19d4).

Files with missing lines Patch % Lines
include/genn/genn/code_generator/environment.h 96.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              hip     #649      +/-   ##
==========================================
+ Coverage   88.91%   88.97%   +0.05%     
==========================================
  Files         108      108              
  Lines       14717    14768      +51     
==========================================
+ Hits        13086    13140      +54     
+ Misses       1631     1628       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@neworderofjamie neworderofjamie force-pushed the xorwow_mem_reduce branch 3 times, most recently from e61dc4b to 1294128 Compare January 15, 2025 12:55
…ase`` entries as well as initialisers for cleanup code
…IP backend

* Write struct to definitions
* Use new class for allocating memory and struct fields
* Reimplemented population RNG preamble and postamble in ``BackendSIMT::getPopulationRNG`` using new destructor mechanism to copy from and to internal struct
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant