- [Description](#description)
- [Usage](#usage)
## Description

This is an implementation of a neural net, completely from scratch, in CUDA/C++. A technical report with a more comprehensive overview of this project can be found here.

## Usage

Everything is implemented in both pure C++ (under `CPU/`) and CUDA/C++ (under `GPU/`). The syntax remains virtually identical, and there are only two points to bear in mind when switching between C++ and CUDA/C++:

- C++ and CUDA/C++ modules end with the suffixes `CPU` and `GPU` respectively.
- Don't forget to allocate and destroy CUDA arrays via `cudaMallocManaged` and `cudaFree`; a minimal sketch of this pattern follows below.
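For instance, a minimal sketch of that allocation pattern; the sizes here (a batch of 64 examples with 10 features each) are illustrative and not taken from the repository:

```cpp
#include <cuda_runtime.h>

int main() {
    int bs = 64, n_in = 10; // illustrative sizes

    // Unified (managed) memory is accessible from both the host and the device.
    float *inp;
    cudaMallocManaged(&inp, bs * n_in * sizeof(float));

    // ... fill inp on the host, hand it to the GPU modules ...

    // Managed arrays must be released with cudaFree.
    cudaFree(inp);
    return 0;
}
```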
The code is by no means efficient and is meant as an introduction to CUDA only. Here is an overview of the various classes and functions:

- `linear.h`/`Linear_SUFFIX` (see the usage sketch after this list):
  - Initialization:
    - Required arguments:
      - `_bs` (`int`): Batch size.
      - `_n_in` (`int`): Number of input features.
      - `_n_out` (`int`): Number of output features.
    - Optional arguments:
      - `_lr` (`float`): Learning rate.
  - `forward`: Runs a linear forward pass.
    - Required arguments:
      - `_inp` (`float*`): Pointer to the input data.
      - `_out` (`float*`): Pointer for storing the output data.
  - `update`: Updates the weights and biases.
  - `backward`: Performs a backward pass, storing the gradients in `_inp`.
- `relu.h`/`ReLU_SUFFIX`:
  - Initialization:
    - Required argument:
      - `_sz_out` (`int`): The number of input/output elements.
  - `forward`, `backward`: Like `Linear_SUFFIX` but for ReLU.
- `mse.h`/`MSE_SUFFIX` (see the loss sketch after this list):
  - Initialization: Like `ReLU_SUFFIX`.
  - `forward`: Dummy method for compatibility with the other modules and for performing backpropagation; it does not actually calculate the loss.
    - Required arguments:
      - `_inp` (`float*`): Pointer to the predictions.
      - `_out` (`float*`): Pointer to the target values.
  - `_forward`: Calculates the MSE. This method is solely for calculating the loss and cannot be used during backpropagation.
    - Required arguments: Like `forward`, but `_out` must have an extra element for storing the loss.
  - `backward`: Performs a backward pass, storing the gradients in `_inp`.
- `sequential.h`/`Sequential_SUFFIX`:
  - Initialization:
    - Required arguments:
      - `layers` (`std::vector<Module*>`): Layers to be chained together.
  - `forward`: Cascades the modules in `layers`.
    - Required arguments:
      - `inp` (`float*`): Pointer to the input data.
      - `out` (`float*`): Dummy argument, present only for compatibility with the other forward methods; it doesn't get used. The output is accessible via the last layer's `out` attribute.
  - `update`: Updates every module in `layers`.
- `train_SUFFIX`: Trains a network with gradient descent (see the end-to-end sketch at the end of this section).
  - Required arguments:
    - `seq` (`Sequential_SUFFIX`): Sequential module to train.
    - `inp` (`float*`): Pointer to the input data.
    - `targ` (`float*`): Pointer to the target data.
    - `bs` (`int`): Batch size.
    - `n_in` (`int`): Number of input features.
    - `n_epochs` (`int`): Number of epochs.
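As a usage illustration, here is a hypothetical sketch of driving a linear layer by hand. Only the argument lists above come from the code; the `LinearGPU` name follows the `SUFFIX = GPU` convention, the include path is assumed, and `backward`/`update` are shown without arguments because none are listed above:

```cpp
#include <cuda_runtime.h>
#include "linear.h" // under GPU/; exact path is an assumption

int main() {
    int bs = 64, n_in = 10, n_out = 1; // illustrative sizes

    float *inp, *out;
    cudaMallocManaged(&inp, bs * n_in * sizeof(float));
    cudaMallocManaged(&out, bs * n_out * sizeof(float));
    // ... fill inp with a batch of inputs ...

    LinearGPU lin(bs, n_in, n_out); // the optional _lr keeps its default here

    lin.forward(inp, out); // out now holds the linear outputs
    // In a full network, the criterion's backward would run first so that
    // the output gradients are in place.
    lin.backward();        // input gradients are written back into inp
    lin.update();          // gradient-descent step on the weights and biases

    cudaFree(inp);
    cudaFree(out);
    return 0;
}
```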
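In the same spirit, a sketch of reading the loss through `_forward`, again under the `MSEGPU` naming assumption; that the loss lands in the final, extra element of `_out` is also an assumption:

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include "mse.h" // under GPU/; exact path is an assumption

int main() {
    int bs = 64; // illustrative: one prediction per example

    float *pred, *targ;
    cudaMallocManaged(&pred, bs * sizeof(float));
    cudaMallocManaged(&targ, (bs + 1) * sizeof(float)); // targets plus one slot for the loss
    // ... fill pred and the first bs elements of targ ...

    MSEGPU mse(bs);           // _sz_out: number of predictions
    mse._forward(pred, targ); // writes the loss into the extra element

    cudaDeviceSynchronize();  // wait for the GPU before reading on the host
    std::printf("MSE: %f\n", targ[bs]); // assumed position of the loss

    cudaFree(pred);
    cudaFree(targ);
    return 0;
}
```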
For end-to-end training with speed benchmarks, please run `main.cpp` or `main.cu` for the CPU and GPU respectively.
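To tie the pieces together, an end-to-end sketch under the same naming assumptions (`LinearGPU`, `ReLUGPU`, `MSEGPU`, `SequentialGPU`, `train_GPU`). The hidden width of 8 is arbitrary, the `Module` base-class name is taken verbatim from the `std::vector<Module*>` signature above, and chaining the criterion into the `Sequential` is inferred from `MSE`'s dummy `forward` existing for backpropagation compatibility; `main.cu` remains the authoritative driver:

```cpp
#include <vector>
#include <cuda_runtime.h>
#include "sequential.h" // headers under GPU/; exact paths are assumptions
#include "linear.h"
#include "relu.h"
#include "mse.h"
// train_GPU is assumed to be declared alongside these modules.

int main() {
    int bs = 64, n_in = 10, n_epochs = 100; // illustrative sizes

    float *inp, *targ;
    cudaMallocManaged(&inp, bs * n_in * sizeof(float));
    cudaMallocManaged(&targ, (bs + 1) * sizeof(float)); // extra slot in case the loss is read via _forward
    // ... fill inp and the first bs elements of targ ...

    // A two-layer net with the MSE criterion chained on as the final module.
    std::vector<Module*> layers = {
        new LinearGPU(bs, n_in, 8),
        new ReLUGPU(bs * 8), // _sz_out: bs examples x 8 features
        new LinearGPU(bs, 8, 1),
        new MSEGPU(bs),
    };
    SequentialGPU seq(layers);

    train_GPU(seq, inp, targ, bs, n_in, n_epochs);

    cudaFree(inp);
    cudaFree(targ);
    return 0;
}
```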