Scorch

Like torch, but rather than seeing the light, you get burnt.

A lightweight, cross-platform, header-only library written in standard C++ for tensor arithmetic with automatic differentiation, designed to closely mimic the famous PyTorch library in usage and appearance, but with improved compile-time safety checks.

Here's an example that trains a small neural network with two hidden layers to learn the identity function. This code can be found and run in demo.cpp. Most of it should look and feel immediately familiar to a PyTorch user.

// layer sizes
constexpr std::size_t InputDim = 4;
constexpr std::size_t HiddenDim = 16;
constexpr std::size_t OutputDim = 4;

// learnable network parameters
auto W0 = scorch::rand<float, HiddenDim, InputDim>();
auto b0 = scorch::rand<float, HiddenDim>();
auto W1 = scorch::rand<float, HiddenDim, HiddenDim>();
auto b1 = scorch::rand<float, HiddenDim>();
auto W2 = scorch::rand<float, OutputDim, HiddenDim>();
auto b2 = scorch::rand<float, OutputDim>();

// optimizer
// learning rate, momentum ratio, parameters...
auto opt = scorch::optim::SGD(0.1f, 0.8f, W0, b0, W1, b1, W2, b2);

// batch size
constexpr std::size_t BatchDim = 16;

for (auto i = 0; i < 100; ++i) {
    // random input
    auto x = scorch::rand<float, BatchDim, InputDim>();

    // identity function: output is equal to input
    auto y = copy(x);

    // compute the network output
    // Yes, it's actually this simple
    auto y_hat = sigmoid(sigmoid(x % W0 + b0) % W1 + b1) % W2 + b2;

    // compute the loss
    auto l = mean((y_hat - y) ^ 2.0f);

    // don't forget to zero the gradients before back-propagation
    opt.zero_grad();

    // compute the gradients of all parameters w.r.t. the loss
    l.backward();

    // take a training step
    opt.step();
}

Notable features:

  • Support for vector, matrix, and tensor variables with arbitrarily many dimensions
  • Support for scalar variables
  • Element-wise functions, broadcasting semantics*, matrix-vector multiplication, and more.
  • The usual overloaded operators, plus % for matrix-vector multiplication and ^ for exponentiation.
  • Extremely ergonomic syntax for writing expressions (see the example)
  • Compile-time checking of tensor shape compatibility (!!!), illustrated in the sketch after this list
  • Automatic differentiation using reverse-mode gradient computation
  • Dynamic computational graphs
  • Optimizers (only SGD for now)
  • Tested with MSVC, GCC, and Clang
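
As a rough illustration of the compile-time shape checking, here is a minimal sketch that reuses the shape conventions from the training example above; the commented-out line shows the kind of mismatch that is caught by the compiler rather than at runtime.

// a batch of eight 4-vectors and a weight matrix mapping 4-vectors to 16-vectors,
// using the same shape convention as W0 in the example above
auto x = scorch::rand<float, 8, 4>();
auto W = scorch::rand<float, 16, 4>();

// the vector dimension of x matches the input dimension of W, so this compiles
auto h = x % W;

// a batch of 3-vectors does not fit a matrix expecting 4-vectors,
// so a line like the following is rejected at compile time
// auto bad = scorch::rand<float, 8, 3>() % W;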

* Broadcasting semantics are only supported for pairs of tensors whose shapes are identical except that one may have additional leading dimensions. For example, a size 3x5x7 tensor is broadcastable with a size 5x7 tensor and a size 7 tensor, but a size 3x5x7 tensor is not broadcastable with a size 1x1x7 tensor or a size 1x1x1 tensor.
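
A minimal sketch of that rule, assuming element-wise + applies these broadcasting semantics in the same way as the bias additions in the training example:

auto a = scorch::rand<float, 3, 5, 7>();
auto b = scorch::rand<float, 5, 7>();
auto c = scorch::rand<float, 7>();

// broadcastable: the shapes agree except for a's additional leading dimensions
auto s1 = a + b;
auto s2 = a + c;

// not broadcastable: a size 1x1x7 tensor is not simply a size 7 tensor with extra
// leading dimensions, so a line like the following would not compile
// auto s3 = a + scorch::rand<float, 1, 1, 7>();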

Features that are not supported but are probably coming soon:

  • Tensor views, clever indexing, and differentiation through tensor scalar element access
  • Convolutions
  • Matrix-matrix multiplication
  • Some remaining basic mathematical functions (e.g. cbrt, atan, etc.)
  • Smarter optimizers (e.g. Adam, RMSProp, if I can understand them)
  • Higher-order derivatives (maybe)

Features that are not supported and probably never will be:

  • GPU acceleration
  • Dynamically-sized tensors

This code was written by Tim Straubinger and is made available for free use under the MIT license.