Skip to content

hatchjaw/faust-ddsp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

faust-ddsp

DDSP experiments in Faust.

What is DDSP?

Differentiable programming is a technique whereby a program can be differentiated with respect to its inputs, permitting the computation of the sensitivity of the program's outputs to changes in its inputs. Partial derivatives of a program can be found analytically via automatic differentiation and, coupled with an appropriate loss function, used to perform gradient descent. Differentiable programming has consequently become a key tool in solving machine learning problems.

Differentiable digital signal processing (DDSP) is the specific application of differentiable programming to audio tasks. DDSP has emerged as a key component in machine learning approaches to problems such as source separation, timbre transfer, parameter estimation, etc. DDSP is reliant on a programming language with a supporting framework for automatic differentiation.

DDSP in Faust

Trigger warning: some mild-to-moderate calculus will follow

To write automatically differentiable code we need analytic expressions for the derivatives of the primitive operations in our program.

A Differentiable Primitive

Let's consider the example of the addition primitive; in Faust one can write:

process = +;

which yields the block diagram:

So, the output signal, the result of Faust's process, which we'll call $y$, is the sum of two input signals, $u$ and $v$:

$$ y = u + v. $$

Note that the addition primitive doesn't know anything about its arguments, their origin, provenance, etc., it just consumes them and returns their sum. In Faust's algebra, the addition of two signals (and just about everything in Faust is a signal) is well-defined, and that's that. This idea will be important later.

Now, say $y$ is dependent on some variable $x$, and we wish to know how sensitive $y$ is to changes in $x$, then we should differentiate $y$ with respect to $x$:

$$ \frac{dy}{dx} = \frac{d}{dx}\left(u + v\right) = \frac{du}{dx} + \frac{dv}{dx}. $$

It happens that the derivative of an addition is also an addition, except this time an addition of the derivatives of the arguments with respect to the variable of interest.

In Faust, we could express this fact as follows:

process = +,+;

If we did, we'd be describing, in parallel, $y$ and $\frac{dy}{dx}$, which we could write as:

$$ \begin{align*} \langle y, \frac{dy}{dx} \rangle &= \langle u, \frac{du}{dx} \rangle + \langle v, \frac{dv}{dx} \rangle \\ &= \langle u + v, \frac{du}{dx} + \frac{dv}{dx} \rangle. \end{align*} $$

This is a dual number representation, or more accurately, since we're working with Faust, a dual signal representation. Being able to pass around our algorithm and its derivative in parallel, as dual signals, is pretty handy, as we'll see later. Anyway, what we've just defined is a differentiable addition primitive.

But where exactly is the derivative?

Just as the addition primitive has no knowledge of its input signals, nor does its differentiable counterpart. The differentiable primitive promises the following: "give me $u$ and $v$, and $\frac{du}{dx}$ and $\frac{dv}{dx}$ in that order, and I'll give you $y$ and $\frac{dy}{dx}$". So let's do just that. For $u$ we'll use an arbitrary input signal, which we can represent with a wire, _. $x$ is the variable of interest; Faust's analogy to a variable is a slider1; we'll create one and assign it to $v$. $u$ and $x$ have no direct relationship, so $\frac{du}{dx}$ is $0$. $v$ is $x$, so $\frac{dv}{dx}$ is $1$.

x = hslider("x", 0, -1, 1, .1);
u = _;
v = x;
dudx = 0;
dvdx = 1;
process = u,v,dudx,dvdx : +,+;

The first output of this program is the result of an expression describing an input signal with a DC offset $x$ applied; the second output is the derivative of that expression, a constant signal of value $1$. So far so good, but it's not very automatic.

More Differentiable Primitives

We can generalise things a bit by defining a differentiable input2 and a differentiable slider:

diffInput = _,0;
diffSlider = hslider("x", 0, -1, 1, .1),1;

Simply applying the differentiable addition primitive to these new primitives isn't going to work because inputs to the adder won't arrive in the correct order; we can fix this with a bit of routing however:

diffAdd = route(4,4,(1,1),(2,3),(3,2),(4,4)) : +,+;

Now we can write:

process = diffInput,diffSlider : diffAdd;

The outputs of our program are the same as before, but we've computed the derivative automatically — to be precise, we've implemented forward mode automatic differentiation. Now we have the makings of a modular approach to automatic differentiation based on differentiable primitives and dual signals.

Multivariate Problems

The above works fine for a single variable, but what if our program has more than one variable? Consider the following non-differentiable example featuring a gain control and a DC offset:

x1 = hslider("gain", .5, 0, 1, .1);
x2 = hslider("dc", 0, -1, 1, .1);
process = _,x1 : *,x2 : +;

We can write this as:

$$ y = uv + w, \quad v = x_1, \quad w = x_2. $$

$u$ will again be an arbitrary input signal, for which we have no analytic expression.

Now, rather than being a lone ordinary derivative $\frac{dy}{dx}$, the derivative of $y$$y'$ — is a matrix of partial derivatives:

$$ y' = \frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix}\frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2}\end{bmatrix}. $$

Our algorithm takes two parameter inputs, and produces one output signal, so the resulting Jacobian matrix is of dimension $2 \times 1$.

Returning to dual number representation and applying the chain and product rules of differentiation, we have:

$$ \begin{align*} \langle y,y' \rangle &= \langle u,u' \rangle \langle v,v' \rangle + \langle w,w' \rangle \\ &= \langle uv,u'v + v'u \rangle + \langle w,w' \rangle \\ &= \langle uv + w,u'v + v'u + w'\rangle, \end{align*} $$

To implement the above in Faust, let's define some multivariate differentiable primitives:

diffInput(nvars) = _,par(i,nvars,0);

diffSlider(nvars,I,init,lo,hi,step) = hslider("x%I",init,lo,hi,step),par(i,nvars,i==I-1);

diffAdd(nvars) = route(nIN,nOUT,
        (u,1),(v,2), // u + v
        par(i,nvars,
            (u+i+1,dx),(v+i+1,dx+1) // du/dx_i + dv/dx_i
            with {
                dx = 2*i+3; // Start of derivatives wrt ith var
            }
        )
    ) with {
        nIN = 2+2*nvars;
        nOUT = nIN;
        u = 1;
        v = u+nvars+1;
    } : +,par(i, nvars, +);

diffMul(nvars) = route(nIN,nOUT,
        (u,1),(v,2), // u * v
        par(i,nvars,
            (u,dx),(dvdx,dx+1),   // u * dv/dx_i
            (dudx,dx+2),(v,dx+3)  // du/dx_i * v
            with {
                dx = 4*i+3; // Start of derivatives wrt ith var
                dudx = u+i+1;
                dvdx = v+i+1;
            }
        )
    ) with {
        nIN = 2+2*nvars;
        nOUT = 2+4*nvars;
        u = 1;
        v = u+nvars+1;
    } : *,par(i, nvars, *,* : +);

The routing for diffAdd and diffMul is a bit more involved, but the same principle applies as for the univariate differentiable addition primitive. Our dual signal representation now consists, for each primitive, of the undifferentiated primitive, and, in parallel, nvars partial derivatives, each with respect to the $i^\text{th}$ variable of interest. Accordingly, the differentiable slider now needs to know which value of $i$ to take to ensure that the appropriate combination of partial derivatives can be generated.

Armed with the above we can write the differentiable equivalent of our gain+DC example:

NVARS = 2;
x1 = diffSlider(NVARS,1,.5,0,1,.1);
x2 = diffSlider(NVARS,2,0,-1,1,.1);
process = diffInput(NVARS),x1 : diffMul(NVARS),x2 : diffAdd(NVARS);

Estimating Hidden Parameters

Assigning the above algorithm to a variable estimate, we can compare its first output, $y$, with target output, $\hat{y}$, produced by a groundTruth algorithm with hard-coded gain and DC values. We'll use Faust's default sine wave oscillator as input to both algorithms, and, to perform the comparison, we'll use a time-domain L1-norm loss function:

$$ \mathcal{L}(y,\hat{y}) = ||y-\hat{y}|| $$

import("stdfaust.lib"); // For os.osc, si.bus, etc.
process = os.osc(440.) <: groundTruth,estimate : loss,si.bus(NVARS)
with {
    groundTruth = _,.5 : *,-.5 : +;

    NVARS = 2;
    x1 = diffSlider(NVARS,1,1,0,1,.1);
    x2 = diffSlider(NVARS,2,0,-1,1,.1);
    estimate = diffInput(NVARS),x1 : diffMul(NVARS),x2 : diffAdd(NVARS);

    loss = ro.cross(2) : - : abs <: attach(hbargraph("loss",0,2));
};

Running this in the Faust web IDE, we can drag the sliders x1 and x2 around until we minimise the value reported by the loss function, thus discovering the "hidden" parameters of the ground truth.

TODO: loss gif

Gradient Descent

So far we haven't made use of our Faust program's partial derivatives. Our next step is to automate parameter estimation by incorporating these derivatives into a gradient descent algorithm.

Gradients are found as the derivative of the loss function with respect to $\mathbf{x}$ at time $t$. To get $\mathbf{x}_{t+1}$, we scale the gradients by a learning rate, $\alpha$, and subtract the result from $\mathbf{x}_t$. For our L1-norm loss function that looks like this:

$$ \begin{align*} \mathbf{x}_{t+1} &= \mathbf{x}_t - \alpha\frac{\partial\mathcal{L}}{\partial \mathbf{x}_t} \\ &= \mathbf{x}_t - \alpha\frac{\partial y}{\partial \mathbf{x}_t}\frac{y-\hat{y}}{|y-\hat{y}|}. \end{align*} $$

In Faust, we can't programmatically update the value of a slider.3 What we ought to do at this point, to automate the estimation of parameter values, is invert our approach; we'll use sliders for our "hidden" parameters, and define a differentiable variable to represent their "learnable" counterparts:

diffVar(nvars,I,graph) = -~_ <: attach(graph),par(i,nvars,i+1==I);

diffVar handles the subtraction of the scaled gradient, and we can pass it a bargraph to display the current parameter value.

To supply gradients to the learnable parameters the program has to be set up as a rather grand recursion:

import("stdfaust.lib");

process = os.osc(440.) 
    : hgroup("DDSP",(route(1+NVARS,2+NVARS,(1+NVARS,1),(1+NVARS,2),par(i,NVARS,(i+1,i+3))) 
        : vgroup("[0]Parameters",groundTruth,learnable)
        : route(2+NVARS,4+NVARS,(1,1),(2,2),(1,3),(2,4),par(i,NVARS,(i+3,i+5))) 
        : vgroup("[1]Loss & Gradients",loss,gradients)
    )) ~ (!,si.bus(NVARS))
with {
    groundTruth = vgroup("Hidden", 
        _,hslider("[0]gain",.5,0,1,.1) : *,hslider("[1]DC",-.5,-1,1,.1) : +
    );

    NVARS = 2;

    x1 = diffVar(NVARS,1,hbargraph("[0]gain", 0, 1));
    x2 = diffVar(NVARS,2,hbargraph("[1]DC", -1, 1));
    learnable = vgroup("Learned", diffInput(NVARS),x1,_ : diffMul(NVARS),x2 : diffAdd(NVARS));

    loss = ro.cross(2) : - : abs <: attach(hbargraph("[1]loss",0.,2));
    alpha = hslider("[0]Learning rate [scale:log]", 1e-4, 1e-6, 1e-1, 1e-6);
    gradients = (ro.cross(2): -),si.bus(NVARS)
        : route(NVARS+1,2*NVARS+1,(1,1),par(i,NVARS,(1,i*2+3),(i+2,2*i+2)))
        : (abs,1e-10 : max),par(i,NVARS, *)
        : route(NVARS+1,NVARS*2,par(i,NVARS,(1,2*i+2),(i+2,2*i+1)))
        : par(i,NVARS, /,alpha : * <: attach(hbargraph("gradient %i",-1e-2,1e-2)));
};

Running this code in the web IDE, we see the learned gain and DC values leap (more or less eagerly depending on the learning rate) to meet the hidden values.

Note that we actually needn't compute the loss function, unless we wanted to use some low threshold on $\mathcal{L}$ to halt the learning process. Also, we're not producing any true audio output,4 though we could easily route the first signal produced by the learnable algorithm to output by modifying the first route() instance in vgroup("DDSP",...).

The example we've just considered is a pretty basic one, and if the inputs to groundTruth and learnable were out of phase by, say, 25 samples, it would be a lot harder to minimise the loss function. To work around this we might take time-domain loss over windowed chunks of input, or compute phase-invariant loss in the frequency domain.

The diff Library

To include the diff library, use Faust's library expression:

df = library("/path/to/diff.lib");

The library defines a selection of differentiable primitives and helper functions for describing differentiable Faust programs.

diff uses Faust's pattern matching feature where possible.

The Autodiff Environment

To avoid having to pass the number of differentiable parameters to each primitive, differentiable primitives are defined within an environment expression named df.env. Begin by defining parameters with df.vars and then call df.env, passing in the parameters as an argument, e.g.:

df = library("diff.lib")
...
vars = df.vars((x1,x2))
with {
    x1 = -~_ <: attach(hbargraph("x1",0,1));
    x2 = -~_ <: attach(hbargraph("x2",0,1));
};

d = df.env(vars);

Having defined a differentiable environment in this way, primitives can be called as follows, and the appropriate number of partial derivatives will be calculated:

process = d.diff(+);

Additionally, parameters themselves can be accessed with vars.var(n), where n is the parameter index, starting from 1:

df = library("diff.lib");

vars = df.vars((gain))
with {
    gain = -~_ <: attach(hbargraph("gain",0,1));
};

d = df.env(vars);

process = d.input,vars.var(1) : d.diff(*);

The number of parameters can be accessed with vars.N:

...
learnable = d.input,si.bus(vars.N) // A differentiable input, N gradients
...

Differentiable Primitives

For the examples for the primitives that follow, assume the following boilerplate:

df = library("diff.lib");
vars = df.vars((x1,x2)) with { x1 = -~_; x2 = -~_; };
d = df.env(vars);

Number Primitive

diff(x)

$$ x \rightarrow \langle x,x' \rangle = \langle x,0 \rangle $$

  • Input: a constant numerical expression, i.e. a signal of constant value x
  • Output: one dual signal consisting of the constant signal and vars.N partial derivatives, which all equal $0$.
ma = library("maths.lib");
process = d.diff(2*ma.PI);

Identity Function

diff(_)

$$ \langle u,u' \rangle = \langle u,u' \rangle $$

  • Input: one dual signal
  • Output: the unmodified dual signal
process = d.diff(_);

Add Primitive

diff(+)

$$ \langle u,u' \rangle + \langle v,v' \rangle = \langle u+v,u'+v' \rangle $$

  • Input: two dual signals
  • Output: one dual signal consisting of the sum and vars.N partial derivatives
process = d.diff(+);

Subtract Primitive

diff(-)

$$ \langle u,u' \rangle - \langle v,v' \rangle = \langle u-v,u'-v' \rangle $$

  • Input: two dual signals
  • Output: one dual signal consisting of the difference and vars.N partial derivatives
process = d.diff(-);

Multiply Primitive

diff(*)

$$ \langle u,u' \rangle \langle v,v' \rangle = \langle uv,u'v+v'u \rangle $$

  • Input: two dual signals
  • Output: one dual signal consisting of the product and vars.N partial derivatives
process = d.diff(*);

Divide Primitive

diff(/)

$$ \frac{\langle u,u' \rangle}{\langle v,v' \rangle} = \langle \frac{u}{v}, \frac{u'v - v'u}{v^2} \rangle $$

  • Input: two dual signals
  • Output: one dual signal consisting of the quotient and vars.N partial derivatives

NB. To prevent division by zero in the partial derivatives, diff(/) uses whichever is the largest of $v^2$ and $1\times10^{-10}$.

process = d.diff(/);

Power primitive

diff(^)

$$ \langle u,u' \rangle^{\langle v,v' \rangle} = \langle u^v, u^{v-1}(vu' + uv'\ln(u)) \rangle $$

  • Input: two dual signals
  • Output: one dual signal consisting of the first input signal raised to the power of the second, and vars.N partial derivatives.
process = d.diff(^);

int Primitive

diff(int)

$$ \text{int}\left(\langle u, u'\rangle\right) = \langle\text{int}(u), \partial \rangle, \quad \partial = \begin{cases} u', &\sin(\pi u) = 0, u~\text{increasing} \\ -u', &\sin(\pi u) = 0, u~\text{decreasing} \\ 0, &\text{otherwise.} \end{cases} $$

  • Input: one dual signal
  • Output: one dual signal consisting of the integer cast and vars.N partial derivatives

NB. int is a discontinuous function, and its derivative is impulse-like at integer values of $u$, i.e. at $\sin(\pi u) = 0$; impulses are positive for increasing $u$, negative for decreasing.5

process = d.diff(int);

mem Primitive

diff(mem)

$$ \langle u, u'\rangle[n-1] = \langle u[n-1], u'[n-1] \rangle $$

  • Input: one dual signal
  • Output: one dual signal consisting of the delayed signal and vars.N delayed partial derivatives
process = d.diff(mem);

@ Primitive

diff(@)

$$ \langle u, u' \rangle[n-\langle v, v' \rangle] = \langle u[n-v], u'[n-v] - v'(u[n-v])'_n \rangle $$

  • Input: two dual signals
  • Output: one dual signal consisting of the first input signal delayed by the second, and vars.N partial derivatives of the delay expression

NB. the general time-domain expression for the derivative of a delay features a component which is a derivative with respect to (discrete) time: $(u[n-v])'_n$. This component is computed asymmetrically in time, so diff(@) is of limited use for time-variant $v$. It appears to behave well enough for fixed $v$.

process = d.input,d.diff(10) : d.diff(@);

sin Primitive

diff(sin)

$$ \sin(\langle u, u'\rangle) = \langle\sin(u), u'\cos(u)\rangle $$

  • Input: one dual signal
  • Output: one dual signal consisting of the sine of the input and vars.N partial derivatives
process = d.diff(sin);

cos Primitive

diff(cos)

$$ \cos(\langle u, u'\rangle) = \langle\cos(u), -u'\sin(u)\rangle $$

  • Input: one dual signal
  • Output: one dual signal consisting of the cosine of the input and vars.N partial derivatives
process = d.diff(cos);

tan Primitive

diff(tan)

$$ \tan(\langle u, u'\rangle) = \langle\tan(u), \frac{u'}{\cos^2(u)}\rangle $$

  • Input: one dual signal
  • Output: one dual signal consisting of the tangent of the input and vars.N partial derivatives

NB. To prevent division by zero in the partial derivatives, diff(tan,vars.N) uses whichever is the largest of $\cos^2(u)$ and $1\times10^{-10}$.

process = d.diff(tan);

Helper Functions

Input Primitive

input

$$ u \rightarrow \langle u,u' \rangle = \langle u,0 \rangle $$

process = d.input;

Differentiable Recursive Composition

rec(f~g,ngrads)

A utility for supporting the creation of differentiable recursive circuits. Facilitates the passing of gradients into the body of the recursion.

  • Inputs:
    • f: A differentiable expression taking two dual signals as input and producing one dual signal as output.
    • g: A differentiable expression taking one dual signal as input and producing one dual signal as output.
    • ngrads: The number of differentiable variables in g, i.e. the number of gradients to be passed into the body of the recursion.
  • Outputs: One dual signal; the result of the recursion.

E.g. a differentiable 1-pole filter with one parameter, the coefficient of the feedback component:

process = gradient,d.input : df.rec(f~g,1)
with {
    vars = df.vars((a)) with { a = -~_; };
    d = df.env(vars);
    f = d.diff(+);
    g = d.diff(_),vars.var(1) : d.diff(*);
    gradient = _;
};

Differentiable Phasor

phasor(f0)

Differentiable Oscillator

osc(f0)

Differentiable sum iteration

sumall(N)

Backpropagation circuit

backprop(groundTruth, learnable, lossFunction)

NB. this is defined outside of the autodiff environment, e.g.:

df = library("diff.lib");
...
process = df.backprop(groundTruth, learnable, lossFunction);

Loss Functions

L1 time-domain

learnL1(windowSize, learningRate)

L2 time-domain

learnL2(windowSize, learningRate)

Roadmap

  • More loss functions, optimisers, momentum...
  • Automatic parameter normalisation...
  • Frequency-domain loss...
  • Reverse mode autodiff...
  • Batched training data/ground truth...
  • Offline training → weights → real-time inference...

Footnotes

  1. This serves well enough for the example at hand, but in practice — in a machine learning implementation — a learnable parameter is more like a bargraph. We'll get to that later.

  2. An input isn't strictly a Faust primitive. In fact, syntactically, what we're calling an input here is indistinguishable from Faust's identity function, or wire (_), the derivative of which is also a wire. We need a distinct expression, however, for an arbitrary signal — mic input, a soundfile, etc. — we know to be entering our program from outside, as it were, and for which we have, in principle, no analytic description.

  3. Actually, programmatic parameter updates are possible via Widget Modulation, but changes aren't reflected in the UI. In the interests of keeping things intuitive and visually illustrative, we won't use widget modulation here.

  4. We hear the signal produced by the loss function, however; there's plenty of fun to be had (see examples/broken-osc.dsp for example) in sonifying the byproducts of the learning process.

  5. Yes, this is a bit of an abomination, mathematically-speaking.

About

DDSP experiments in Faust

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages