-
Notifications
You must be signed in to change notification settings - Fork 18
/
forward.Rmd
187 lines (171 loc) · 5.16 KB
/
forward.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
# Forward Mode
The forward-mode automatic differentiation algorithm, also known as
the tangent method, is a means for efficiently computing derivatives
of smooth functions $f:\mathbb{R} \rightarrow \mathbb{R}^M$
with a single input and multiple outputs.
Suppose $x \in \mathbb{R}$ and that $v = f(u)$. We write a
dot over an expression to indicate a derivative with respect to a
distinguished variable $x$,
$$
\dot{u} = \frac{\partial}{\partial x} u.
$$
The term $\dot{u}$ is called the tangent of $u$ with respect to $x$;
the $x$ is implicit in the notation, but assumed to be the same $x$ in
expressions with multiple dotted expressions.
For example, if $v = -u$, then by the chain rule,
$$
\dot{v}
= \frac{\partial}{\partial x} v
= \frac{\partial}{\partial x} -u
= - \frac{\partial}{\partial x} u
= -\dot{u}.
$$
Similarly, if we have $v = \exp(u)$, then
$$
\dot{v}
= \frac{\partial}{\partial x} v
= \frac{\partial}{\partial x} \exp(u)
= \exp(u) \cdot \frac{\partial}{\partial x} u
= \exp(u) \cdot \dot{u}.
$$
Derivative propagatin in forward mode works the same way for
multivariate functions. For example, if $y = u \cdot v$ is a product,
the usual derivative rule for products applies,
$$
\dot{y}
= \frac{\partial}{\partial x} u \cdot v
= \left( \frac{\partial}{\partial x} u \right) \cdot v
+ u \cdot \left( \frac{\partial}{\partial x} v \right)
= \dot{u} \cdot v + u \cdot \dot{v}.
$$
## Dual numbers
Forward-mode automatic differentiation can be formalized using dual
numbers consisting of the value of an expression and its derivative,
$\langle u, \dot{u} \rangle.$ Smooth functions may then be extended
to operate on dual numbers. For example, negation is defined for dual
numbers by
$$
-\langle u, \dot{u}\rangle = \langle -u, -\dot{u} \rangle.
$$
For sums,
$$
\langle u, \dot{u} \rangle + \langle v, \dot{v} \rangle
= \langle u + v, \dot{u} + \dot{v} \rangle,
$$
and for differences
$$
\langle u, \dot{u} \rangle - \langle v, \dot{v} \rangle
= \langle u - v, \dot{u} - \dot{v} \rangle,
$$
For products,
$$
\langle u, \dot{u} \rangle \cdot \langle v, \dot{v} \rangle
= \langle u \cdot v, \dot{u} \cdot v + u \cdot \dot{v} \rangle
$$
and for quotients,
$$
\langle u, \dot{u} \rangle / \langle v, \dot{v} \rangle
= \langle u / v, \dot{u} / v - u / v^2 \cdot \dot{v} \rangle.
$$
For exponentiation,
$$
\exp\left( \langle u, \dot{u} \rangle \right)
= \langle \exp(u), \exp(u) \cdot \dot{u}\rangle,
$$
and for the logarithm,
$$
\log \langle u, \dot{u} \rangle
=
\langle \log u, \frac{1}{u} \cdot \dot{u} \rangle.
$$
These all translate directly into rules of tangent propagation.
## Gradient-vector products
The product of the gradient of a function at a given point and an
arbitrary vector can be computed efficiently using forward-mode
automatic differentiation. Suppose $f : \mathbb{R}^N \rightarrow
\mathbb{R}$ is a smooth function and $x \in \mathbb{R}^N$ is a point
in its domain. To compute the derivative of $f$ at $x$ along an arbitrary
vector $v \in \mathbb{R}^N$, it suffices to compute the
gradient-vector product $\nabla_v f(x) = \nabla f(x) \cdot v$. This
can be done with forward-mode automatic differentiation by
initializing the tangents of the input variable $x$ with the vector
being multiplied,
$$
\dot{x} = v.
$$
Then dual arithmetic is used as usual to compute the result, and the
final dual number's tangent is the gradient-vector product.
For example, suppose
$$
f(x) = x_1 \cdot x_2 + x_2,
$$
$$
x = \begin{bmatrix} 12.9 & 127.1 \end{bmatrix}^{\top},
$$
and
$$
v = \begin{bmatrix} 0.3 & -1.2 \end{bmatrix}^{\top}.
$$
The gradient-vector product $\nabla_v f(x) = \nabla f(x) \cdot v$
is derived as
$$
\begin{array}{rcl}
\langle 12.9, 0.3 \rangle \cdot \langle 127.1, -1.2 \rangle
+ \langle 127.1, -1.2 \rangle
& = &
\langle 12.9 \cdot 127.1, \, 0.3 \cdot 127.1 + 12.9 \cdot -1.2 \rangle
+ \langle 127.1, -1.2 \rangle
\\[4pt]
& = &
\langle 12.9 \cdot 127.1 + 127.1, \
0.3 \cdot 127.1 + 12.9 \cdot -1.2 + -1.2
\rangle
\\[4pt]
& = &
\langle 1767, 21 \rangle.
\end{array}
$$
Checking the gradient-vector product analytically,
$$
\nabla f(x)
=
\begin{bmatrix} x_2 & x_1 + 1 \end{bmatrix}
=
\begin{bmatrix} 127.1 & 12.9 + 1 \end{bmatrix},
$$
so that
$$
\nabla f(x) \cdot v
= \begin{bmatrix} 127.1 & 13.9 \end{bmatrix}
\cdot \begin{bmatrix} 0.3 & -1.2 \end{bmatrix}^{\top}
= 21.
$$
## Directional derivatives
A directional derivative measures the change in a multivariate
function in a given direction. A direction can be specified as a
point on a sphere, which corresponds to a unit vector $v$, i.e., a
vector of unit length, where $\sum_{n=1}^N v_n^2 = 1.$ Given a smooth
function $f : \mathbb{R}^N \rightarrow \mathbb{R},$ its derivative in
the direction of unit vector $v \in \mathbb{R}^N$ at a point $x \in
\mathbb{R}^N$ is $\nabla f(x) \cdot v.$ That is, a directional
derivative is a gradient-vector product where the vector is a unit
vector.
Partial derivatives can be defined as vector-gradient products,
$$
\frac{\partial}{\partial x_n} f(x)
=
\nabla f(x) \cdot u_n,
$$
by taking $u_n$ to be the unit vector pointing along the $n$-th axis,
$$
u_n =
\Big[
\begin{array}[t]{c}
\
\underbrace{0 \cdots 0 }_{n - 1 \ \textrm{zeros}}
\ \ 1 \ \
\underbrace{0 \cdots 0}_{N - n \ \textrm{zeros}}
\
\end{array}
\Big].
$$