<!DOCTYPE html>
<html>
<head>
<title>Underactuated Robotics: Robust and
Stochastic Control</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="canonical" href="http://underactuated.mit.edu/robust.html" />
<script src="https://hypothes.is/embed.js" async></script>
<script type="text/javascript" src="htmlbook/book.js"></script>
<script src="htmlbook/mathjax-config.js" defer></script>
<script type="text/javascript" id="MathJax-script" defer
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
</script>
<script>window.MathJax || document.write('<script type="text/javascript" src="htmlbook/MathJax/es5/tex-chtml.js" defer><\/script>')</script>
<link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
<script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
<script>hljs.initHighlightingOnLoad();</script>
<link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
</head>
<body onload="loadChapter('underactuated');">
<div data-type="titlepage">
<header>
<h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
<p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
<p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
<p style="font-size: 14px; text-align: right;">
© Russ Tedrake, 2021<br/>
      Last modified <span id="last_modified"></span>.<br/>
<script>
var d = new Date(document.lastModified);
document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
<a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
</p>
</header>
</div>
<p><b>Note:</b> These are working notes used for <a
href="http://underactuated.csail.mit.edu/Spring2021/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2021 semester. <a
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture videos are available on YouTube</a>.</p>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=policy_search.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=output_feedback.html>Next Chapter</a></td>
</tr></table>
<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 13"><h1>Robust and
Stochastic Control</h1>
<p>So far in the notes, we have concerned ourselves with only known,
deterministic systems. In this chapter we will begin to consider uncertainty.
This uncertainty can come in many forms... we may not know the governing
equations (e.g. the coefficient of friction in the joints), our robot may be
<a href="https://youtu.be/cNZPRsrwumQ?t=32">walking on unknown terrain,
subject to unknown disturbances</a>, or even be picking up unknown objects.
There are a number of mathematical frameworks for considering this
uncertainty; for our purposes, in this chapter we will generalize our thinking to
equations of the form: $$\dot\bx = {\bf f}(\bx, \bu, \bw, t) \qquad \text{or}
\qquad \bx[n+1] = {\bf f}(\bx[n], \bu[n], \bw[n], n),$$ where $\bw$ is a new
<i>random</i> input signal to the equations capturing all of this potential
variability. Although it is certainly possible to work in continuous time, and
treat $\bw(t)$ as a continuous-time random signal (c.f. <a
href="https://en.wikipedia.org/wiki/Wiener_process">Wiener process</a>), it is
notationally simpler to work with $\bw[n]$ as a discrete-time random signal.
For this reason, we'll devote our attention in this chapter to
discrete-time systems.</p>
<p>In order to simulate equations of this form, or to design controllers
against them, we need to define the random process that generates $\bw[n]$. It
is typical to assume the values $\bw[n]$ are independent and identically
distributed (i.i.d.), meaning that $\bw[i]$ and $\bw[j]$ are independent when
$i \neq j$ and drawn from the same distribution. As a result, we typically define our distribution via a
probability density $p_{\bf w}(\bw[n])$. This is not as limiting as it may
sound... if we wish to treat temporally-correlated noise (e.g. "colored
noise") the format of our equations is rich enough to support this by adding
additional state variables; we'll give an example below of a "whitening
filter" for modeling wind gusts. The other source of randomness that we will
now consider in the equations is randomness in the initial conditions; we will
similarly define a probability density $p_\bx(\bx[0]).$</p>
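<p>As a purely illustrative sketch (the specific dynamics, gains, noise
levels, and time step below are assumptions made for this example, not taken
from the text), the snippet simulates a simple discrete-time system driven by
i.i.d. Gaussian $\bw[n]$, and then models temporally-correlated "wind gust"
noise by augmenting the state with a first-order filter driven by white
noise.</p>
<pre><code class="python">
# A minimal sketch (not from the text): simulating a discrete-time system
# driven by i.i.d. Gaussian noise, then modeling temporally-correlated
# ("colored") noise by augmenting the state with a first-order filter.
# The dynamics f(), gains, and time step are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
h = 0.01        # time step (illustrative)
alpha = 0.95    # filter coefficient; closer to 1 means slower gusts

def f(x, u, w):
    # e.g. a damped double integrator with an additive disturbance force
    q, qdot = x
    qddot = -0.1 * qdot + u + w
    return np.array([q + h * qdot, qdot + h * qddot])

# i.i.d. disturbance: w[n] ~ N(0, 0.1^2), independent across time steps.
x = np.array([0.0, 0.0])
for n in range(1000):
    w = 0.1 * rng.standard_normal()
    x = f(x, u=0.0, w=w)

# Colored noise via a "whitening filter": augment the state with g[n],
#   g[n+1] = alpha * g[n] + w[n],  with w[n] i.i.d.,
# and feed g[n] (not w[n]) into the plant as the gust disturbance.
xg = np.array([0.0, 0.0, 0.0])  # [q, qdot, g]
for n in range(1000):
    w = 0.1 * rng.standard_normal()
    q_qdot, g = xg[:2], xg[2]
    xg = np.concatenate([f(q_qdot, u=0.0, w=g), [alpha * g + w]])
</code></pre>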
<p>This modeling framework is rich enough for us to convey the key ideas; but
it is not quite sufficient for all of the systems I am interested in. In
<drake></drake> we go to additional lengths to support more general cases of
<a
href="https://drake.mit.edu/doxygen_cxx/group__stochastic__systems.html">stochastic
systems</a>. This includes modeling system parameters that are drawn at
random each time the model is initialized, but are fixed over the duration of
a simulation; it is possible but inefficient to model these as additional
state variables that have no dynamics. In other problems, even the
<i>dimension of the state vector</i> may change in different realizations of
the problem! Consider, for instance, the case of a robot manipulating random
numbers of dishes in a sink. I do not know many control formulations that
handle this type of randomness well, and I consider this a top priority to
think more about! (We'll begin to address it in the <a href="output_feedback.html">output feedback chapter</a>.)</p>
<p>Roughly speaking, I will refer to "stochastic control" as the discipline of
synthesizing controllers that govern the probabilistic evolution of the
equations. "Stochastic optimal control" defines a cost function (now a random
variable), and tries to find controllers that optimize some metric such as the
expected cost. When we use the term "robust control", we are typically
referring to a class of techniques that try to guarantee worst-case
performance or a worst-case bound on the effect of the randomness of the input
on the randomness of the output. Interestingly, for many robust control
formulations we do not attempt to know the precise probability distribution of
$\bx[0]$ and $\bw[n]$, but instead only define the <i>sets</i>
of possible values that they can take. This modeling is powerful, but can
lead to conservative controllers and pessimistic estimates of performance.</p>
<section><h1>Discrete states and actions</h1>
<p> One of the most amazing features of the dynamic programming, additive
cost approach to optimal control is the relative ease with which it extends
to optimizing stochastic systems. </p>
<subsection><h1>Graph search -- stochastic shortest path
problems</h1>
<p>For discrete systems, we can generalize our dynamics on a graph by
adding action-dependent transition probabilities to the edges. This new
dynamical system is known as a Markov Decision Process (MDP), and we write
the dynamics completely in terms of the transition probabilities \[
\Pr(s[n+1] = s' | s[n] = s, a[n] = a). \] For discrete systems, this is
simply a big lookup table. The cost that we incur for any execution of the
system is now a random variable, and so we formulate the goal of control
as optimizing the expected cost, e.g. \begin{equation} J^*(s[0]) =
\min_{a[\cdot]} E \left[ \sum_{n=0}^\infty \ell(s[n],a[n]) \right].
\label{eq:stochastic_dp_optimality_cond} \end{equation} Note that there
are many other potential objectives, such as minimizing the worst-case
cost, but the expected cost is special because it preserves the dynamic
programming recursion: \[ J^*(s) = \min_a E \left[ \ell(s,a) + J^*(s')\right]
= \min_a \left[ \ell(s,a) + \sum_{s'} \Pr(s'|s,a) J^*(s') \right].\]
Remarkably, if we use these optimality conditions to construct our value
iteration algorithm \[ \hat{J}(s) \Leftarrow \min_a \left[ \ell(s,a) +
\sum_{s'} \Pr(s'|s,a) \hat{J}(s') \right],\] then this algorithm has the
same strong convergence guarantees as its counterpart for deterministic
systems. And it is essentially no more expensive to compute!</p>
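    <p>As a concrete illustration, here is a minimal sketch (not taken from
    the text) of this value iteration update on a tiny hand-made MDP; the
    three-state, two-action model and its costs are arbitrary choices for the
    example.</p>
<pre><code class="python">
# A minimal sketch (not from the text) of value iteration for a finite MDP
# with transition probabilities P[a][s, s'] and cost l[s, a].  The model
# below (3 states, 2 actions) is an arbitrary example for illustration.
import numpy as np

num_states, num_actions = 3, 2
# P[a] is a (num_states x num_states) row-stochastic matrix: Pr(s' | s, a).
P = [np.array([[0.9, 0.1, 0.0],
               [0.0, 0.8, 0.2],
               [0.0, 0.0, 1.0]]),
     np.array([[0.2, 0.8, 0.0],
               [0.1, 0.1, 0.8],
               [0.0, 0.0, 1.0]])]
# l[s, a]: running cost; state 2 is an absorbing zero-cost "goal" state.
l = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [0.0, 0.0]])

J = np.zeros(num_states)
for _ in range(500):
    # Bellman update: J(s) gets min_a [ l(s,a) + sum_{s'} Pr(s'|s,a) J(s') ]
    Q = np.stack([l[:, a] + P[a] @ J for a in range(num_actions)], axis=1)
    J_new = Q.min(axis=1)
    if np.allclose(J_new, J, atol=1e-8):
        break
    J = J_new

policy = Q.argmin(axis=1)  # greedy policy w.r.t. the converged cost-to-go
</code></pre>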
</subsection>
<subsection><h1>Stochastic interpretation of deterministic,
continuous-state value iteration</h1>
<p> There is a particularly cute observation to be made here. Let's assume
that we have discrete control inputs and discrete-time dynamics, but a
      continuous state space. Recall the fitted value iteration algorithm on
      a mesh described above. In fact, the resulting update is exactly the
      same as if we treated the system as a discrete-state MDP, with the
      interpolation coefficients $\beta_i$ representing the probability of
      transitioning to state $x_i$! This sheds
some light on the impact of discretization on the solutions --
discretization error here will cause a sort of diffusion corresponding to
the probability of spreading across neighboring nodes.</p>
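      <p>A minimal sketch of this observation (the mesh and the numbers are
      illustrative assumptions, not from the text): on a one-dimensional
      mesh, the linear-interpolation weights $\beta_i$ are nonnegative and
      sum to one, so they form a valid probability distribution over the mesh
      nodes.</p>
<pre><code class="python">
# A minimal sketch (not from the text): on a 1-D mesh, the
# linear-interpolation weights used by fitted value iteration are
# nonnegative and sum to one, so they can be read as transition
# probabilities of a discrete-state MDP.  The mesh and the sample point
# are illustrative assumptions.
import numpy as np

x_mesh = np.linspace(-2.0, 2.0, 9)  # mesh nodes x_i

def interpolation_weights(x_next):
    """Return beta_i such that J(x_next) ~= sum_i beta_i * J(x_mesh[i])."""
    beta = np.zeros_like(x_mesh)
    x_next = np.clip(x_next, x_mesh[0], x_mesh[-1])
    j = np.searchsorted(x_mesh, x_next)
    if j == 0:
        beta[0] = 1.0
    else:
        t = (x_next - x_mesh[j - 1]) / (x_mesh[j] - x_mesh[j - 1])
        beta[j - 1], beta[j] = 1.0 - t, t
    return beta

# A deterministic step lands between mesh nodes...
beta = interpolation_weights(0.3)
assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
# ...so the fitted backup  min_a [ l + sum_i beta_i J(x_i) ]  is exactly a
# stochastic Bellman backup with Pr(x_i | x, a) = beta_i.
</code></pre>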
</subsection>
</section> <!-- end of stochastic optimal control -->
<section><h1>Linear Quadratic Gaussian (LQG)</h1></section>
<section><h1>Linear Exponential-Quadratic Gaussian (LEQG)</h1>
  <p><elib>Jacobson73</elib> observed that it is also straightforward to
minimize the objective: $$J = E\left[ \prod_{n=0}^\infty e^{\bx^T[n] {\bf Q}
\bx[n]} e^{\bu^T[n] {\bf R} \bu[n]} \right] = E\left[ e^{\sum_{n=0}^\infty
\bx^T[n] {\bf Q} \bx[n] + \bu^T[n] {\bf R} \bu[n]} \right],$$ with ${\bf Q}
= {\bf Q}^T \ge {\bf 0}, {\bf R} = {\bf R}^T > 0,$ by observing that the
cost is monotonically related to $\log{J}$, and therefore has the same
minima (this same trick forms the basis for "geometric programming"
<elib>Boyd07</elib>). This is known as the "Linear Exponential-Quadratic
Gaussian" (LEG or LEQG), and for the deterministic version of the problem
  (no process or measurement noise) the solution is identical to the LQR
problem; it adds no new modeling power. But with noise, the LEQG optimal
controllers are different from the LQG controllers; they depend explicitly
on the covariance of the noise.
<todo>introduce the coefficient \sigma here, instead of just throwing it in
from the start. The coefficient $\sigma$ in the objective is referred to as
the "risk-sensitivity parameter" <elib>Whittle90</elib>.
</todo>
</p>
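  <p>One standard way to see why the exponentiated cost brings in risk
  sensitivity (a textbook observation, stated here as background rather than
  a derivation from these notes): write $c = \sum_{n=0}^\infty \bx^T[n] {\bf
  Q} \bx[n] + \bu^T[n] {\bf R} \bu[n]$ for the (random) total cost, and scale
  it inside the exponent by a coefficient $\sigma$. The cumulant expansion
  then gives $$\frac{1}{\sigma} \log E\left[ e^{\sigma c} \right] = E[c] +
  \frac{\sigma}{2} \operatorname{Var}[c] + O(\sigma^2),$$ so for $\sigma > 0$
  the controller is additionally penalized for the <i>variance</i> of the
  cost (risk-averse), for negative $\sigma$ it is rewarded for it
  (risk-seeking), and as $\sigma \to 0$ we recover the expected-cost
  objective.</p>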
<todo>Insert simplest form of the derivation here, plus an example.</todo>
<p><elib>Whittle90</elib> provides an extensive treatment of this framework;
nearly all of the analysis from LQR/LQG (including Riccati equations,
Hamiltonian formulations, etc) have analogs for their versions with
exponential cost, and he argues that LQG and H-infinity control can
(should?) be understood as special cases of this approach.</p>
</section>
<todo>
Stochastic optimal control. Stochastic HJB. Graph search. LQR + Gaussian
noise. Whitening filters. Stochastic trajectory optimization. iLQG,
iLEQG/iLEG.
Stochastic verification. (Having bounds on expected value does yield bounds on
system state).
Robust control. Common Lyapunov functions. Dissipation inequalities.
Guaranteed-cost control for linear systems (minimax). Loftberg 2003 for
constrained version (w/ disturbance feedback). H-infinity. Elad's regret
formulation? (or save this for adaptive control section?)
LPV. Pendulum example from Briat15 1.4.1 (and the references therein).
</todo>
</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<div id="references"><section><h1>References</h1>
<ol>
<li id=Jacobson73>
<span class="author">D. Jacobson</span>,
<span class="title">"Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games"</span>,
<span class="publisher">IEEE Transactions on Automatic Control</span>, vol. 18, no. 2, pp. 124--131, apr, <span class="year">1973</span>.
</li><br>
<li id=Boyd07>
<span class="author">S. Boyd and S.-J. Kim and L. Vandenberghe and A. Hassibi</span>,
<span class="title">"A Tutorial on Geometric Programming"</span>,
<span class="publisher">Optimization and Engineering</span>, vol. 8, no. 1, pp. 67-127, <span class="year">2007</span>.
</li><br>
<li id=Whittle90>
<span class="author">P. Whittle</span>,
<span class="title">"Risk-Sensitive Optimal Control"</span>,
<span class="publisher">Wiley, New York</span>, vol. 20, <span class="year">1990</span>.
</li><br>
</ol>
</section><p/>
</div>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=policy_search.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=output_feedback.html>Next Chapter</a></td>
</tr></table>
<div id="footer">
<hr>
<table style="width:100%;">
<tr><td><a href="https://accessibility.mit.edu/">Accessibility</a></td><td style="text-align:right">© Russ
Tedrake, 2021</td></tr>
</table>
</div>
</body>
</html>