TiPI fixes #29

scheunemann · 2018-11-08T14:34:44Z

I think the update order of the `a_buffer` is off. For testing, I added `ode_robots/simulations/humanoid-TiPI` (taken from here) and plugged in the pimax from `selforg/controller`.

Always trigger `learn()` (minor)

I think it is beneficial that `step()` always enters `learn()` to ensure that buffers such as `L_buffer` get filled properly. Otherwise the buffered data is incorrect when switching between eps[C|A] == 0 and eps[C|A] > 0 during runtime (I am actually doing this in my experiments). Done here.
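A minimal sketch of the idea, not the actual `PiMax` code: `ToyController`, its scalar state, the ring-buffer layout, and the placeholder update rule are all hypothetical simplifications. It shows `step()` entering `learn()` unconditionally, so the buffer keeps being filled while the learning rate is zero and only the parameter update is skipped.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar stand-in for the controller; not the PiMax implementation.
struct ToyController {
  static constexpr std::size_t kBufferSize = 4;
  std::array<double, kBufferSize> L_buffer{};  // ring buffer of per-step derivative values
  double C = 0.5;              // controller weight
  double epsC = 0.0;           // learning rate, may be switched at runtime
  std::size_t t = 0;           // step counter
  std::size_t learnCalls = 0;  // how often learn() was entered

  void learn(double s) {
    ++learnCalls;
    // Bookkeeping happens on every call, independent of epsC.
    const double g = std::tanh(C * s);
    L_buffer[t % kBufferSize] = C * (1.0 - g * g);
    if (epsC == 0.0) return;        // skip only the parameter update
    C += epsC * s * (1.0 - g * g);  // placeholder update rule, for illustration only
  }

  void step(double s) {
    learn(s);                     // always entered, even when epsC == 0
    assert(learnCalls == t + 1);  // sanity check: learn() ran for this step
    ++t;
  }
};
```

With this layout, switching `epsC` from 0 to a positive value at runtime finds the buffer already populated with values from the preceding steps.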

Filling `a_buffer`

I think that after reading the sensor values and adding them to the buffer `s_buffer[t]`, learning should be triggered (`learn()`), and only then should $a_t$ be stored in the respective buffer. I added an assert here which fails in the original version.

Consider the original equation before `learn()`, e.g., `Matrix a = (C*(s_smooth) + h).map(g);`. With time indices, the value computed there was $a_t = tanh(C_{t-1} * s_t + h_{t-1})$. Putting it after `learn()`, before increasing $t$, makes it $a_t = tanh(C_t * s_t + h_t)$, and thus $a_{t-1} = tanh(C_{t-1} * s_{t-1} + h_{t-1})$ for the next step. The assert won't fail.
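The effect of the ordering can be reproduced in a scalar toy model. This is a sketch under stated assumptions, not the `PiMax` code: `OrderSketch`, the placeholder `learn()` rule, and the `storeBeforeLearn` switch are hypothetical and exist only to contrast the two orderings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of the update order; C, h and the buffers mirror
// the names in the text, everything else is an illustration-only assumption.
struct OrderSketch {
  double C = 0.8, h = 0.1;
  std::vector<double> s_buffer, a_buffer;

  // Stand-in for learn(): any rule that changes C and h between steps.
  void learn() { C += 0.05; h -= 0.01; }

  // storeBeforeLearn = true mimics the ordering described for the original
  // version; false computes and stores the action after learn().
  void step(double s, bool storeBeforeLearn = false) {
    s_buffer.push_back(s);
    if (storeBeforeLearn) {
      a_buffer.push_back(std::tanh(C * s + h));  // uses the old parameters
      learn();
    } else {
      learn();
      a_buffer.push_back(std::tanh(C * s + h));  // uses the updated parameters
    }
  }

  // Sanity check for the start of the next step: the stored action must match
  // the current parameters applied to the stored sensor value.
  bool lastActionConsistent() const {
    if (a_buffer.empty()) return true;
    const double expected = std::tanh(C * s_buffer.back() + h);
    return std::abs(a_buffer.back() - expected) < 1e-12;
  }
};
```

In this model the check holds when the action is stored after `learn()` and fails when it is stored before, which is the behaviour the added assert is meant to catch.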

Calculating the sum (A20/28)

For $\tau>2$ I think there are two issues related to the sum (A20), or (28) in the special one-layer NN case. Firstly, the sum is meant to calculate $\Delta C$, but by adding $C_{t-1}$ what gets calculated is the total weights $C_t$ rather than the delta. These then get passed on to the calculation for, e.g., $(t-2)$. I changed that in (694ab54) and left a comment.

Secondly, the corresponding term of the sum is calculated with $\frac{\partial \phi(s_{t-l}, a_{t-1})}{\partial s_{t-l}}$ and thus depends on $C_{t-l}$. That is also how $\delta u$, which depends on $L^{(l-1)}(t-l)$, is calculated.
I adapted the sum so that $\gamma$ depends on $C_{t-l}$.
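Both points can be sketched in scalar form. This is a hedged illustration, not the code from the PR: `accumulateDelta`, `updateWeights`, `C_buffer`, and the placeholder summand `1 - C^2` are hypothetical names and stand-ins, and how the real summands are derived is outside this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar sketch: each entry of `terms` stands for one summand of
// the update sum. The delta is collected in its own variable, so no summand
// ever sees weights that already contain earlier summands.
inline double accumulateDelta(const std::vector<double>& terms, double epsC) {
  double deltaC = 0.0;
  for (double term : terms) deltaC += epsC * term;
  return deltaC;
}

// Summands that depend on the weights are evaluated with the weights stored
// for their own time step (one C_buffer entry per t-l), not with a running total.
inline double updateWeights(double C, const std::vector<double>& C_buffer, double epsC) {
  assert(!C_buffer.empty());
  std::vector<double> terms;
  terms.reserve(C_buffer.size());
  for (double C_tl : C_buffer) {
    terms.push_back(1.0 - C_tl * C_tl);  // placeholder summand, illustration only
  }
  return C + accumulateDelta(terms, epsC);  // apply the delta once, after the sum
}
```

With a single summand the two variants coincide, which matches the observation that nothing changes for $\tau=2$.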

It's a bit tough to express everything in markdown using these TeX links. If you are interested, I am happy to provide a PDF or have a chat about it. Also, I can make a PR with only the controller changes if you prefer.

scheunemann · 2018-11-09T16:27:17Z

selforg/controller/pimax.cpp

@@ -316,11 +321,14 @@ void PiMax::learn(){

      const Matrix& metric = useMetric ? gs.map(one_over).map(sqr) : gs.mapP(1, constant);

-      C += ((( dmu * (ds[l]^T) - (epsrel & al) * (sl^T)) & metric) * epsCN


This line actually calculates the weights at time t rather than the delta. These total weights at time t then get passed on for calculating the term for the next step, e.g., t-2.

scheunemann and others added 12 commits August 15, 2018 15:27

Merge pull request #1 from georgmartius/master

15ffc18

get updates from original master

Merge branch 'master' of github.com:georgmartius/lpzrobots

d9a3dee

Humanoid from Simulations.zip from `http://robot.informatik.uni-leipzig.de/research/supplementary/TiPI2013/` with `pimax.cpp` being replaced by the `lpzrobots` version.

2fddb5a

example scenarios from paper PLoS2013

7d28cef

Adding assert as sanity check. At that point, the actual a_tm1 is in fact $a_{t-1} = K(s_t)$ and thus doesn't equal $a_{t-1} = C_{t-1} * s_{t-1} + h_{t-1}$

d5e6df7

computing $a_t$ with updated parameters C,h

9293e65

epsC and epsA can be changed online from 0 to !0, buffer entries then will be wrong for learning, hence always enter `learn()`

c2b5110

have a re-usable function for smoothing s

f806899

include pimax from selforg/controller

b148ed4

Delete generated file

e8c954b

Iteration over the sum (A20) computes $\Delta C$. However, so far the iteration seems to compute the weight update with adding $C_{t-1}$ to the change within the sum. Results are the same as before for $\tau=2$.

694ab54

As for $\delta u$, the Jacobian L(t-l) is calculated dependent on the parameters C at time (t-l). $\partial \psi(s_{t-l}) / \partial s_{t-l}$ is dependent on $a_{t-l}$ and therefore on $C_{t-l}$. Again, nothing changes for $\tau=2$.

ff627d5

scheunemann changed the title ~~Minor TiPI fixes~~ TiPI fixes Nov 9, 2018
