Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TiPI fixes #29

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

TiPI fixes #29

wants to merge 12 commits into from

Conversation

scheunemann
Copy link
Contributor

@scheunemann scheunemann commented Nov 8, 2018

I think the update order of the a_buffer is off. For testing, I added ode_robots/simulations/humanoid-TiPI as from here and plugged in the pimax from selforg/controller.

Always trigger learn() (minor)

I think it is beneficial that step() always enters learn() for ensuring that buffers like, e.g.,L_buffer get filled properly. Otherwise the data contained is not correct when swapping between eps[C|A] == 0 and eps[C|A] > 0 during runtime (I am actually doing this in my experiments). Done here.

Filling a_buffer

I think after reading an adding sensor values to the buffer s_buffer[t], learning should be triggered (learn()) and only then $a_t$ should be stored in the respective buffer. I added an assert here which fails in the original version.

Considering the original equation before learn(), e.g.,Matrix a = (C*(s_smooth) + h).map(g); with time indices, the value computed there was a_t = tanh(C_{t-1} * s_t + h_{t-1}) putting it after learn() before increasing $t$ makes it $a_t = tanh(C_t * s_t + h_t)$ and $a_{t-1} = tanh(C_{t-1} * s_{t-1} + h_{t-1})$ for the next step. The assert won't fail.

Caclulating the sum (A20/28)

For $\tau>2$ I think there are two issues related to the the sum (A20) or (28) in the special one-layer NN case. Firstly, the sum is meant to calculate $\Delta C$ but with adding $C_{t-1}$ what gets calculated is $C_t$. This $C_t$ then gets passed on to the calculation for, e.g., (t-2). I changed that in (694ab54) and left a comment.

$L(t-l)$ is calculated with $\frac{\partial \phi(s_{t-l}, a_{t-1})}{\partial s_{t-l}}$ and thus dependend on $C_{t-l}$. That is also how $\delta u$, which is dependent on $L^{(l-1)}(t-l)$, is calculated.
I adapted the sum so $\gamma$ being dependend on $C_{t-l}$.

it's a bit tough to express everything in markdown with using these TeX-links. If you are interested I am happy to provide a PDF or have a chat over that. Also, I can make a PR with only the controller-changes if you prefer so.

scheunemann and others added 12 commits August 15, 2018 15:27
get updates from original master
…n fact $a_{t-1} = K(s_t)$ and thus doesn't equal $a_{t-1} = C_{t-1} * s_{t-1} + h_{t-1}$
…then will be wrong for learning, hence always enter `learn()`
… iteration seems to compute the weight update with adding $C_{t-1}$ to the change within the sum. Results are the same as before for $\tau=2$.
…arameters C at time (t-l). $\partial \psi(s_{t-l}) / \partial s_{t-l}$ is depended on $a_{t-l}$ and therefore on $C_{t-l}$.

Again, nothing changes for $\tau=2$.
@@ -316,11 +321,14 @@ void PiMax::learn(){

const Matrix& metric = useMetric ? gs.map(one_over).map(sqr) : gs.mapP(1, constant);

C += ((( dmu * (ds[l]^T) - (epsrel & al) * (sl^T)) & metric) * epsCN
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line actually calculates the weights at time t rather then the delta. These total weights at time t then get passed for calculating the term for the next step, e.g, t-2.

@scheunemann scheunemann changed the title Minor TiPI fixes TiPI fixes Nov 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants