
Is it necessary to update log sigma instead of sigma for weight noise #28

Open
lai-agent-m opened this issue Apr 8, 2020 · 2 comments


I saw that you calculate the gradient for log(sigma)*2/2048.0; I guess that's for numerical stability, but I'm not sure. In my implementation I directly calculate the gradient of the variance, since that is what appears in the paper. I haven't tested my code thoroughly, so I'm not sure whether anything will break.
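For context, optimizing log σ instead of σ is a common reparameterization: it keeps σ = exp(log σ) strictly positive after any gradient step, and the chain rule gives ∂L/∂(log σ) = σ · ∂L/∂σ. A minimal numpy sketch of that update, where the quadratic toy loss and the step size are my own assumptions purely for illustration:

```python
import numpy as np

# Toy stand-in loss in sigma: L(sigma) = 0.5 * sigma**2
def dL_dsigma(sigma):
    return sigma  # analytic gradient of the toy loss

log_sigma = np.log(0.1)  # optimize log(sigma), not sigma itself
sigma = np.exp(log_sigma)

# Chain rule: dL/d(log sigma) = dL/dsigma * dsigma/d(log sigma)
#                             = dL/dsigma * sigma
grad_log_sigma = dL_dsigma(sigma) * sigma

# One gradient step in log space; sigma stays positive regardless of step size
log_sigma -= 1.0 * grad_log_sigma
sigma = np.exp(log_sigma)
```

A step taken directly on σ with a large learning rate could push σ negative, which has no meaning as a standard deviation; the log-space update cannot.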

lai-agent-m commented Apr 8, 2020

I see the problem: the sum of squares of a vector does not equal the square of the sum, so either the trick solves this or the problem actually still persists. I hope to hear from you, and I'll do some math first.

OK, after applying the chain rule, I see that this trick doesn't solve the sum-of-squares problem. So did you experience underflow, and this trick solved it?
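The distinction at issue is easy to check numerically; with a toy vector (values chosen only for illustration) the two quantities differ:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])

sum_of_squares = np.sum(v ** 2)  # 1 + 4 + 9 = 14
square_of_sum = np.sum(v) ** 2   # (1 + 2 + 3)**2 = 36

# The two agree only in degenerate cases (e.g. a single-element vector)
```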

@lai-agent-m
Okay, it's hard to get the gradient for each sample in an autograd framework. I saw that many other papers on HME recognition also mention that they use weight noise. Do you think they also use this version of 'weight noise', which is in fact kind of different from the original one? I don't think the square of the sum is a good approximation for the sum of squares.
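To make the per-sample-gradient concern concrete: a standard autograd backward pass returns only the gradient summed over the batch, so the sum of squared per-sample gradients has to be computed some other way (e.g. one backward pass per sample, or a vectorized per-sample-gradient transform). A tiny numpy illustration with a hand-derived gradient, where the scalar linear model and the data are my own toy assumptions:

```python
import numpy as np

# Toy model y_hat = w * x with squared-error loss per sample
w = 0.5
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 1.0])

# d l_i / d w for l_i = 0.5 * (w * x_i - y_i)**2, derived by hand
per_sample_grads = (w * x - y) * x

# What a batched backward pass hands you: one summed gradient
summed_grad = per_sample_grads.sum()

# The two second-moment quantities the thread is about
sum_of_squared_grads = np.sum(per_sample_grads ** 2)
square_of_summed_grad = summed_grad ** 2
```

Here `sum_of_squared_grads` and `square_of_summed_grad` come out different, so substituting the square of the summed gradient changes the quantity being estimated, not just its numerics.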
