cannot replicate convergence #3
@ereday Thinking that I may have broken something, I went back to Julia 0.6 and tried training the model with the last commit before the Julia 1.0 transition. I got to 59% at the end of 30 epochs. With the latest commit I got to 64%. Do you remember the specific version/commit I can use to replicate your results from the README, so I can debug what is going on?
Hi, I checked the Knet version I am using. According to
I ran a couple of experiments using exactly the same script and code in the repository (environment: Julia 0.6.2, Knet v0.9.1). I share a chart below with the results I obtained (validation accuracy chart: https://user-images.githubusercontent.com/13196191/47014337-462dec80-d14a-11e8-93f6-d8014bfd48b1.png). As you said, they're not the same as the one shared in the README. However, the model did not get stuck around ~60%; at the end of training I generally obtained around ~91% accuracy on the dev set. I remember that I trained this model (and obtained the corresponding learning curve) on the old cluster (somon & kuacctest), meaning that I might have used even older versions of Knet & AutoGrad. One possible reason might be the change in the dropout usage. The forget-gate bias values of the LSTM might also affect the results; as far as I remember, I was setting them to 1.0 manually on the old cluster (by changing the source code of Knet). If one of these is the problem, playing with hyperparameters and the seed might be enough to recover the lost performance, which is what I am currently doing. If I get an improvement, I'll post it here too. I don't think something serious happened, since we're still able to achieve 91% performance. The saved models can be found here: https://goo.gl/e4Y3WZ
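For illustration, this is roughly what I mean by the forget-gate bias change (a plain-Julia sketch of a cuDNN-style bias layout, not the actual edit I made to the Knet source; the gate ordering, the hidden size, and the exact slice would have to be checked against the Knet version in use):

```julia
# Sketch only: a cuDNN-style LSTM keeps the biases of one layer in a single
# vector laid out gate by gate. Assuming the ordering (input, forget, cell,
# output), initializing the forget-gate bias to 1.0 means overwriting the
# second quarter of that vector.
hidden = 150                    # hidden size is an assumed example value
b = zeros(Float32, 4hidden)     # [b_i; b_f; b_g; b_o]
b[hidden+1:2hidden] .= 1.0f0    # forget-gate slice set to 1.0
```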
Thank you. The problem is just with convergence speed rather than accuracy then. I will try to replicate with Julia 1.0.
I can confirm similar results with Julia 1.0. Here are the results with old values for comparison. It never exceeds 90%. Dropout problem? (I no longer decide when to apply dropout automatically).
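For context, the dropout change I mean is that the caller now has to decide when dropout is applied; a minimal sketch of that pattern (hand-rolled inverted dropout in plain Julia, not Knet's actual dropout implementation; mydropout is just an illustrative name):

```julia
# Minimal sketch of caller-controlled dropout: the mask is applied only when
# the caller says we are training; otherwise the input passes through
# unchanged. Inverted dropout scales by 1/(1-p), so no rescaling is needed
# at test time.
function mydropout(x, p; train::Bool=false)
    (train && p > 0) || return x
    mask = (rand(Float32, size(x)...) .> p) ./ (1 - p)
    return x .* mask
end

x = randn(Float32, 5, 3)
mydropout(x, 0.5; train=true)   # masked and rescaled during training
mydropout(x, 0.5)               # identity at test time
```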
I was also thinking dropout at first, but then I compared the training-set loss values of the model stated in the README with the 3 models I shared above. On the one hand, all 3 models have higher loss values during training, which might be a sign of too much dropout; on the other hand, if we decrease the dropout rate the models start to overfit even more. Therefore I started to think something else might be the reason for the decrease in validation-set performance. Besides this reasoning, I also tried smaller dropout rates to check empirically, and I did not get 94% accuracy on the validation set. Could the initialization of the RNN's forget gates (as I said above) be the reason?
@OsmanMutlu when I try to train from scratch I do not seem to get the convergence behavior described in README.md, can you try as well?