Instability during training #6
Comments
Sounds like gradient explosion. The common recommendations are to apply gradient clipping and/or reduce the learning rate.
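For example, a minimal sketch of both suggestions in Keras (here `model` stands for the already-built model from the issue; the optimizer choice and the specific values are illustrative, not from the original comment):

```python
from keras.optimizers import Adam

# clipnorm caps the gradient norm and a smaller lr slows the updates down;
# both values are starting points to tune, not numbers taken from this thread.
optimizer = Adam(lr=1e-4, clipnorm=1.0)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```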
I'm still trying, but so far I haven't been able to solve it. I forgot to mention that I'm also getting a warning when I call fit_generator on this model.
Can you tell if I'm using your code incorrectly in any way? e.g. restrictions on masking 0 (padding) on the input, one-hot output only, etc.
Probably. The warning you receive is caused by my somewhat inefficient implementation of embeddings in AttentionDecoder; it may slow down the training, but it should not cause the instability you mentioned. The problem is probably caused by masking; can you remove the masking and check? PS: I cannot see the picture you attached in the initial comment ("crazy instability issues"), so I don't fully understand how crazy that instability is.
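To make the suggestion concrete, the masking toggle usually lives on the input Embedding layer; a minimal, hedged sketch of turning it off (the sizes are placeholders and the rest of the model is elided, since the exact AttentionDecoder setup isn't shown in this thread):

```python
from keras.layers import Embedding

vocab_size, embedding_dim, maxlen = 1000, 128, 40   # placeholder sizes, not from the issue

# mask_zero=True tells downstream layers to skip the zero-padded timesteps;
# mask_zero=False treats the padding as ordinary tokens. If the instability
# disappears with mask_zero=False, the masking path is the likely culprit.
embedding = Embedding(vocab_size, embedding_dim,
                      input_length=maxlen, mask_zero=False)
```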
Sorry for the late reply. I can remove the masking, but I should test it out tomorrow. This is the picture I mentioned in the first post.
I'm fairly new to this and for some reason I'm having crazy instability issues during training. I've witnessed over a 10% decrease in validation accuracy at some point.
It's a many-to-many problem similar to POS tagging (with a much smaller vocabulary). The input is an array of 40 integers (zero-padded) and the output is an array of 40 one-hot vectors. Any idea what I'm doing wrong?
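For reference, a rough sketch of the data layout described above (the array names, label count, and padding side are illustrative assumptions, not taken from the issue):

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

maxlen, n_labels = 40, 50              # 40 timesteps; label count is a placeholder
token_seqs = [[3, 17, 42], [8, 5]]     # toy integer sequences standing in for real data
label_seqs = [[1, 2, 1], [4, 4]]

# Inputs: (n_samples, 40) zero-padded integers (padding side assumed to be 'post')
X = pad_sequences(token_seqs, maxlen=maxlen, padding='post')

# Targets: (n_samples, 40, n_labels) one-hot vectors, padded the same way
y_int = pad_sequences(label_seqs, maxlen=maxlen, padding='post')
y = np.array([to_categorical(seq, num_classes=n_labels) for seq in y_int])
```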