Mixed precision training? #487

Closed
ghost opened this issue Aug 12, 2020 · 6 comments
Labels
enhancement New feature or request

Comments


ghost commented Aug 12, 2020

PyTorch 1.6 has native support for automatic mixed precision (AMP) training: https://pytorch.org/blog/pytorch-1.6-released/

Should we take advantage of this? In particular, I think the larger batches that fp16 makes room for would be nice for encoder and synthesizer training.
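
For reference, the basic torch.cuda.amp pattern looks roughly like this. This is a minimal sketch with toy stand-ins for the real models, not code from this repo:

import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Toy stand-ins; the real encoder/synthesizer model would go here.
model = nn.Linear(80, 80).cuda()
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()
scaler = GradScaler()  # scales the loss to avoid fp16 gradient underflow

for step in range(10):
    x = torch.randn(16, 80, device="cuda")
    y = torch.randn(16, 80, device="cuda")
    optimizer.zero_grad()
    with autocast():  # ops inside run in fp16 or fp32 as appropriate
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()

The memory saved by fp16 activations is what would let us raise the batch size.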


ghost commented Aug 13, 2020

Mozilla TTS tried it and encountered a bug with RNNs that is expected to be fixed in a future PyTorch release. mozilla/TTS#486 (comment)

@ghost ghost added the enhancement New feature or request label Aug 20, 2020

ghost commented Oct 27, 2020

PyTorch 1.7 was just released, so it is time to try again. If we add AMP, the implementation must be clean while preserving support for older versions of PyTorch.
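
One way to keep it clean is to gate on whether torch.cuda.amp exists and fall back to plain fp32 on older versions. A rough sketch (hparams.use_amp is a hypothetical flag, and model/optimizer/loss_fn are the same toy placeholders as in the sketch above):

import torch

# torch.cuda.amp only exists in PyTorch >= 1.6.
amp_available = hasattr(torch.cuda, "amp")
use_amp = amp_available and hparams.use_amp  # hypothetical config flag

scaler = torch.cuda.amp.GradScaler() if use_amp else None

def train_step(x, y):
    optimizer.zero_grad()
    if use_amp:
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    else:  # plain fp32 path, identical to current behavior
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    return loss.item()

That would keep the existing fp32 code path untouched when AMP is off or unavailable.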


ghost commented Mar 24, 2021

See fatchord/WaveRNN#229 for an example of how to do this.


ghost commented Apr 1, 2021

Closing due to lack of developer interest at this time.
Please comment and reopen if you would like to work on this.

@ghost ghost closed this as completed Apr 1, 2021

ghost commented Nov 7, 2021

I made a branch that supports mixed precision training. It is not recommended for use at this time.
https://github.com/blue-fish/Real-Time-Voice-Cloning/tree/487_mixed_precision_training

For me, mixed precision training is much slower than training without it, and the loss is occasionally nan. I also had to set up my Python environment with Anaconda due to a matrix multiplication problem: pytorch/pytorch#56747 (comment)

PyTorch AMP enabled (Python 3.9.7 with Anaconda, pytorch==1.10.0):

{| Epoch: 1/8 (20/2564) | Loss: nan | 0.24 steps/s | Step: 0k | }
Average execution time over 10 steps:
  Blocking, waiting for batch (threaded) (10/10):  mean:    0ms   std:    0ms
  Data to cuda (10/10):                            mean:    1ms   std:    0ms
  Forward pass (10/10):                            mean:  956ms   std:  151ms
  Loss (10/10):                                    mean:   14ms   std:    3ms
  Backward pass (10/10):                           mean: 3013ms   std:  456ms
  Parameter update (10/10):                        mean:   71ms   std:    5ms
  Extras (visualizations, saving) (10/10):         mean:    0ms   std:    0ms

Same setup without AMP:

{| Epoch: 1/8 (20/2564) | Loss: 5.778 | 0.76 steps/s | Step: 0k | }
Average execution time over 10 steps:
  Blocking, waiting for batch (threaded) (10/10):  mean:    0ms   std:    0ms
  Data to cuda (10/10):                            mean:    1ms   std:    0ms
  Forward pass (10/10):                            mean:  451ms   std:   52ms
  Loss (10/10):                                    mean:   10ms   std:    2ms
  Backward pass (10/10):                           mean:  753ms   std:   73ms
  Parameter update (10/10):                        mean:   31ms   std:    3ms
  Extras (visualizations, saving) (10/10):         mean:    5ms   std:    0ms

@ghost ghost reopened this Nov 7, 2021

ghost commented Nov 15, 2021

Dropping this due to poor performance and lack of interest.

@ghost ghost closed this as completed Nov 15, 2021