
What are the best params and how to check if training is ok? #5

Open

nikich340 opened this issue Nov 21, 2021 · 1 comment

nikich340 commented Nov 21, 2021

I started training a model following your short guide: I prepared ~15 hours of Russian audio clips of 1 to 10 seconds in 16-bit/22050 Hz format and generated the training and validation lists.
Usually it is recommended to trim silence from audio, but the hparams of your models set trim_silence: False. Why is that?
The transcription texts were prepared with your NLP handler and an edited stress dictionary (to cover some missing words).

Now I am training from scratch with epochs: 100, iters_per_checkpoint: 500, fp16_run: true, and warm_start: false (that one only makes sense when continuing from a checkpoint). I also set lr_scheduler_options as they were in the ruslan hparams.yaml (the default params from this repo raised an error) and batch_size: 5 (my GPU can't handle more, though I hope to get access to a better machine in the future). The other params are as they currently are in this repo.

Training seems to be running without problems, but I wanted to ask: exactly which params did you use to train the ruslan/natasha models? What changes are recommended during training (learning_rate?), and what should the overall loss, grad norm, and other losses be for results as good as your pre-trained models?

Finally, I noticed that my model checkpoints have a fixed size of 329,880 KB, while your models are 109,990 KB. Am I doing something wrong?

@Dekakhrone
Collaborator

Hello @nikich340! Sorry for the late reply; I'm catastrophically busy with a new release that is coming soon.

Usually it is recommended to trim silence from audio, but the hparams of your models set trim_silence: False. Why is that?

It's because the trim_top_db parameter may vary from one dataset to another, so the user has to experiment and find a suitable value before switching the trim_silence flag on. Keep in mind that inner silence will also be cut out.
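As a rough illustration (assuming the trimming is threshold-based, as with librosa, whose top_db parameter plays the same role as trim_top_db; the file path and threshold values are just placeholders), you can probe how much audio different thresholds would keep before switching the flag on:

import librosa

y, sr = librosa.load("path/to/sample.wav", sr=22050)
for top_db in (20, 30, 40, 60):
    # split() keeps only the non-silent intervals; anything below the
    # threshold, including pauses inside the utterance, gets dropped
    intervals = librosa.effects.split(y, top_db=top_db)
    kept = sum(end - start for start, end in intervals) / sr
    print(f"top_db={top_db}: keeps {kept:.2f}s of {len(y) / sr:.2f}s "
          f"in {len(intervals)} segment(s)")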

... exactly which params did you use to train the ruslan/natasha models?

You can get good results by using the default parameters from the hparams.yaml. If I remember correctly, the same parameters were used to train the sova Ruslan and Natasha models.

What changes are recommended during training (learning_rate?)

You can experiment with the LR scheduler's settings.
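As a generic illustration (plain PyTorch, not this repo's hparams schema; the numbers are placeholders), one common option is to decay the learning rate once the validation loss plateaus:

import torch

model = torch.nn.Linear(8, 8)  # stand-in for the actual model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Halve the LR whenever validation loss stops improving for 5 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=5, min_lr=1e-5)

for epoch in range(20):
    val_loss = 1.0  # replace with your real validation loss
    scheduler.step(val_loss)
    print(epoch, optimizer.param_groups[0]["lr"])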

what should the overall loss, grad norm, and other losses be for results as good as your pre-trained models?

A total validation loss of about 0.6 is OK.

Finally, I noticed that my model checkpoints have a fixed size of 329,880 KB, while your models are 109,990 KB. Am I doing something wrong?

No, it's because the full checkpoint is saved during the training process, and it contains the model parameters, the optimizer state, and a few other things; take a look here. You can cut the values from the checkpoint dict that are unnecessary for inference:

import torch

# Load the full training checkpoint onto the CPU
checkpoint_dict = torch.load("path/to/checkpoint", map_location="cpu")

# Keep only the entries needed for inference: model weights and hparams
for key in list(checkpoint_dict.keys()):
    if key not in ["state_dict", "hparams"]:
        checkpoint_dict.pop(key)

torch.save(checkpoint_dict, "path/to/reduced_checkpoint")
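To sanity-check the result, you can reload the reduced file and confirm that only the inference keys remain (a quick check, not part of the repo's tooling):

reduced = torch.load("path/to/reduced_checkpoint", map_location="cpu")
assert set(reduced.keys()) == {"state_dict", "hparams"}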
