Model training stops after 1 step for Speech to Text Jasper model #539

conqueror7 · 2020-05-17T13:51:56Z

Hi,
I am trying to train a speech-to-text model on my own dataset, using the checkpoint file of the Jasper DR 10x5 model as the starting point. The Jasper model reference is here: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders-ref
I created my dataset and, using the config file from the link above, started training with this command:
python run.py --mode=train --config_file=example_configs/speech2text/jasper10x5_LibriSpeech_nvgrad_masks.py --enable_logs --continue_learning
In the config file I changed dataset_files in the training_params section to point to my CSV dataset, and set num_gpus=1, num_epochs=4, batch_size_per_gpu=32.
The code runs to completion, but training stops after the first step. I cannot figure out what is triggering sess.should_stop() in the train function in open_seq2seq/utils/funcs.py.

This leaves training incomplete. I have around 95K files and the batch size is set to 32, so it should ideally run for around 2,900 steps.
What could cause the session to stop after the first step?
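
For reference, the step estimate can be checked with a few lines of Python. This is only a rough sketch: the helper name expected_steps is made up for illustration, 95,000 is a rounded file count, and it assumes a partial final batch is dropped, which may not match what the data layer actually does.

```python
def expected_steps(num_samples, batch_size_per_gpu, num_gpus=1, num_epochs=1):
    """Return (steps_per_epoch, total_steps) for a given dataset size.

    Hypothetical helper, not part of OpenSeq2Seq. Assumes the last
    partial batch of each epoch is dropped.
    """
    global_batch = batch_size_per_gpu * num_gpus
    steps_per_epoch = num_samples // global_batch
    return steps_per_epoch, steps_per_epoch * num_epochs


# Values from the config described above; 95_000 is an approximation.
steps_per_epoch, total_steps = expected_steps(
    num_samples=95_000, batch_size_per_gpu=32, num_gpus=1, num_epochs=4
)
print(steps_per_epoch, total_steps)
```

Under these assumptions one epoch is a little under 3,000 steps, so a 4-epoch run should take roughly four times that, nowhere near a single step.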

Configuration used:
Python version: 3.6.10
TensorFlow version: 1.14.0
OpenSeq2Seq commit ID: 61204b2
Model: Jasper DR 10x5
Model reference link: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders-ref
GPU: 1 P100
CUDA version: V10.0.130

aayushkubb · 2020-07-31T10:58:17Z

Can you share the log? If you resumed with continue learning and the step at which training is set to stop is the same as the step you are starting from, training may stop immediately.

The other possible cause is something in your config. If you can share the trace, I may be able to help.
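
To illustrate the first possibility, here is a minimal pure-Python sketch of the stop condition. It assumes the training loop ends through a hook that behaves like tf.train.StopAtStepHook, which requests a stop once global_step >= last_step. The function names are made up, and the restored step of 500,000 is a hypothetical value standing in for whatever the pretrained checkpoint stores.

```python
def should_stop(global_step, last_step):
    # Same comparison a StopAtStepHook-style hook makes after each step.
    return global_step >= last_step


def steps_until_stop(restored_global_step, last_step):
    """Count how many training steps run before a stop is requested."""
    global_step = restored_global_step
    steps_run = 0
    while True:
        global_step += 1  # each training step increments the counter
        steps_run += 1
        if should_stop(global_step, last_step):
            return steps_run


# Assumed target: 4 epochs over ~95,000 files at batch size 32.
last_step = 4 * (95_000 // 32)

fresh = steps_until_stop(0, last_step)          # training from step 0
resumed = steps_until_stop(500_000, last_step)  # hypothetical restored step
print(fresh, resumed)
```

If the restored global step is already past the computed last step, the very first check after step 1 asks the session to stop, which would match the behaviour described in the issue. Comparing the global step stored in the checkpoint with the last step reported in the training log would confirm or rule this out.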
