What do mel, lf0, and loss mean? #661
-
What do mel, lf0, and loss mean? How should I interpret this log page? When is the right time to stop training?
Replies: 2 comments 4 replies
-
I'm no professional either, but I've read some of the machine learning papers.

Mel, LF0, G and loss are terms used with deep learning models for speech and singing-voice processing (not natural language processing). Mel features represent the spectral content of audio in a way that is closer to human perception than raw frequency values, and LF0 describes the pitch contour of the voice.

A brief explanation of each term:
Mel: The mel scale is a nonlinear frequency scale based on how humans perceive pitch. It is roughly linear below about 1 kHz and roughly logarithmic above it. A mel spectrogram is a spectrogram whose frequency axis has been mapped onto this scale.
LF0: LF0 stands for log F0, the logarithm of the fundamental frequency (the pitch) of the voice. Pitch is usually modelled on a log scale because we perceive pitch roughly logarithmically.
G: In graph names such as "loss/g/mel", the "g" most likely stands for the generator of the GAN-based (VITS-style) model. Graphs starting with "loss/d/" belong to the discriminator.
Loss: A measure of how far the model's outputs are from the targets for a given input; lower is better. In these voice models, the mel loss is typically a distance such as L1 or mean squared error (MSE) between the predicted and the real mel spectrogram, and the LF0 loss measures how far the predicted pitch is from the real pitch.

The most important graphs are "loss/g/lf0" and "loss/g/mel". Ideally, they should decrease and converge smoothly over time. As long as the curves still show extreme outliers, your model has probably not trained enough. But in the end you have to test your model to find out whether it meets your quality requirements. It often fails because of bad source material, but you will find this out quickly in early tests.

I hope that answers some of the question :)
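To make the terms a bit more concrete, here is a minimal sketch using only the Python standard library. It assumes the common HTK-style mel formula mel = 2595 * log10(1 + f / 700); other toolkits use slightly different mel curves. `f0_to_lf0` uses a plain natural log and maps unvoiced frames (F0 = 0) to 0.0; that convention and all function names here are hypothetical, and real projects scale LF0 in their own way. The two loss helpers are generic L1 and MSE examples, not the exact loss code of any particular project.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def f0_to_lf0(f0_hz: float) -> float:
    """Log F0: natural log of the pitch in Hz.

    Hypothetical convention: unvoiced frames (f0 == 0) are mapped to 0.0.
    """
    return math.log(f0_hz) if f0_hz > 0 else 0.0


def l1_loss(pred, target):
    """Mean absolute error, e.g. between predicted and real mel-spectrogram values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mse_loss(pred, target):
    """Mean squared error, e.g. between predicted and real LF0 values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


if __name__ == "__main__":
    print(round(hz_to_mel(1000.0), 1))  # 1000.0 (1 kHz is about 1000 mel)
    # One octave up doubles F0 but always adds the same amount (ln 2 ≈ 0.693) to LF0.
    print(f0_to_lf0(440.0) - f0_to_lf0(220.0))
    print(l1_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # 0.6666666666666666
    print(mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 1.3333333333333333
```

The octave example shows why log F0 is convenient: the same musical interval is the same distance on the LF0 axis, no matter how high or low the voice is.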
-
I suggest previewing the generated audio in the "Audio" tab of TensorBoard and watching the "Scalars" tab with smoothing set to about 0.98.
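A smoothing value of 0.98 is an exponential moving average (EMA): each displayed point keeps 98% of the previous smoothed value and mixes in 2% of the new raw value, which hides step-to-step noise so the overall trend is visible. This minimal sketch assumes a debiased EMA, similar to what recent TensorBoard versions use so the start of the curve is not pulled toward zero. `has_plateaued` is a hypothetical stopping heuristic for illustration only; the window and tolerance are arbitrary, and listening to the audio samples remains the real test.

```python
def ema_smooth(values, weight=0.98):
    """Debiased exponential moving average (requires 0 <= weight < 1)."""
    smoothed = []
    last = 0.0
    for step, value in enumerate(values, start=1):
        # Blend the previous smoothed value with the new raw value.
        last = weight * last + (1.0 - weight) * value
        # Divide by (1 - weight**step) to undo the bias from starting at 0.
        smoothed.append(last / (1.0 - weight ** step))
    return smoothed


def has_plateaued(smoothed, window=1000, rel_tol=0.01):
    """Hypothetical heuristic: True if the smoothed loss improved by less than
    rel_tol (relative) over the last `window` logged points."""
    if len(smoothed) <= window:
        return False  # not enough history to judge yet
    old, new = smoothed[-window - 1], smoothed[-1]
    return (old - new) < rel_tol * abs(old)


if __name__ == "__main__":
    # Synthetic, slowly decaying loss curve (made-up data for illustration).
    raw = [1.0 / (1.0 + 0.01 * i) for i in range(5000)]
    smoothed = ema_smooth(raw, weight=0.98)
    print(smoothed[-1], has_plateaued(smoothed, window=1000))
```

A higher weight gives a smoother but more delayed curve, so a sudden spike in the raw loss shows up in the smoothed curve only gradually.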