Speaker verification result #46

Open
pierfale opened this issue Jul 25, 2023 · 4 comments

Comments

@pierfale

pierfale commented Jul 25, 2023

Hello,

Thank you for your work on WavLM.
I tried to reproduce the results but ran into some difficulties.

First of all, I don't understand exactly the difference between the scores reported in different places. For instance, on Vox1-O:

Moreover, I tried to reproduce the result from the fine-tuned checkpoint available in this repository (https://drive.google.com/file/d/1-aE1NfzpRCLxA4GUxX9ITI3F9LlbtEGP/view?usp=sharing).

I get the following results on Vox1-O:

  • Without normalisation, I get EER = 0.558%
  • With s-norm, I get EER = 0.542%
  • With as-norm (cohort size = 600), I get EER = 0.505%
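
For reference, here is roughly how I score the trials and compute EER. This is only a minimal sketch of my own evaluation code, not the script from this repository; `embeddings` is a hypothetical dict of already-extracted WavLM speaker embeddings.

```python
import numpy as np
from sklearn.metrics import roc_curve

def cosine_score(emb1, emb2):
    # Cosine similarity between two speaker embeddings.
    return np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))

def compute_eer(labels, scores):
    # EER is the operating point where the false-acceptance rate (fpr)
    # equals the false-rejection rate (fnr).
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0

# trials: list of (enroll_id, test_id, label) parsed from the Vox1-O trial list
# scores = [cosine_score(embeddings[e], embeddings[t]) for e, t, _ in trials]
# labels = [lab for _, _, lab in trials]
# print(f"EER = {100.0 * compute_eer(labels, scores):.3f}%")
```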

Do you have any more details to provide?

Thank you

@gozsoy

gozsoy commented Feb 13, 2024

I can confirm that I obtained EER = 0.558% on Vox1-O using the fine-tuned WavLM Large checkpoint.

@gancx

gancx commented Apr 25, 2024

(quoting @pierfale's original comment above)

I also observed these differences. Have you fixed it?

@RegulusBai

Same result here, 0.558%, and also waiting for a reply.

@tcourat

tcourat commented Sep 16, 2024

I have the same question.

I did not test it myself, but according to the original WavLM paper:

In the evaluation stage, the whole utterance is fed into the system to extract speaker embedding. We use cosine similarity to score the evaluation trial list. We also use the adaptive s-norm [59], [60] to normalize the trial scores. The imposter cohort is estimated from the VoxCeleb2 dev set by speaker-wise averaging all the extracted speaker embeddings. We set the imposter cohort size to 600 in our experiment. To further push the performance, we also introduce the quality-aware score calibration [58] for our best systems, where we randomly generate 30k trials based on the VoxCeleb2 test set to train the calibration model.

Maybe the reported results use their calibration model, but that calibration model was not shared. Without the quality-aware score calibration, the EER on Vox1-O rises from 0.383% to 0.617%, which may explain the gap.
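
For completeness, this is how I understand the adaptive s-norm step described in the paper. It is only a minimal sketch with my own function names (not code from this repository); `cohort` is assumed to be the matrix of speaker-wise averaged VoxCeleb2 dev embeddings, and the top-600 most similar cohort speakers are used, matching the cohort size mentioned above.

```python
import numpy as np

def as_norm(raw_score, enroll_emb, test_emb, cohort, top_k=600):
    # Adaptive s-norm: normalize the raw cosine score against the top-k
    # most similar impostor speakers for the enrollment and the test
    # embeddings, then average the two z-scores.
    def top_cohort_scores(emb):
        sims = cohort @ emb / (np.linalg.norm(cohort, axis=1) * np.linalg.norm(emb))
        return np.sort(sims)[-top_k:]  # keep the k closest impostors

    e = top_cohort_scores(enroll_emb)
    t = top_cohort_scores(test_emb)
    return 0.5 * ((raw_score - e.mean()) / e.std()
                  + (raw_score - t.mean()) / t.std())

# cohort: (num_speakers, dim) array, one averaged embedding per VoxCeleb2 dev speaker
# raw_score: cosine similarity of the trial pair before normalization
```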
