Mismatch diarization result between pyannote/speaker-diarization-3.0 and k2-fsa/speaker-diarization #1708
Additionally, I compared the embedding models directly using cosine distance. The distance was nearly 1, indicating that the embeddings generated by the two models are almost orthogonal.

```python
import numpy as np
import sherpa_onnx
from pyannote.audio import Model, Inference
from scipy.spatial.distance import cdist

# Embedding from pyannote.audio (PyTorch checkpoint)
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")
inference = Inference(model, window="whole")
audio_fp = "change_to_your_audio_filepath"
embedding_pyannote = inference(audio_fp)

# Embedding from sherpa-onnx (ONNX export of the same model)
config = sherpa_onnx.SpeakerEmbeddingExtractorConfig(
    model="/Users/kridtaphadsae-khow/.cache/huggingface/hub/models--csukuangfj--speaker-embedding-models/snapshots/0743f301363dec56491a490f6d6cbc9d67f9a3bf/wespeaker_en_voxceleb_resnet34_LM.onnx",
    num_threads=1,
    debug=True,
    provider="cpu",
)
extractor = sherpa_onnx.SpeakerEmbeddingExtractor(config)
# read_wave: helper from the sherpa-onnx examples,
# returns (float32 samples, sample rate)
audio, sample_rate = read_wave(audio_fp)
stream = extractor.create_stream()
stream.accept_waveform(sample_rate=sample_rate, waveform=audio)
embedding_sherpa = np.asarray(extractor.compute(stream))

distance = cdist(
    np.expand_dims(embedding_pyannote, axis=0),
    np.expand_dims(embedding_sherpa, axis=0),
    metric="cosine",
)
print(distance)  # array([[0.82130009]])
```
If it is nearly 0, then you can consider them almost orthogonal. If it is nearly 1, then you cannot say they are almost orthogonal.
Can you share ck-interview-mono.wav?
In the context of the scipy implementation, 1 indicates orthogonality, while 0 signifies parallelism.
I see what you mean now.
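To make scipy's convention concrete: `cdist(..., metric="cosine")` returns the cosine *distance*, i.e. 1 minus the cosine similarity, so orthogonal vectors score 1 and parallel vectors score 0. A minimal check (the vector values below are arbitrary illustrations, not embeddings from either model):

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[1.0, 0.0]])

# orthogonal pair -> distance 1
print(cdist(a, np.array([[0.0, 1.0]]), metric="cosine"))  # [[1.]]
# parallel pair -> distance 0
print(cdist(a, np.array([[2.0, 0.0]]), metric="cosine"))  # [[0.]]

# the 0.8213 reported above therefore corresponds to a cosine
# similarity of only about 0.18 -- far from a match
print(round(1.0 - 0.82130009, 8))  # 0.17869991
```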
Please show the complete code. What is
I used the
I attempted to diarize the audio clip using the same model, but I obtained different results. Is this a known issue related to the ONNX format, or did I make a mistake in my process?
I have checked the pipeline of pyannote/speaker-diarization-3.0 and selected the same models as provided in sherpa-onnx.
How to reproduce

pyannote/speaker-diarization-3.0

Output

k2-fsa/speaker-diarization

Ran on https://huggingface.co/spaces/k2-fsa/speaker-diarization with:
- embedding model: wespeaker_en_voxceleb_resnet34_LM.onnx (26 MB)
- segmentation model: pyannote/segmentation-3.0
- number of speakers: 2

Output
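For reference, the pyannote side of the comparison is typically run along these lines. This is a hedged sketch, not the exact script used in this issue: `YOUR_HF_TOKEN` is a placeholder (the pyannote/speaker-diarization-3.0 pipeline is gated on Hugging Face), the wav filename comes from the thread, and `num_speakers=2` mirrors the setting used on the k2-fsa space. It cannot run without valid credentials and the downloaded models.

```python
from pyannote.audio import Pipeline

# gated model: a Hugging Face access token is required
# (YOUR_HF_TOKEN is a placeholder, not a real value)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="YOUR_HF_TOKEN",
)

# ck-interview-mono.wav is the clip mentioned in the thread;
# num_speakers=2 matches the value used on the k2-fsa space
diarization = pipeline("ck-interview-mono.wav", num_speakers=2)

# print one "start end speaker" line per speech turn
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.2f} {turn.end:.2f} {speaker}")
```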