Speaker diarization

Recipe for speaker diarization using Kaldi.

Clustering (standalone)

See the clustering/ directory. It is based on the pyannote-audio library.

Notes

The dataset used in the recipe consists of five audio recordings from the English portion of the CallHome dataset.
CallHome is 2-channel, μ-law, 8 kHz telephone speech. It is converted on the fly to mono-channel, 16 kHz PCM by sox commands inside the wav.scp file (see fblocal/prep_data.sh). If your data is already in PCM format, you will need to edit that script.
No model is trained. Instead, the scripts download a pre-trained SRE16 model.

References

Speaker Diarization with Kaldi by Yoav Ramon (Towards Data Science blog)
"Speakers in the Wild" informal documentation by David Ryan Snyder
NIST SRE 2016 Xvector Recipe by David Ryan Snyder
Kaldi's callhome_diarization v2 recipe under egs/

Grupo FalaBrasil (2020) - https://ufpafalabrasil.gitlab.io/
Universidade Federal do Pará (UFPA) - https://portal.ufpa.br/
Cassio Batista - https://cassota.gitlab.io/