Describe the bug
I am using NeuralDiarizer with the default diar_infer_telephonic.yaml settings (NeMo version 1.21.0) to diarize real-life phone call recordings.
For almost every short audio file (under 2 minutes) I have diarized, I see the same issue: the first few utterances, spoken by two different speakers, are merged and labeled as if a single speaker had spoken them.
After this initial glitch, the diarizer produces very precise predictions, so the issue affects only the first few sentences.
Are there any recommendations for improving accuracy at the start of the recording?
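One way to make the symptom measurable is to check which speaker labels the predicted RTTM assigns during the opening seconds of a call. The sketch below is only an illustration: the helper name `speakers_in_window` is hypothetical, the RTTM lines are synthetic (not real diarizer output), and it assumes the standard RTTM layout in which field 4 is the start time, field 5 the duration, and field 8 the speaker label.

```python
from collections import defaultdict


def speakers_in_window(rttm_text, window_end=15.0):
    """Return {speaker: seconds of speech} for segments overlapping [0, window_end)."""
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and any non-SPEAKER records.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start, duration, speaker = float(fields[3]), float(fields[4]), fields[7]
        # Count only the part of the segment that falls before window_end.
        overlap = min(start + duration, window_end) - start
        if overlap > 0:
            totals[speaker] += overlap
    return dict(totals)


# Synthetic example: the first 10 seconds carry a single label, which is
# what the merged-speaker symptom would look like in an RTTM file.
example = """\
SPEAKER mono_file 1 0.50 6.20 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1 7.10 3.00 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1 12.00 5.00 <NA> <NA> speaker_1 <NA> <NA>
"""
print(speakers_in_window(example, window_end=10.0))
```

If the recording is known to contain two speakers in that window but the result holds only one label, the file shows the issue described above.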
Steps/Code to reproduce bug
I am using NeuralDiarizer with the default diar_infer_telephonic.yaml settings file, with this addition:
import json
import os

# `config`, `data_dir`, and `output_dir` are defined earlier (not shown here).
meta = {
    "audio_filepath": os.path.join(output_dir, "mono_file.wav"),
    "offset": 0,
    "duration": None,
    "label": "infer",
    "text": "-",
    "rttm_filepath": None,
    "uem_filepath": None,
}
with open(os.path.join(data_dir, "input_manifest.json"), "w") as fp:
    json.dump(meta, fp)
    fp.write("\n")

pretrained_vad = "vad_multilingual_marblenet"
pretrained_speaker_model = "titanet_large"
config.num_workers = 0
config.diarizer.manifest_filepath = os.path.join(data_dir, "input_manifest.json")
config.diarizer.out_dir = output_dir  # Directory to store intermediate files and prediction outputs
config.diarizer.speaker_embeddings.model_path = pretrained_speaker_model
config.diarizer.oracle_vad = False  # compute VAD provided with model_path to vad config
config.diarizer.clustering.parameters.oracle_num_speakers = False
config.diarizer.clustering.parameters.enhanced_count_thres = 80
config.diarizer.clustering.parameters.max_speaker_num = 2
# Here, we use our in-house pretrained NeMo VAD model
config.diarizer.vad.model_path = pretrained_vad
config.diarizer.vad.parameters.onset = 0.8
config.diarizer.vad.parameters.offset = 0.6
config.diarizer.vad.parameters.pad_offset = -0.05
config.diarizer.msdd_model.model_path = "diar_msdd_telephonic"  # Telephonic speaker diarization model
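The snippet above does not show where `config` comes from or how the diarizer is started. A minimal sketch of the surrounding steps might look like the following. It assumes `config` is loaded from a local copy of diar_infer_telephonic.yaml with OmegaConf and that the run is started with `NeuralDiarizer(cfg=config).diarize()`, as in the NeMo diarization tutorial. The function names `write_manifest` and `run_diarizer` and all paths are hypothetical, and the remaining overrides from the snippet above would still need to be applied before the run.

```python
import json
import os
import tempfile


def write_manifest(audio_path, manifest_path):
    """Write a one-line JSON manifest for a single audio file and return the entry."""
    meta = {
        "audio_filepath": audio_path,
        "offset": 0,
        "duration": None,
        "label": "infer",
        "text": "-",
        "rttm_filepath": None,
        "uem_filepath": None,
    }
    with open(manifest_path, "w") as fp:
        json.dump(meta, fp)
        fp.write("\n")
    return meta


def run_diarizer(config_path, manifest_path, out_dir):
    """Load the YAML config, point it at the manifest, and run diarization."""
    # Imported here so the manifest helper stays usable without NeMo installed.
    from omegaconf import OmegaConf
    from nemo.collections.asr.models.msdd_models import NeuralDiarizer

    config = OmegaConf.load(config_path)  # assumed: local copy of diar_infer_telephonic.yaml
    config.diarizer.manifest_filepath = manifest_path
    config.diarizer.out_dir = out_dir
    # Apply the remaining overrides from the issue here before running.
    NeuralDiarizer(cfg=config).diarize()


# Usage of the manifest helper alone, in a temporary directory.
work_dir = tempfile.mkdtemp()
manifest_path = os.path.join(work_dir, "input_manifest.json")
entry = write_manifest(os.path.join(work_dir, "mono_file.wav"), manifest_path)
print(manifest_path)
```

Keeping the manifest step separate from the run makes it easy to confirm that the manifest points at the intended file before a longer diarization job is started.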
uro-sh changed the title from "NeuralDiarizer with the telephonic config mix speakers at the very beginning of audio files" to "NeuralDiarizer with the telephonic config mix speakers at the very beginning of shorter audio files (less than 2 minutes duration)" on Oct 22, 2024.