Error Loading Fine-Tuned MSDD Model - Missing `speaker_model_cfg` in NeMo #12600

AllanK24 · 2025-03-13T15:57:26Z

I fine-tuned the NeMo MSDD (Multi-Scale Diarization Decoder) model on a 2-speaker subset of VoxConverse using NVIDIA's official fine-tuning notebook:
🔗 Notebook Used: [Speaker_Diarization_Training.ipynb](https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb)

After training, I attempted to load the fine-tuned MSDD model for inference using the NeuralDiarizer class in three different ways:

```
# Attempt 1: Load using config
msdd_model = NeuralDiarizer(cfg=my_config).to(device)

# Attempt 2: Load from .nemo file
msdd_model = NeuralDiarizer.from_pretrained("my_model.nemo", map_location=device)

# Attempt 3: Load from .ckpt file
msdd_model = NeuralDiarizer.load_from_checkpoint("my_model.ckpt", map_location=device)
```

❌ All three methods resulted in the following error:

```
[NeMo E 2025-03-13 18:25:01 nemo_logging:417] Model instantiation failed!
    Target class:       nemo.collections.asr.models.msdd_models.EncDecDiarLabelModel
    Error(s):   Key 'speaker_model_cfg' is not in struct
        full_key: speaker_model_cfg
        object_type=dict
```

🚨 Issue Summary:

The error suggests that speaker_model_cfg is missing from the configuration.
The model was fine-tuned with a frozen speaker embedding extractor.
The issue occurs when trying to load the model for inference, even though the fine-tuning process followed all steps correctly.
I suspect that the model's config file does not include speaker_model_cfg, but it's required during inference.
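
One way to check this suspicion without instantiating the model is to look inside the saved file directly. The sketch below assumes the usual `.nemo` layout, a tar archive that contains a `model_config.yaml`, and uses only the standard library. The helper name `top_level_config_keys` is hypothetical, and the key scan is a rough regex pass rather than a real YAML parse.

```python
import re
import tarfile


def top_level_config_keys(nemo_path):
    """Return the top-level keys of the model_config.yaml stored in a .nemo archive."""
    # "r:*" lets tarfile auto-detect plain vs. compressed archives.
    with tarfile.open(nemo_path, "r:*") as archive:
        for member in archive.getmembers():
            # The member may be stored as "model_config.yaml" or "./model_config.yaml".
            if member.name.endswith("model_config.yaml"):
                text = archive.extractfile(member).read().decode("utf-8")
                break
        else:
            raise FileNotFoundError("no model_config.yaml inside " + str(nemo_path))
    # Rough scan, not a YAML parser: an unindented "name:" line counts as a top-level key.
    return re.findall(r"^([A-Za-z_]\w*):", text, flags=re.MULTILINE)
```

Calling `"speaker_model_cfg" in top_level_config_keys("my_model.nemo")` would then show whether the key made it into the saved config at all.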

Steps/Code to Reproduce Bug

Fine-Tune MSDD Model using NVIDIA’s official [Speaker_Diarization_Training.ipynb](https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb).
Save the trained model in .nemo format:
```
msdd_model.save_to("my_model.nemo")
```

Try to load the fine-tuned model for inference:

```
from nemo.collections.asr.models.msdd_models import NeuralDiarizer

msdd_model = NeuralDiarizer.from_pretrained("my_model.nemo", map_location="cuda")
```

Observe the error:

```
Key 'speaker_model_cfg' is not in struct
```

Expected Behavior

I expected the fine-tuned MSDD model to load correctly for inference after training without requiring speaker_model_cfg, as the speaker embedding extractor was frozen during fine-tuning.

Environment Overview

Training Environment: Google Colab
Inference Environment: Local machine
NeMo Version: 2.3.0rc0
Python Version: 3.12
Installation Method:
```
pip install nemo_toolkit['all']
```


AllanK24 added the bug label Mar 13, 2025

AllanK24 changed the title ~~GitHub Issue Report: Error Loading Fine-Tuned MSDD Model - Missing speaker_model_cfg in NeMo~~ Error Loading Fine-Tuned MSDD Model - Missing speaker_model_cfg in NeMo Mar 14, 2025
