Error Loading Fine-Tuned MSDD Model - Missing speaker_model_cfg in NeMo #12600

Open
AllanK24 opened this issue Mar 13, 2025 · 0 comments
Labels
bug Something isn't working

AllanK24 commented Mar 13, 2025

I fine-tuned the NeMo MSDD (Multi-Scale Diarization Decoder) model on the 2-speaker subset of VoxConverse using NVIDIA's official fine-tuning notebook:
🔗 Notebook Used: [Speaker_Diarization_Training.ipynb](https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb)

After training, I attempted to load the fine-tuned MSDD model for inference using the NeuralDiarizer class in three different ways:

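import torch
from nemo.collections.asr.models.msdd_models import NeuralDiarizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Attempt 1: Load using config
# (my_config is my diarizer inference config; its contents are not shown here)
msdd_model = NeuralDiarizer(cfg=my_config).to(device)

# Attempt 2: Load from .nemo file
msdd_model = NeuralDiarizer.from_pretrained("my_model.nemo", map_location=device)

# Attempt 3: Load from .ckpt file
msdd_model = NeuralDiarizer.load_from_checkpoint("my_model.ckpt", map_location=device)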

All three methods resulted in the following error:

[NeMo E 2025-03-13 18:25:01 nemo_logging:417] Model instantiation failed!
    Target class:       nemo.collections.asr.models.msdd_models.EncDecDiarLabelModel
    Error(s):   Key 'speaker_model_cfg' is not in struct
        full_key: speaker_model_cfg
        object_type=dict

🚨 Issue Summary:

  • The error suggests that speaker_model_cfg is missing from the configuration.
  • The model was fine-tuned with a frozen speaker embedding extractor.
  • The issue occurs when loading the model for inference, even though fine-tuning followed every step of the notebook.
  • I suspect that the saved model config does not include speaker_model_cfg, even though inference requires it.
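The suspicion in the last bullet can be checked without instantiating any NeMo class, by reading the config stored inside the .nemo file. This is a minimal sketch, assuming that a .nemo file is a tar archive (compressed or not) containing a member whose name ends in model_config.yaml, which is how NeMo's default save/restore packing lays it out, and that top-level YAML keys are written at column 0 as `key:`. The function name nemo_config_top_level_keys is hypothetical, not part of the NeMo API, and the line scan is a heuristic rather than a full YAML parser.

```python
import tarfile
from typing import Set


def nemo_config_top_level_keys(nemo_path: str) -> Set[str]:
    """Return the top-level keys of model_config.yaml inside a .nemo archive.

    Hypothetical helper. Assumes the .nemo file is a tar archive ("r:*" also
    handles gzip) with a member ending in 'model_config.yaml', and that
    top-level keys start at column 0 (as OmegaConf writes them).
    """
    with tarfile.open(nemo_path, "r:*") as archive:
        # Member names may be prefixed, e.g. "./model_config.yaml".
        member = next(
            (m for m in archive.getmembers() if m.name.endswith("model_config.yaml")),
            None,
        )
        if member is None:
            raise FileNotFoundError(f"no model_config.yaml found in {nemo_path}")
        fileobj = archive.extractfile(member)
        if fileobj is None:
            raise ValueError(f"{member.name} is not a regular file")
        text = fileobj.read().decode("utf-8")

    keys = set()
    for line in text.splitlines():
        # Skip blank lines, indented (nested) lines, list items, comments
        # and YAML document markers such as '---'.
        if not line or line[0] in " \t-#":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return keys
```

For example, `"speaker_model_cfg" in nemo_config_top_level_keys("my_model.nemo")` would evaluate to False if the saved config really lacks that key.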

Steps/Code to Reproduce Bug

  1. Fine-tune the MSDD model using NVIDIA’s official [Speaker_Diarization_Training.ipynb](https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Training.ipynb).
  2. Save the trained model in .nemo format:
    msdd_model.save_to("my_model.nemo")
  3. Try to load the fine-tuned model for inference:
    from nemo.collections.asr.models.msdd_models import NeuralDiarizer
    
    msdd_model = NeuralDiarizer.from_pretrained("my_model.nemo", map_location="cuda")
  4. Observe the error:
    Key 'speaker_model_cfg' is not in struct
    

Expected Behavior

I expected the fine-tuned MSDD model to load for inference without requiring speaker_model_cfg, since the speaker embedding extractor was frozen during fine-tuning.


Environment Overview

  • Training Environment: Google Colab
  • Inference Environment: Local machine
  • NeMo Version: 2.3.0rc0
  • Python Version: 3.12
  • Installation Method:
    pip install nemo_toolkit['all']
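The version details above can be collected on both the training and the inference machine with a few lines of standard-library Python. This is a minimal sketch; the helper name environment_report and the default package list ("nemo_toolkit", "torch") are assumptions for illustration, and a package that cannot be found is reported as not installed.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def environment_report(
    packages: Iterable[str] = ("nemo_toolkit", "torch"),
) -> Dict[str, Optional[str]]:
    """Return the Python version and installed versions of the given packages.

    Hypothetical helper for bug reports; a value of None means the
    distribution was not found in the current environment.
    """
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for key, value in environment_report().items():
    print(f"{key}: {value if value is not None else 'not installed'}")
```

Running it in both environments makes it easy to spot a NeMo or PyTorch version mismatch between Colab and the local machine.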

AllanK24 added the bug (Something isn't working) label on Mar 13, 2025.
AllanK24 changed the title from "GitHub Issue Report: Error Loading Fine-Tuned MSDD Model - Missing speaker_model_cfg in NeMo" to "Error Loading Fine-Tuned MSDD Model - Missing speaker_model_cfg in NeMo" on Mar 14, 2025.