Describe the bug
Running speech_llm/modular_audio_gpt_train.py with freeze_audio_encoder: False results in a runtime error during trainer.fit:
```
RuntimeError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
```
This occurs because the optimizer state restoration assumes a fixed parameter group structure, which changes when the audio encoder is not frozen.
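For context, the same mismatch can be reproduced outside NeMo with plain PyTorch. The following is a minimal, hypothetical sketch (not NeMo code; the toy model stands in for the frozen/unfrozen audio encoder) of how an optimizer state dict saved while a module was frozen fails to load once that module's parameters become trainable:

```python
# Minimal sketch (plain PyTorch, not NeMo code): an optimizer state dict saved
# while some parameters were frozen no longer matches the parameter groups of
# an optimizer built after those parameters are unfrozen.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# "Checkpoint" created with the first layer (stand-in for the audio encoder) frozen.
for p in model[0].parameters():
    p.requires_grad = False
opt = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
saved_opt_state = opt.state_dict()

# New run: the layer is unfrozen, so the optimizer's single param group is larger.
for p in model[0].parameters():
    p.requires_grad = True
opt = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)

try:
    opt.load_state_dict(saved_opt_state)
except (ValueError, RuntimeError) as e:
    # Prints the same "parameter group that doesn't match the size of
    # optimizer's group" message as the error above.
    print(e)
```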
Steps/Code to reproduce bug
Clone the NeMo repo (main branch) and run the training example with the audio encoder unfrozen:

```bash
/path/to/NeMo/examples/multimodal/speech_llm/modular_audio_gpt_train.py \
    model.freeze_audio_encoder=False \
    model.freeze_llm=True \
    model.freeze_modality_adapter=False \
    model.global_batch_size=4 \
    model.micro_batch_size=2 \
    model.pretrained_audio_model=/path/to/stt_en_fastconformer_transducer_large.nemo \
    model.restore_from_path=/path/to/megatron_gpt_345m.nemo \
    trainer.val_check_interval=1
```

The error is raised immediately during trainer.fit(model), right after checkpoint loading/initialization.
Expected behavior
Training should proceed without requiring manual modifications to Lightning’s internals; optimizer state restoration should depend on the configured set of trainable parameters.
Environment overview (please complete the following information)
- Method of NeMo install: git clone
- No docker
Environment details
If NVIDIA docker image is used you don't need to specify these. Otherwise, please provide:
- OS version: Ubuntu 22.04.1
- PyTorch version: 2.5.1
- Python version: 3.10.12
Workaround
Comment out the optimizer/LR-scheduler restore in Lightning's checkpoint connector:

File: .../site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py
Line 297:

```python
# self.restore_optimizers_and_lr_schedulers(...)
```

After this modification, I was able to successfully overfit a single audio sample, confirming that training works with freeze_audio_encoder=False.
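As a less invasive alternative to editing site-packages, the same effect can likely be achieved by monkey-patching the connector from the training script. This is an untested sketch; the class name _CheckpointConnector is assumed from the Lightning 2.x file referenced above, so verify it against your installed version:

```python
# Untested sketch: equivalent to commenting out the call above, but applied at
# runtime instead of editing the installed Lightning sources. It turns the
# optimizer/LR-scheduler restore into a no-op, so training resumes with fresh
# optimizer state. Class/method names assume Lightning 2.x; verify locally.
from lightning.pytorch.trainer.connectors.checkpoint_connector import _CheckpointConnector

_CheckpointConnector.restore_optimizers_and_lr_schedulers = lambda self: None
```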