
speech_llm/modular_audio_gpt_train.py fails when freeze_audio_encoder: False #12627

Open
yuvalasherq opened this issue Mar 16, 2025 · 0 comments
Labels: bug (Something isn't working)


yuvalasherq commented Mar 16, 2025

Describe the bug

Running speech_llm/modular_audio_gpt_train.py with freeze_audio_encoder: False results in a runtime error during trainer.fit:

RuntimeError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

This occurs because the optimizer state restoration assumes a fixed parameter-group structure. Unfreezing the audio encoder enlarges the set of trainable parameters, so the group sizes in the restored optimizer state no longer match those of the newly built optimizer.
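For illustration, here is a minimal PyTorch sketch (not NeMo code) of the same failure mode: an optimizer state dict saved while some parameters were frozen cannot be loaded into an optimizer that now owns more parameters. Stock PyTorch raises this as a ValueError with the same message; the NeMo stack surfaces it as the RuntimeError above.

import torch

model = torch.nn.Linear(4, 4)

# Optimizer built while the weight was frozen: one group with 1 tensor.
opt_frozen = torch.optim.Adam([model.bias])
saved_state = opt_frozen.state_dict()

# Optimizer built after unfreezing: one group with 2 tensors.
opt_unfrozen = torch.optim.Adam(model.parameters())

# Fails: "loaded state dict contains a parameter group that doesn't
# match the size of optimizer's group"
opt_unfrozen.load_state_dict(saved_state)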

Steps/Code to reproduce bug

Clone the NeMo repo (main branch), then run:

python /path/to/NeMo/examples/multimodal/speech_llm/modular_audio_gpt_train.py \
    model.freeze_audio_encoder=False \
    model.freeze_llm=True \
    model.freeze_modality_adapter=False \
    model.global_batch_size=4 \
    model.micro_batch_size=2 \
    model.pretrained_audio_model=/path/to/stt_en_fastconformer_transducer_large.nemo \
    model.restore_from_path=/path/to/megatron_gpt_345m.nemo \
    trainer.val_check_interval=1

The error occurs immediately during trainer.fit(model), after checkpoint loading or initialization.

Expected behavior

Training should proceed without requiring manual modifications to Lightning’s internals. Optimizer state restoration should depend on the configuration, adapting when the set of trainable parameters differs from the checkpoint.

Environment overview (please complete the following information)

  • Method of NeMo install: git clone
  • Docker: not used

Environment details


  • OS version: Ubuntu 22.04.1
  • PyTorch version: 2.5.1
  • Python version: 3.10.12

Workaround

File: .../site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py
As a temporary workaround, comment out the optimizer/LR-scheduler restore call at line 297:

# self.restore_optimizers_and_lr_schedulers(...)

After this modification, I was able to successfully overfit a single audio sample, confirming that training works with freeze_audio_encoder=False.
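A less invasive alternative (an untested sketch, assuming stock Lightning behavior rather than anything NeMo-specific) would be to skip the optimizer restore from user code instead of editing site-packages. Lightning's CheckpointConnector only calls restore_optimizers() when the active strategy's lightning_restore_optimizer property is True, so a thin strategy subclass can opt out. DDPStrategy below is illustrative; NeMo typically uses its own strategy (e.g., NLPDDPStrategy), which would be subclassed instead, and depending on the Lightning version the LR-scheduler state may still be restored separately.

from lightning.pytorch.strategies import DDPStrategy

class SkipOptimizerRestore(DDPStrategy):
    @property
    def lightning_restore_optimizer(self) -> bool:
        # Tells Lightning's CheckpointConnector not to load the stale
        # optimizer parameter groups; the freshly built optimizer is
        # kept as-is.
        return False

# Usage (illustrative): trainer = Trainer(strategy=SkipOptimizerRestore(), ...)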

yuvalasherq added the bug label on Mar 16, 2025