SpeechLM Update #12430

stevehuang52 · 2025-02-28T20:17:19Z

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

A couple of important updates to SpeechLM.

Collection: [speechlm]

Changelog

Unified the various input formats into a single multimodal conversation format in which audio and text are interleaved. This is achieved by mapping single-turn Cut samples into NeMoMultimodalConversation. Later, text SFT data should also be converted to NeMoMultimodalConversation in nemo/collections/speechlm/data/dataset/audio_text_lhotse_dataset.py:MultimodalConversationDataset.
Added Whisper encoder support.
Fixed issues with PEFT not saving batch norm stats.
Fixed loading PEFT ckpt for inference.
Added context parallel support in the LLM. The speech encoder does not support context parallelism yet.
Various improvements and refactoring.
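
To illustrate the first changelog item, here is a minimal sketch of wrapping a single-turn sample as an interleaved conversation. The classes `TextTurn`, `AudioTurn` and `Conversation` and the function `single_turn_to_conversation` are hypothetical stand-ins written for this example; they are not the code in this PR, and the actual NeMoMultimodalConversation type may differ in fields and naming.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextTurn:
    """Hypothetical stand-in for a text turn in a conversation."""
    role: str
    value: str


@dataclass
class AudioTurn:
    """Hypothetical stand-in for an audio turn; `audio_id` names a recording."""
    role: str
    audio_id: str


@dataclass
class Conversation:
    """Hypothetical stand-in for NeMoMultimodalConversation."""
    id: str
    turns: List[Union[TextTurn, AudioTurn]] = field(default_factory=list)


def single_turn_to_conversation(sample_id: str, audio_id: str, prompt: str, answer: str) -> Conversation:
    """Wrap one (audio, prompt, answer) sample as an interleaved conversation."""
    # The audio comes first, followed by the optional text prompt from the user.
    turns: List[Union[TextTurn, AudioTurn]] = [AudioTurn(role="user", audio_id=audio_id)]
    if prompt:
        turns.append(TextTurn(role="user", value=prompt))
    # The target text becomes the assistant turn.
    turns.append(TextTurn(role="assistant", value=answer))
    return Conversation(id=sample_id, turns=turns)


conv = single_turn_to_conversation("utt-0", "utt-0.wav", "Transcribe the audio.", "hello world")
print([type(t).__name__ for t in conv.turns])
```

With one representation for every sample, multi-turn data and text-only SFT data could in principle pass through the same collation path, which appears to be the motivation for the follow-up mentioned in the changelog.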

Signed-off-by: stevehuang52 <[email protected]>


stevehuang52 · 2025-03-03T21:36:37Z

nemo/collections/common/data/lhotse/cutset.py

@@ -330,7 +330,7 @@ def parse_and_combine_datasets(
        if (w := item.get("weight")) is not None:
            weights.append(w)

-    assert all(t == tarred_status[0] for t in tarred_status), "Mixing tarred and non-tarred datasets is not supported."
+    # assert all(t == tarred_status[0] for t in tarred_status), "Mixing tarred and non-tarred datasets is not supported."


This is commented out to allow mixing tarred and untarred datasets; force_map_dataset or force_iterable_dataset can then be used later to control whether they are wrapped into a map-style or an iterable dataset. @pzelasko Do we have a better solution now?
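
One possible alternative to commenting the assert out, sketched here only as an illustration: keep the check, but skip it when the caller has explicitly forced a dataset type. The helper name `check_tarred_status` and its signature are assumptions made for this sketch and are not part of cutset.py; only the flag names force_map_dataset and force_iterable_dataset come from the comment above.

```python
from typing import Sequence


def check_tarred_status(
    tarred_status: Sequence[bool],
    force_map_dataset: bool = False,
    force_iterable_dataset: bool = False,
) -> None:
    """Hypothetical helper: reject mixed tarred/non-tarred inputs unless the
    caller has explicitly forced a dataset type."""
    if force_map_dataset and force_iterable_dataset:
        raise ValueError("Set at most one of force_map_dataset and force_iterable_dataset.")
    # More than one distinct value means tarred and non-tarred inputs are mixed.
    is_mixed = len(set(tarred_status)) > 1
    if is_mixed and not (force_map_dataset or force_iterable_dataset):
        raise ValueError(
            "Mixing tarred and non-tarred datasets requires "
            "force_map_dataset or force_iterable_dataset."
        )


# Homogeneous inputs pass; mixed inputs pass only with an explicit override.
check_tarred_status([True, True])
check_tarred_status([True, False], force_iterable_dataset=True)
```

This would keep the original safeguard for default configurations while still allowing the mixed setup this PR needs.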

stevehuang52 and others added 30 commits September 16, 2024 10:38

fix type bugs

d967c51

Signed-off-by: stevehuang52 <[email protected]>

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

84eaa59

Merge remote-tracking branch 'origin/main' into slm_v2

22217d0

Merge remote-tracking branch 'origin/main' into slm_v2

799a5ec

Update mixin.py

568f073

add type hint Signed-off-by: He Huang (Steve) <[email protected]>

Apply isort and black reformatting

f91030d

Signed-off-by: stevehuang52 <[email protected]>

Update mixin.py

c9256ca

Signed-off-by: He Huang (Steve) <[email protected]>

Apply isort and black reformatting

e41e4f9

Signed-off-by: stevehuang52 <[email protected]>

Update mixin.py

2a70bd3

Signed-off-by: He Huang (Steve) <[email protected]>

Merge branch 'main' into slm_v2

725fd88

add datamodule

de70200

Signed-off-by: stevehuang52 <[email protected]>

Merge remote-tracking branch 'origin/main' into slm_v2

98a2673

add speechlm peft train, continue train, validation and misc

a00e5dd

Signed-off-by: stevehuang52 <[email protected]>

Merge remote-tracking branch 'origin/main' into heh/speechlm_nemo2.0

52a1b0e

resolve merge confict

9d57bc6

Signed-off-by: stevehuang52 <[email protected]>

update datamodule

ae3c5be

Signed-off-by: stevehuang52 <[email protected]>

Merge remote-tracking branch 'origin/main' into heh/speechlm_nemo2.0

c2ef987

Signed-off-by: stevehuang52 <[email protected]>

add script

dc951e2

Signed-off-by: stevehuang52 <[email protected]>

fix speechlm inference

248f4e2

Signed-off-by: stevehuang52 <[email protected]>

update

c42e17e

Signed-off-by: stevehuang52 <[email protected]>

mergin origin/main

9ec5a96

Signed-off-by: stevehuang52 <[email protected]>

Merge remote-tracking branch 'origin/main' into heh/speechlm_nemo2.0

90dcb6a

fix tp support

0a93523

Signed-off-by: stevehuang52 <[email protected]>

Apply isort and black reformatting

5c87470

Signed-off-by: stevehuang52 <[email protected]>

Apply isort and black reformatting

934b0db

Signed-off-by: artbataev <[email protected]>

Merge remote-tracking branch 'origin/main' into heh/speechlm_nemo2.0

e7af9a7

Merge remote-tracking branch 'origin/main' into heh/speechlm_nemo2.0

0c32fa5

update

9d98e89

Signed-off-by: stevehuang52 <[email protected]>

refactor

b659912

Signed-off-by: stevehuang52 <[email protected]>

Merge remote-tracking branch 'origin/main' into heh/speechlm_nemo2.0

3653320

stevehuang52 added 16 commits February 18, 2025 14:17

fix

144e39f

Signed-off-by: stevehuang52 <[email protected]>

fix import ckpt

c0040d3

Signed-off-by: stevehuang52 <[email protected]>

update io

c6791b0

Signed-off-by: stevehuang52 <[email protected]>

fix hf tokenizer remove_special_tokens

74a04be

Signed-off-by: stevehuang52 <[email protected]>

refactor

957e729

Signed-off-by: stevehuang52 <[email protected]>

comment out lhotse assert

dd725e7

Signed-off-by: stevehuang52 <[email protected]>

update cfg

04c6e77

Signed-off-by: stevehuang52 <[email protected]>

update cfg

902b0f6

Signed-off-by: stevehuang52 <[email protected]>

refactor and update inference

d05cbd4

Signed-off-by: stevehuang52 <[email protected]>

update infer

b2a853f

Signed-off-by: stevehuang52 <[email protected]>

update

c0da97f

Signed-off-by: stevehuang52 <[email protected]>

update cfg

f2f288a

Signed-off-by: stevehuang52 <[email protected]>

fix peft trainable params and update

fd6de0d

Signed-off-by: stevehuang52 <[email protected]>

add support for whisper encoder

26bb5a4

Signed-off-by: stevehuang52 <[email protected]>

Merge remote-tracking branch 'origin/main' into heh/speechlm_dev

601364e

Signed-off-by: stevehuang52 <[email protected]>

update

b88f0c0

Signed-off-by: stevehuang52 <[email protected]>

github-actions bot added common Multi Modal labels Feb 28, 2025

stevehuang52 added the skip-linting label Feb 28, 2025

stevehuang52 self-assigned this Feb 28, 2025

github-advanced-security bot found potential problems Feb 28, 2025

nemo/collections/speechlm/data/dataset/audio_text_lhotse_dataset.py (fixed)

nemo/collections/speechlm/modules/asr_module.py (fixed)

nemo/collections/speechlm/utils/text_generation/audio_text_generation_utils.py (fixed)

stevehuang52 added 4 commits March 3, 2025 15:49

clean up

d7eca13

Signed-off-by: stevehuang52 <[email protected]>

clean up

64e027e

Signed-off-by: stevehuang52 <[email protected]>

Merge remote-tracking branch 'origin/main' into heh/speechlm_dev

a66d99b

clean up

0188d8d

Signed-off-by: stevehuang52 <[email protected]>

stevehuang52 requested review from yaoyu-33 and pzelasko March 3, 2025 20:59

stevehuang52 marked this pull request as ready for review March 3, 2025 20:59

stevehuang52 requested a review from huvunvidia March 3, 2025 21:03

stevehuang52 commented Mar 3, 2025
