Model creates duplicate transcriptions #12442

Open
ceyxasm opened this issue Mar 3, 2025 · 4 comments
Labels
ASR bug Something isn't working

Comments

ceyxasm commented Mar 3, 2025

import nemo.collections.asr as nemo_asr
import sys
MODEL_PATH = '/home/bubu/attention-tag/customs/asr-onprem/parakeet-tdt_ctc-110m.nemo'

asr_model = nemo_asr.models.ASRModel.restore_from(MODEL_PATH)
transcriptions = asr_model.transcribe([sys.argv[1]])

With a single audio WAV file, transcriptions is a tuple of two transcriptions that are duplicates of each other.
PS: This issue is a copy of the Hugging Face discussion: https://huggingface.co/nvidia/parakeet-tdt_ctc-110m/discussions/2
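
To make the report easier to check, the shape of the returned value can be printed before looking at the text itself. The following is only a minimal sketch: `describe` is a hypothetical helper, not part of NeMo, and `sample` is made-up data that mimics the tuple described above rather than real model output.

```python
def describe(value, depth=0, max_depth=2):
    """Return lines summarising the nested type/length structure of `value`."""
    indent = "  " * depth
    # Recurse into lists and tuples until max_depth is reached.
    if isinstance(value, (list, tuple)) and depth < max_depth:
        lines = [f"{indent}{type(value).__name__} of length {len(value)}"]
        for element in value:
            lines.extend(describe(element, depth + 1, max_depth))
        return lines
    # Leaf: show the type and the repr of the value.
    return [f"{indent}{type(value).__name__}: {value!r}"]


# Made-up stand-in for the reported return value (assumption, not real output).
sample = (["hello world"], ["hello world"])
print("\n".join(describe(sample)))
```

Running the same helper on the real return value of transcribe would show whether it is a tuple of two lists or something else.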

ceyxasm added the bug label Mar 3, 2025
@nithinraok
Collaborator

Hi, thanks for the issue.
We updated our signature in the latest release (2.2)
and updated the model card accordingly: https://huggingface.co/nvidia/parakeet-tdt_ctc-110m#how-to-use-this-model

Please check and let us know.

import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt_ctc-110m")
transcriptions = asr_model.transcribe(['<file_path>'])
print(transcriptions[0].text)
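
Code that has to run against both the older and the newer release may need to cope with result items that are either plain strings or objects with a `.text` attribute. A minimal sketch of that, assuming those are the only two item shapes; `to_text` and `FakeHypothesis` are hypothetical names used for illustration and are not NeMo APIs:

```python
from dataclasses import dataclass


@dataclass
class FakeHypothesis:
    """Hypothetical stand-in for a result object that exposes `.text`."""
    text: str


def to_text(item):
    """Return the transcript string for one result item."""
    # Newer-style item: an object carrying the transcript in `.text`.
    if hasattr(item, "text"):
        return item.text
    # Older-style item: assumed to already be a plain string.
    return str(item)


print(to_text(FakeHypothesis(text="hello world")))
print(to_text("hello world"))
```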

nithinraok added the ASR label Mar 10, 2025
Author

ceyxasm commented Mar 12, 2025

Hey, but the need to do print(transcriptions[0].text) is still fishy, right?
And if you compare transcriptions[0].text == transcriptions[1].text, you get True.
My concern was this redundancy. It is not a breaking bug, but a bug nonetheless.

@nithinraok
Collaborator

Why do you get transcriptions[1].text when you pass only one audio file? Please provide code to replicate your issue.

Author

ceyxasm commented Mar 17, 2025

import nemo.collections.asr as nemo_asr
import sys
import time
MODEL_PATH = '/home/bubu/attention-tag/customs/asr-onprem/parakeet-tdt_ctc-110m.nemo'

asr_model = nemo_asr.models.ASRModel.restore_from(MODEL_PATH)
wav_file_path = sys.argv[1]
st = time.time()
transcriptions = asr_model.transcribe(wav_file_path)
et = time.time() - st
flat_transcriptions = [item for sublist in transcriptions for item in sublist]

print(et, len(transcriptions))
print(transcriptions[1])
with open('trans.txt', 'w') as f:
    f.write("\n".join(flat_transcriptions))

Image

Here, transcriptions[1] is the same as transcriptions[0].
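
Until the return shape is settled, a caller could collapse the duplicated pair before writing it out, since flattening the tuple as in the script above writes each transcription twice. This is a rough sketch under the assumption that the result is a 2-tuple whose halves are equal; `collapse_duplicate_pair` is a hypothetical helper, and `sample` is made-up data rather than real model output.

```python
def collapse_duplicate_pair(result):
    """If `result` is a 2-tuple whose halves are equal, return one half as a list."""
    if isinstance(result, tuple) and len(result) == 2 and result[0] == result[1]:
        return list(result[0])
    # A bare string would otherwise be split into characters by list().
    if isinstance(result, str):
        return [result]
    # Anything else is returned as a list with its contents unchanged.
    return list(result)


# Made-up stand-in for the reported return value (assumption, not real output).
sample = (["hello world"], ["hello world"])

# Flattening as in the script above keeps both copies.
naive = [item for sublist in sample for item in sublist]
print(len(naive), len(collapse_duplicate_pair(sample)))
```

The equality check is deliberate: if the two halves ever differ, nothing is dropped silently.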
