Like, some sentences are too long for subtitle files. Is there a way to limit the length of transcribed sentences or split long sentences in code? Thanks.
I've been playing about with this today. The SubtitlesProcessor module included with whisperx is really good!
from whisperx.SubtitlesProcessor import SubtitlesProcessor

# Do all of your whisper transcribing / alignment here
# Output of the alignment stage should be an object called `result`
# All variable names below apart from `result` are settings that can be exposed to the user.
subtitles_proccessor = SubtitlesProcessor(
    result["segments"],
    language_code,  # str, two letter code to identify the language
    max_line_length=max_line_length,  # int, around 100 has been working for me
    min_char_length_splitter=sub_split_threshold,  # int, around 70 has been working for me
    is_vtt=is_vtt,  # bool, true for vtt, false for srt format
)
subtitles_proccessor.save(output_path, advanced_splitting=True)  # output_path is a str with your desired filename
There's an alternative example in the pull request here
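If you would rather split long segments yourself before writing the subtitle file, a rough sketch using only the Python standard library could look like the one below. This is not how SubtitlesProcessor works internally. The helper name `split_segment` and the `max_chars` parameter are made up for illustration, and it assumes each segment is a dict with `start`, `end` and `text` keys. Timestamps for the pieces are estimated in proportion to their character counts, so they are approximate rather than word-aligned.

```python
import textwrap


def split_segment(segment, max_chars=70):
    """Split one segment into pieces of at most max_chars characters.

    Hypothetical helper: assumes `segment` is a dict with "start", "end"
    (seconds) and "text" keys. Timings of the pieces are interpolated by
    character count, so they are only an approximation.
    """
    text = segment["text"].strip()
    # Break on whitespace only; a single word longer than max_chars is kept
    # whole, so such a piece can exceed the limit.
    pieces = textwrap.wrap(text, width=max_chars, break_long_words=False)
    if len(pieces) <= 1:
        return [{"start": segment["start"], "end": segment["end"], "text": text}]

    total_chars = sum(len(piece) for piece in pieces)
    duration = segment["end"] - segment["start"]
    result = []
    cursor = segment["start"]
    for piece in pieces:
        # Give each piece a share of the duration proportional to its length.
        piece_end = cursor + duration * len(piece) / total_chars
        result.append({"start": cursor, "end": piece_end, "text": piece})
        cursor = piece_end
    # Pin the final end time to the original to avoid floating-point drift.
    result[-1]["end"] = segment["end"]
    return result
```

You could then flatten the output, for example `short = [p for seg in result["segments"] for p in split_segment(seg)]`, and write `short` out with whatever subtitle writer you already use.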