Cannot translate subtitles that contain romaji words #12

Open
Kartatz opened this issue May 15, 2024 · 2 comments
Kartatz commented May 15, 2024

I have been using this tool for a long time, and it works pretty well most of the time. However, I have been struggling with an issue where it fails to translate subtitles containing romaji words:

$ translatesubs './0.vtt' --to_lang 'portuguese' '1.vtt'
Translating to "portuguese".
Trying separator " $$$ "...
original length=406, translated length=369
Trying separator " ### "...
original length=406, translated length=354
Trying separator " ∞ "...
original length=406, translated length=358
Trying separator "@@@"...
original length=406, translated length=363
Trying separator " ™ "...
original length=406, translated length=352
Trying separator " @@@ "...
original length=406, translated length=359
Trying separator "$$$"...
original length=406, translated length=383
Trying separator "€€€"...
original length=406, translated length=257
Trying separator "££"...
original length=406, translated length=323
Trying separator " ## "...
original length=406, translated length=353
Trying separator "@@"...
original length=406, translated length=373
Trying separator "$$"...
original length=406, translated length=374
It seems like all tries to translate got corrupted. Try to manually set the separator using --separator argument to be DIFFERENT from: " $$$ ", " ### ", " ∞ ", "@@@", " ™ ", " @@@ ", "$$$", "€€€", "££", " ## ", "@@", "$$". Check --help menu for more information.

I tried setting --separator to some random symbols, but none of them seem to work.
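For anyone reading along, here is a rough sketch of what the log above appears to be doing. It is not the tool's actual code. It assumes the "length" values are segment counts: the lines are joined with a separator, translated in one request, split again, and the attempt is rejected when the count changes. The names `translate_batch`, `try_separators`, and `fake_translate` are hypothetical, and `fake_translate` is only a stand-in that mangles two of the markers the way a translation service might.

```python
from typing import Callable, List, Optional, Sequence

# First three candidate separators from the log above.
SEPARATORS = (" $$$ ", " ### ", " ∞ ")


def translate_batch(
    lines: Sequence[str], separator: str, translate: Callable[[str], str]
) -> Optional[List[str]]:
    """Join, translate once, split back; return None if the count changed."""
    joined = separator.join(lines)
    parts = translate(joined).split(separator)
    print(f'Trying separator "{separator}"...')
    print(f"original length={len(lines)}, translated length={len(parts)}")
    return parts if len(parts) == len(lines) else None


def try_separators(
    lines: Sequence[str],
    translate: Callable[[str], str],
    separators: Sequence[str] = SEPARATORS,
) -> Optional[List[str]]:
    """Return the first result whose segment count survives translation."""
    for sep in separators:
        result = translate_batch(lines, sep, translate)
        if result is not None:
            return result
    return None  # every candidate got corrupted


def fake_translate(text: str) -> str:
    # Hypothetical stand-in for a translation backend: it leaves the words
    # alone but rewrites the "$$$" and "###" markers, so splitting on them fails.
    return text.replace(" $$$ ", " $ $$ ").replace(" ### ", "### ")


lines = ["Tokihanatsu yo", "Unmei no rhapsody", "You are my destiny"]
result = try_separators(lines, fake_translate)
```

In this toy run the first two separators are rejected and the third is accepted, which matches the shape of the logs in this issue. If the real tool works this way, a single separator that the backend drops or rewrites next to a romaji line would be enough to throw the count off.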

The subtitles contain a lot of lines like this:

00:00:00.550 --> 00:00:04.240
Tokihanatsu yo

00:00:07.170 --> 00:00:12.480
Unmei no <i>rhapsody</i>

00:00:12.480 --> 00:00:18.130
<i>You are my destiny</i>

00:00:19.440 --> 00:00:23.060
Yosou mo dekinai

00:00:23.060 --> 00:00:28.020
Kurai <i>cry</i> yami no naka de samayotteru

Editing the subtitle file and removing all the romaji words fixes the issue:

$ translatesubs './0.vtt' --to_lang 'portuguese' '1.vtt'
Translating to "portuguese".
Trying separator " $$$ "...
original length=386, translated length=385
Trying separator " ### "...
original length=386, translated length=385
Trying separator " ∞ "...
Finished!

0.vtt

@nacho00112

same problem

nacho00112 commented May 28, 2024

Fixed it. You need to try your luck with the separators; this one worked for me: ' ²²²²²² '
I was translating from English to Spanish.
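Since that separator has leading and trailing spaces and non-ASCII characters, quoting matters when passing it on the command line. Below is a small sketch that builds the command as an argument list so the shell cannot strip the spaces. It assumes `--separator` takes a single string value, as the error message in the original report suggests, and it mirrors the argument order used there. `build_command` is a hypothetical helper, not part of the tool.

```python
import shlex
from typing import List


def build_command(src: str, dst: str, to_lang: str, separator: str) -> List[str]:
    """Build an argument list mirroring the command from the report.

    Assumption: --separator accepts one string value.
    """
    return ["translatesubs", src, "--to_lang", to_lang, dst, "--separator", separator]


cmd = build_command("./0.vtt", "1.vtt", "portuguese", " ²²²²²² ")

# Show a copy-pasteable, shell-quoted version of the command.
print(shlex.join(cmd))

# To actually run it (requires translatesubs on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` bypasses the shell entirely, so the separator reaches the tool exactly as written, spaces included.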
