Improve LRC generation #83
Comments
Hey, thanks for the writeup! LRC was added by request of someone else; I haven't used it. As far as I know, there is no way to handle the instrumental/music aspect using the current model. As you mentioned, increasing the 'detect-language' is only for the actual detect-language webhook; it will not have any impact outside of using it in Bazarr at this point.
Having a 'library' of forced-languages doesn't work in my head. Say I want it to be fr or en, but Whisper detects it as German. What's my next step?
To fix the line breaks, you could change the
The rest of what you are seeing are hallucinations caused by the model, and there is no way to fix them here (see: openai/whisper#928 and openai/whisper#679). They would have to be fixed upstream.
If you wanted to take a hack at fixing some of the other stuff for LRC, I'd take a PR. You'd probably want to look at Line 486 in 5c96212
Line 550 in 5c96212
|
I was hoping the model would maybe offer multiple languages with varying confidence scores (e.g. I wasn't aware of the custom regroup, I'll give it a try! Would you be opposed to some kind of static regex to get rid of some of the more common hallucinations? |
I think the model will provide an array of probabilities, though I haven't messed with it. Your idea makes sense now. I'll see if there is any easy way to get that array. Yup, open to any regex you want to try to throw in. |
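A static regex filter of the kind proposed here could look roughly like the sketch below. The phrase list is only an assumption drawn from the hallucinated lines quoted in this issue, and `strip_hallucinations` is a hypothetical helper, not something that exists in subgen. It drops any LRC line whose text matches one of the known phrases.

```python
import re

# Hypothetical phrase list, based only on the hallucinations quoted in this issue.
HALLUCINATION_RE = re.compile(
    r"thank(?:s| you) for watching"
    r"|thumbs up button"
    r"|see you in tomorrow's video"
    r"|this is the end of this video",
    re.IGNORECASE,
)

# Matches a synchronized LRC line such as "[01:13.84] some text".
LRC_LINE_RE = re.compile(r"^\[(\d+:\d{2}(?:\.\d+)?)\]\s*(.*)$")


def strip_hallucinations(lrc_text: str) -> str:
    """Return lrc_text without lines that match a known hallucination phrase."""
    kept = []
    for line in lrc_text.splitlines():
        match = LRC_LINE_RE.match(line)
        # Check only the lyric text when the line carries a timestamp.
        text = match.group(2) if match else line
        if HALLUCINATION_RE.search(text):
            continue
        kept.append(line)
    return "\n".join(kept)
```

A filter like this can also remove genuine lyrics that happen to contain one of the phrases, so the list would need to stay short and specific.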
Do I understand correctly that the detect-language setting will only use the detected language for sending it to Bazarr, but not for the STT? Or is Bazarr actually doing the detection, and sending the result back to subgen? |
Bazarr will request detect-language if it doesn’t know the language of a file. Whisper does the detection and sends it back to Bazarr. Then Bazarr will use that to force the language on a subsequent call to generate a subtitle. Whereas the way the LRC is being made, Whisper autodetects the language and uses that for the rest of the file. There’s no easy way to get the probabilities of languages without rewriting the flow of the program. |
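If the per-language probabilities were available, the "fr or en, but Whisper detects German" case could be handled by choosing the most probable language from a preferred list. The sketch below assumes the probabilities arrive as `(language code, probability)` pairs; that input shape, the `pick_language` helper, and the `min_probability` parameter are all assumptions for illustration, since the thread notes the current flow does not expose them.

```python
from typing import Iterable, Optional, Sequence, Tuple


def pick_language(
    probabilities: Iterable[Tuple[str, float]],
    preferred: Sequence[str],
    min_probability: float = 0.0,
) -> Optional[str]:
    """Return the most probable language among `preferred`, or None.

    `probabilities` is assumed to be (language code, probability) pairs;
    how to obtain them from the model is outside this sketch.
    """
    allowed = set(preferred)
    candidates = [
        (prob, lang)
        for lang, prob in probabilities
        if lang in allowed and prob >= min_probability
    ]
    if not candidates:
        return None  # caller decides: fall back to autodetect or skip the file
    return max(candidates)[1]
```

For example, with German at 0.55, English at 0.30 and French at 0.10, a preferred list of `["fr", "en"]` would select English rather than the top-ranked German.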
Okay, and is the Whisper-based autodetection using the configured first 30s (DETECT_LANGUAGE_LENGTH), or another duration, or the entire file? Because it seems like some languages *should* be easily detectable, but have a long instrumental intro. I got a bit confused by your comment about the 'detect-language'... |
At this point in time DETECT_LANGUAGE_LENGTH only works with Bazarr. I'm looking at adding it to the rest of the flow. What you're seeing now is the default 30 seconds Whisper uses. |
Alright, thanks for the clarification. Being able to configure it would be very useful for lyrics. |
I'm working on it, but it may not come to fruition. How many of your files are not in the language you want? Could you not force the language 100% of the time and get your desired transcription? |
Currently there are a few issues when generating LRC files for lyrics:
Instrumental tracks are not properly detected. Here are a few examples of LRC files generated for some of my music:
Examples
Direct - So Sure
```lrc
[00:14.53] This is the end of this video, I hope you enjoyed this video, If you
[00:20.19] did hit that thumbs up button, it helps me to make good content for
[00:26.03] you, other then that, I will see you in tomorrow's video, peace out.
[00:58.57] Thanks for watching, I hope you enjoyed this video, If you did hit
[00:59.97] that thumbs up button, it helps me to make good content for you,
[00:59.97] other then that, I will see you in tomorrow's video, peace out.
[01:13.84] Thanks for watching, I hope you enjoyed this video, If you did hit
[01:28.28] that thumbs up button, it helps me to make good content for you,
[01:28.28] other then that, I will see you in tomorrow's video, peace out.
[01:44.45] Thanks for watching, I hope you enjoyed this video, If you did hit
[01:52.73] that thumbs up button, it helps me to make good content for you,
[01:52.84] other then that, I will see you in tomorrow's video, peace out.
[02:28.02] Thanks for watching, I hope you enjoyed this video, If you did hit
[02:29.97] that thumbs up button, it helps me to make good content for you,
[02:29.97] other then that, I will see you in tomorrow's video, peace out.
[02:58.56] Thanks for watching, I hope you enjoyed this video, If you did hit
[02:59.97] that thumbs up button, it helps me to make good content for you,
[02:59.97] other then that, I will see you in tomorrow's video, peace out.
[03:16.03] Thanks for watching, I hope you enjoyed this video, If you did hit
[03:22.87] that thumbs up button, it helps me to make good content for you,
[03:23.02] other then that, I will see you in tomorrow's video, peace out.
[03:55.96] Thanks for watching, I hope you enjoyed this video, If you did hit
[03:57.43] that thumbs up button, it helps me to make good content for you,
[03:57.43] other then that, I will see you in tomorrow's video, peace out.
[04:01.87] Thanks for watching, I hope you enjoyed this video, If you did hit
[04:03.50] that thumbs up button, it helps me to make good content for you,
[04:03.50] other then that, I will see you in tomorrow's video, peace out.
```
Direct - Opal
ENV - Brave
Droptek - Science
Falcon Funk - Catnip Trip (Perkulat0r Remix)
Falcon Funk & Bossfight
Intercom - Decoy World (feat. Park Avenue) (notice the "Thank you for watching" at the end, that is pretty common)
Hosini - Flyga
Inova - Grime
Inova - Enraged
Airmov - PRESENCE
Some non-instrumental tracks are also not properly detected:
Examples
Direct & Matt Van - I Don't Mind
Sometimes a randomly-formatted "Music" section (or another random section) is created:
Examples
Lyric lines often contain line breaks, which aren't properly detected by LRC parsers (since each line should be one lyric line with a time stamp), and the generated files are essentially a mix of synchronized and unsynchronized lyrics:
Examples
AWAY & Midoca & Dark Waves - Too Close
Airmov & Trove - Make Me Break
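As a possible post-processing workaround for the line-break issue, continuation lines without a leading tag could be folded back into the preceding timestamped line. This is only a sketch under that assumption; `merge_continuation_lines` is a hypothetical helper and not part of subgen, and it treats any line starting with a bracketed tag (a timestamp or an ID tag such as `[ar:...]`) as the start of a new entry.

```python
import re

# A line that starts with a bracketed tag, e.g. "[00:14.53]" or "[ar:Artist]".
TAG_RE = re.compile(r"^\[[^\]]+\]")


def merge_continuation_lines(lrc_text: str) -> str:
    """Join untagged lines onto the previous tagged line, dropping blank lines."""
    merged = []
    for raw in lrc_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TAG_RE.match(line) or not merged:
            # New entry (or leading text with nothing to attach to).
            merged.append(line)
        else:
            # Untagged continuation: append to the previous entry.
            merged[-1] = f"{merged[-1]} {line}"
    return "\n".join(merged)
```

Merging keeps the timestamp of the first fragment, so a long merged line may stay on screen longer than the original fragments would have.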
Not sure how much of these issues are under your control or could be manually fixed, or if you're even willing to improve the LRC generation. But I wanted to discuss these issues anyway :)
All of this was tested using the default settings, aside from setting up the Jellyfin connection and a transcribe folder. So maybe using another model is a better solution? Although I don't think all issues would be solved by that.