Whisper is recognising Malayalam and outputting Tamil script (Bug) #1019
skillhacker-code
started this conversation in
Ideas
Replies: 1 comment 2 replies
-
@skillhacker-code Whisper was trained on only a negligible amount (0.5 hours) of Malayalam audio, which is what causes the transcription problem; the benchmark results in the Whisper paper show this as well. Good results should be achievable by fine-tuning the Whisper model on Malayalam training data.
2 replies
-
Whisper is recognizing the Malayalam language but transcribing it as Tamil.
If anyone can look into it, it would be awesome. Below is a link to a Malayalam video:
[Malayalam video](https://www.youtube.com/watch?v=uuMiwU5w_kE&pp=ygUZc2FudGhvc2ggZ2VvcmdlIGt1bGFuZ2FyYQ%3D%3D)
Malayalam looks like: സന്തോഷവും സമാധാനവും
Tamil looks like: அமைதி மற்றும் மகிழ்ச்சி
It would be great if the Whisper team could solve this issue.
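For anyone who wants to check a transcript for this problem automatically, here is a minimal sketch that reports which of the two scripts dominates a piece of text. It relies only on the Unicode block ranges for Malayalam (U+0D00 to U+0D7F) and Tamil (U+0B80 to U+0BFF). The names `SCRIPT_RANGES` and `dominant_script` are made up for illustration and are not part of Whisper.

```python
from collections import Counter

# Unicode block ranges for the two scripts discussed in this thread.
SCRIPT_RANGES = {
    "Malayalam": (0x0D00, 0x0D7F),
    "Tamil": (0x0B80, 0x0BFF),
}


def dominant_script(text: str) -> str:
    """Return the script with the most characters in text, or "Other"."""
    counts = Counter()
    for ch in text:
        code_point = ord(ch)
        for script, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[script] += 1
                break
    if not counts:
        # No Malayalam or Tamil characters at all (e.g. Latin text).
        return "Other"
    return counts.most_common(1)[0][0]


print(dominant_script("സന്തോഷവും സമാധാനവും"))      # Malayalam
print(dominant_script("அமைதி மற்றும் மகிழ்ச்சி"))  # Tamil
```

Running this on the text Whisper returns for a Malayalam clip would flag the cases where the output came back in Tamil script. It only looks at the script, so it says nothing about whether the words themselves are right.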