Problem/error with Vietnamese datasets? #961
goctoidongtien started this conversation in General
-
Sometimes a particular model goes off the rails on particular content. I would try another model and see whether it makes the same errors.
-
It isn't a problem specific to Vietnamese, but rather a classic hallucination problem; see #679.
-
Hello, I've just started using Whisper via FreeSubtitles.ai, and the results have a problem.
A large part of the subtitle file returned to me consists not of subtitle lines but of a call for viewers to subscribe to a YouTube channel. The line in the picture means "Please subscribe to the channel Ghien Mi Go so you don't miss interesting videos." I tried many different models and got the same result.
I assume this issue comes from the data the model was trained on. Many subtitles on Vietnamese YouTube videos are not actually subtitles. They are dummy subtitles: a single line asking viewers to subscribe to the YouTube channel, with a timecode that runs from the start of the video to the end. I think OpenAI should double-check its multilingual data sources, because this greatly affects the results. I hope it gets fixed soon.
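Until this is addressed at the source, one possible workaround is to post-process the output and drop any line that repeats far more often than real dialogue would. The following is only a rough sketch, not anything Whisper or FreeSubtitles.ai provides: the function name `drop_repeated_lines`, the `max_repeats` threshold, and the segment layout (a list of dicts with `start`, `end`, and `text` keys) are all assumptions made for illustration.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so near-identical lines compare equal."""
    return " ".join(text.lower().split())


def drop_repeated_lines(segments, max_repeats=3):
    """Return only the segments whose normalized text occurs at most max_repeats times.

    `segments` is assumed to be a list of dicts with a "text" key
    (hypothetical layout; adapt it to whatever your subtitle parser yields).
    """
    counts = Counter(normalize(seg["text"]) for seg in segments)
    return [seg for seg in segments if counts[normalize(seg["text"])] <= max_repeats]


# Placeholder data: the English translation from the post stands in for the
# repeated Vietnamese line.
spam = "Please subscribe to the channel Ghien Mi Go so you don't miss interesting videos."
sample = [
    {"start": 0.0, "end": 2.0, "text": "Hello everyone."},
    {"start": 2.0, "end": 4.0, "text": spam},
    {"start": 4.0, "end": 6.0, "text": spam},
    {"start": 6.0, "end": 8.0, "text": "Today we are cooking noodles."},
    {"start": 8.0, "end": 10.0, "text": spam},
    {"start": 10.0, "end": 12.0, "text": spam},
]

cleaned = drop_repeated_lines(sample, max_repeats=2)
for seg in cleaned:
    print(seg["start"], seg["end"], seg["text"])
```

This heuristic will also remove genuine lines that happen to repeat more than `max_repeats` times (short answers such as "Yes", for example), so the threshold would need tuning per video. It only hides the symptom and does not recover the speech that should have been transcribed in those time ranges.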