What is the information of your dataset in detail? #8
Comments
Hi, I'll post my to-do list here and update the repo as soon as possible: TODO
Thanks to @mohsenoon for analysing the data.
Update 1 (3/6/2023): new stats for the data are ready. They are not 100% accurate, but they are accurate enough to trust. Total hours: 1697.1423399942473
Hi Masoud,
@hamjam, if my information is wrong, please send a pull request that corrects the README file. Thanks for your attention.
Apart from data size, please publish other information such as duration, number of files, average length of files, quality, number of speakers, source, and method of collection.
Also, since the transcriptions were produced by Google's speech-to-text, it would be better to state this explicitly and report the approximate error rate.
The raw outputs of a speech-to-text model can be used, with some care, to train other models, but they certainly cannot be presented as a speech-to-text dataset.
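One common way to report an approximate error is the word error rate (WER) measured on a small, manually corrected sample of the transcriptions. The sketch below is only an illustration of that idea, not this repo's method: the function name `word_error_rate` and the plain whitespace tokenization are assumptions, and real Persian text would probably need normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is assumed to be a manually corrected transcript and
    `hypothesis` the automatic one; both are split on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Averaging this over a few hundred randomly chosen files would give a rough figure that could be published next to the other stats.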