What is the information of your dataset in detail? #8

mohsenoon · 2023-02-16T11:49:52Z

Apart from data size, please publish other information such as total duration, number of files, average file length, audio quality, number of speakers, source, and method of collection.

Also, since the transcriptions in this data come from Google's speech-to-text, it would be better to state this explicitly and report the approximate error rate.
The raw outputs of a speech-to-text model can be used, with some care, to train other models, but they certainly cannot be presented as a speech-to-text dataset.

masoudMZB · 2023-02-19T14:31:27Z

TODO

masoudMZB · 2023-03-06T11:23:12Z

Update 1: 3/6/2023

New stats for the data are ready. These stats are not 100% accurate, but they are accurate enough that you can trust these numbers:

Total duration: 1697.1423399942473 hours
Total size: 195510797567.33728 bytes
Mean duration per file: 4.834937608942991 seconds
Mean size per file: 154718.00348617567 bytes
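
The number of files was requested but is not listed. As a rough sanity check (not part of the original update), it can be derived from the totals and means above, since a mean is the total divided by the count. The variable names are illustrative only, and the last figure is read as the mean size per file.

```python
# Figures copied from the update above.
total_hours = 1697.1423399942473
total_bytes = 195510797567.33728
mean_duration_s = 4.834937608942991
mean_size_bytes = 154718.00348617567  # assumed to be the mean size per file

# mean = total / count, so count = total / mean.
files_from_duration = total_hours * 3600 / mean_duration_s
files_from_size = total_bytes / mean_size_bytes

print(f"files implied by duration: {files_from_duration:,.0f}")
print(f"files implied by size:     {files_from_size:,.0f}")
print(f"total size: {total_bytes / 1e9:.1f} GB")
```

Both ratios come out at roughly 1.26 million files, so the four reported figures are at least consistent with each other. This says nothing about whether those files are unique.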

hamjam · 2023-06-10T08:17:03Z

Hi Masoud,
I have downloaded all parts of version 2, but after removing duplicated metadata from CSVs, the remaining dataset consists of only 625h of audio clips, not 1697h. What do you think is the problem?
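
For anyone who wants to reproduce this kind of check, here is a minimal sketch of deduplicating metadata rows and summing the durations. The CSV layout is an assumption: the column names `file_name` and `duration` (in seconds) are hypothetical and may not match the dataset's actual metadata files.

```python
import csv
import io

def total_hours(csv_files, key_column="file_name", duration_column="duration"):
    """Sum clip durations (in seconds) across CSV files, counting each key once.

    Column names are hypothetical; adjust them to the real metadata layout.
    Returns (number of unique clips, total hours).
    """
    seen = set()
    seconds = 0.0
    for f in csv_files:
        for row in csv.DictReader(f):
            key = row[key_column]
            if key in seen:
                continue  # duplicate metadata row, skip it
            seen.add(key)
            seconds += float(row[duration_column])
    return len(seen), seconds / 3600

# Tiny in-memory example: b.wav appears in both files and is counted once.
v1 = io.StringIO("file_name,duration\na.wav,1800\nb.wav,1800\n")
v2 = io.StringIO("file_name,duration\nb.wav,1800\nc.wav,3600\n")
clips, hours = total_hours([v1, v2])
print(clips, hours)
```

With real files, pass handles opened with `open(path, newline="", encoding="utf-8")`. Passing the CSVs of several versions in one call gives the combined total with duplicates across versions counted once.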

masoudMZB · 2023-06-20T08:29:46Z

@hamjam
Hi, sorry for the late response. Could you add the duration of data v1 too, and report the total when both versions are combined?

Then, if my information is wrong, please send a pull request to correct the README file.

Thanks for your attention.

mohsenoon changed the title ~~What is the information of this speech data in detail?~~ What is the information of your dataset in detail? Feb 16, 2023

masoudMZB pinned this issue Feb 19, 2023

masoudMZB added the documentation and good first issue labels Feb 19, 2023
