Incorrect number of utterances for the 10min and 1h subsets #53

mzboito · 2023-04-21T07:28:58Z

Hello,

I recently downloaded this dataset, and noticed that the 10min and 1h subsets are of equal size (in number of utterances).
Both contain 1,571 lines of phonetic transcriptions.

Fetching the corresponding audio files results in two sets that are each 05:29:37 long (HH:MM:SS).
I'm guessing this is a mistake? :)

azinonos · 2024-02-26T14:39:01Z

I have the same issue. Moreover, loading the instances in Python I get the exact same filelists in both files, so the files are identical.
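For anyone who wants to reproduce this check, here is a minimal sketch of how the two lists can be compared. The helper names and the file paths in the usage note are hypothetical, and it assumes each list is a plain text file with one entry per line.

```python
from pathlib import Path


def read_entries(path):
    """Return the set of non-empty, whitespace-stripped lines in a text file."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_filelists(path_a, path_b):
    """Compare two file lists and report the entries unique to each one."""
    a, b = read_entries(path_a), read_entries(path_b)
    return {
        "identical": a == b,
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }
```

Calling `compare_filelists("10min.txt", "1h.txt")` (placeholder paths) should report `identical: True` if the two lists really are the same.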

EDIT:
It seems like the 1h folder is split into 6 sub-folders of 10 mins each. So by taking all paths from any one of those sub-folders you would have the 10 mins of data, and by taking the entire 1h folder you would have the full hour. So you could rebuild the .txt files using the established directory structure.
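A rough sketch of that rebuild in Python is below. This is not an official fix: the function name, the `*.flac` pattern, and the idea that each list entry is the audio file's stem are all assumptions on my part, so adjust them to match the actual layout and the format of the original .txt files.

```python
from pathlib import Path


def rebuild_filelists(one_hour_dir, ten_min_subfolder=None, pattern="*.flac"):
    """Rebuild (10min, 1h) utterance lists from the 1h folder's sub-folders.

    Assumptions (not confirmed by the dataset docs): audio files match
    `pattern`, and an utterance ID is the file name without its extension.
    """
    root = Path(one_hour_dir)
    subfolders = sorted(p for p in root.iterdir() if p.is_dir())
    if not subfolders:
        raise ValueError(f"no sub-folders found in {root}")

    # Use the requested sub-folder, or fall back to the first one by name.
    chosen = root / ten_min_subfolder if ten_min_subfolder else subfolders[0]

    def utterance_ids(folder):
        # rglob also picks up files in nested directories.
        return sorted(p.stem for p in folder.rglob(pattern))

    return utterance_ids(chosen), utterance_ids(root)


def write_list(entries, out_path):
    """Write one entry per line."""
    Path(out_path).write_text("\n".join(entries) + "\n", encoding="utf-8")
```

The first returned list covers a single sub-folder (the 10 mins) and the second covers everything under the 1h folder. If the original .txt files also carry transcriptions, the IDs would still need to be joined back to those lines.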
