This repository contains preprocessed audio (MFCC features) and transcripts from the Afaan Oromoo Text-to-Speech Dataset. The dataset is split into train, dev, and test sets.

Dataset splits

| Split | Clips |
|-------|-------|
| Train | 979 |
| Dev | 122 |
| Test | 123 |
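
The three splits add up to the 1,224 clips reported in the overall statistics, which is roughly an 80/10/10 train/dev/test split. The short Python sketch below checks this arithmetic. The counts are copied from the table above, and the `split_proportions` helper is a hypothetical name used only for illustration.

```python
# Clip counts per split, copied from the table above.
SPLITS = {"train": 979, "dev": 122, "test": 123}


def split_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of clips."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(f"total clips: {sum(SPLITS.values())}")  # 1224, matching the overall statistics
    for name, share in split_proportions(SPLITS).items():
        print(f"{name}: {share:.1%}")  # train 80.0%, dev 10.0%, test 10.0%
```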

Folder structure

- train/
  - mfcc/
  - transcript/
- dev/
  - mfcc/
  - transcript/
- test/
  - mfcc/
  - transcript/
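
Each split stores its MFCC features and its transcripts in parallel folders. This README does not specify file formats or naming. The sketch below therefore assumes that an MFCC file and its transcript share a file stem (for example `clip_0001.npy` and `clip_0001.txt`; both names are hypothetical) and that the folders are named exactly `mfcc` and `transcript`. The `pair_split` helper is also hypothetical. It uses only the standard library and reports any file without a partner.

```python
from __future__ import annotations

from pathlib import Path


def pair_split(root: str | Path, split: str) -> list[tuple[Path, Path]]:
    """Pair each MFCC file with the transcript that has the same file stem.

    Assumption (not stated in the dataset README): files are matched by name,
    e.g. train/mfcc/clip_0001.npy <-> train/transcript/clip_0001.txt.
    """
    base = Path(root) / split
    mfcc = {p.stem: p for p in (base / "mfcc").iterdir() if p.is_file()}
    text = {p.stem: p for p in (base / "transcript").iterdir() if p.is_file()}

    # Stems present in only one of the two folders indicate a broken pairing.
    unmatched = sorted(mfcc.keys() ^ text.keys())
    if unmatched:
        raise ValueError(f"{split}: unmatched files {unmatched[:5]}")

    return [(mfcc[stem], text[stem]) for stem in sorted(mfcc)]
```

If the naming assumption holds, `pair_split("path/to/dataset", "train")` should return 979 pairs. The same call with `"dev"` or `"test"` should return 122 or 123 pairs.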

Overall dataset statistics

| Statistic | Value |
|-----------|-------|
| Total clips | 1,224 |
| Total words | 17,559 |
| Total characters | 116,439 |
| Total duration (hh:mm:ss) | 03:11:13 |
| Minimum clip length | 1 s |
| Maximum clip length | 59 s |
| Unique words | 5,040 |
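
The totals above also give some per-clip averages. A total duration of 03:11:13 is 11,473 seconds, or about 9.4 s per clip, and the clips average about 14.3 words each. The sketch below reproduces these numbers from the table. The helper names are hypothetical. This README does not say whether the character count includes spaces, so the characters-per-word figure is only approximate.

```python
# Overall statistics copied from the table above.
TOTAL_CLIPS = 1_224
TOTAL_WORDS = 17_559
TOTAL_CHARS = 116_439
TOTAL_DURATION = "03:11:13"  # hh:mm:ss


def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def derived_stats() -> dict[str, float]:
    """Compute per-clip and per-word averages from the dataset totals."""
    seconds = to_seconds(TOTAL_DURATION)
    return {
        "mean_clip_seconds": seconds / TOTAL_CLIPS,
        "mean_words_per_clip": TOTAL_WORDS / TOTAL_CLIPS,
        # Approximate: the table does not say whether spaces are counted.
        "mean_chars_per_word": TOTAL_CHARS / TOTAL_WORDS,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.2f}")  # 9.37, 14.35, 6.63
```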

Dataset source: https://data.mendeley.com/datasets/hnvkvj589y/1

Citation: Girma, Birhanu Shimelis; Senbatu, Dereje Hinsermu (2022), “Afaan Oromoo Text-to-Speech Dataset”, Mendeley Data, V1, doi: 10.17632/hnvkvj589y.1