You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
VoxLingua107 is a speech dataset for training spoken language identification models. The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according the language of the video title and description, with some post-processing steps to filter out false positives. VoxLingua107 contains data for 107 languages, including Indonesian, Javanese, and Sundanese.
License
CC-BY 4.0
The text was updated successfully, but these errors were encountered:
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?voxlingua
The text was updated successfully, but these errors were encountered: