This repository contains a (very) small parallel speech corpus between the endangered language Griko and Italian. It comprises 330 sentences, each with the following levels of information: speech, machine-extracted pseudo-phones, transcriptions, translations, and sentence alignment. A reference for evaluation following Track 2 of the Zero Resource Challenge 2017 is available here in two formats, with and without silence mark information.
The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation.
- Reference: "A small Griko-Italian speech translation corpus", Marcely ZANON BOITO, Antonios ANASTASOPOULOS, Marika LEKAKOU, Aline VILLAVICENCIO, Laurent BESACIER, SLTU 2018, Gurgaon, India.
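
Because the corpus is sentence-aligned, transcriptions and translations can be paired line by line. The snippet below is a minimal sketch of that idea, not the repository's own tooling. It assumes one sentence per line in two plain-text UTF-8 files; the function name `load_parallel` and any file names passed to it are hypothetical and should be adapted to the actual layout of this repository.

```python
from pathlib import Path
from typing import List, Tuple


def load_parallel(src_path: str, tgt_path: str) -> List[Tuple[str, str]]:
    """Pair sentences from two line-aligned text files.

    Assumption (not confirmed by this README): each file holds one
    sentence per line, and line i of both files refers to the same sentence.
    """
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(tgt_path).read_text(encoding="utf-8").splitlines()

    # A sentence-aligned corpus must have the same number of lines on both sides.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line count mismatch: {len(src_lines)} vs {len(tgt_lines)}"
        )

    # Strip surrounding whitespace and return (source, target) pairs.
    return [(s.strip(), t.strip()) for s, t in zip(src_lines, tgt_lines)]
```

Under these assumptions, calling `load_parallel` on a Griko transcription file and its Italian translation file would return one pair per sentence, and a mismatch in line counts would raise an error rather than silently misalign the data.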