Griko Italian Parallel Corpus

This repository contains a (very) small parallel speech corpus between the endangered language Griko and Italian. It consists of 330 sentences with the following information levels: speech, machine-extracted pseudo-phones, transcriptions, translations, and sentence alignment. A reference for evaluation following Track 2 of the Zero Resource Challenge 2017 is available here in two formats: with and without silence mark information.

The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation.

Reference: "A small Griko-Italian speech translation corpus", Marcely ZANON BOITO, Antonios ANASTASOPOULOS, Marika LEKAKOU, Aline VILLAVICENCIO, Laurent BESACIER, SLTU 2018, Gurgaon, India.
