Griko Italian Parallel Corpus

This repository contains a (very) small parallel speech corpus between the endangered language Griko and Italian. It consists of 330 sentences, with the following levels of information: speech, machine-extracted pseudo-phones, transcriptions, translations, and sentence alignment. A reference for evaluation following Track 2 of the Zero Resource Challenge 2017 is available here in two formats: with and without silence-mark information.

The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation.

  • Reference: "A small Griko-Italian speech translation corpus", Marcely ZANON BOITO, Antonios ANASTASOPOULOS, Marika LEKAKOU, Aline VILLAVICENCIO, Laurent BESACIER, SLTU 2018, Gurgaon, India.