Created for the Creole Suite by Marcell Fekete. Accuracy scores are out of 100.
Tested with Ubuntu 22 and Python 3.10.
- Create a Python virtual environment, using either venv or conda.
- Install the necessary dependencies with pip install -r requirements.txt, or python3 -m pip install -r requirements.txt if using conda.
- Additionally, install PyTorch with your preferred configuration: https://pytorch.org/
- Activate your Python environment.
- Run bash ./run_experiments.sh from tatoeba_task/ as your working directory.
- Results are stored in the experiments/ folder.
The script will download the necessary data and run the experiments for the models bert-base-multilingual-cased, xlm-roberta-base, google/mt5-base, and random (for the random baseline). By default, python3 is used to run the Python code; change this as needed.
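As a rough illustration of what a sentence pair retrieval experiment measures, the sketch below retrieves, for each source sentence embedding, the target embedding with the highest cosine similarity, then reports accuracy out of 100 and the average cosine similarity. It is not the repository's code: the function names, the toy embeddings, and the assumption that source sentence i is paired with target sentence i are all illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(source_vecs, target_vecs):
    """For each source vector, return (best target index, its cosine similarity)."""
    results = []
    for src in source_vecs:
        sims = [cosine(src, tgt) for tgt in target_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((best, sims[best]))
    return results


def score(results):
    """Accuracy out of 100 and average cosine similarity of the retrieved pairs.

    Assumption: the gold target for source sentence i is target sentence i.
    """
    correct = sum(1 for i, (best, _) in enumerate(results) if best == i)
    accuracy = 100.0 * correct / len(results)
    avg_sim = sum(sim for _, sim in results) / len(results)
    return accuracy, avg_sim


# Made-up 2-dimensional "embeddings"; real models produce much larger vectors.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[2.0, 0.1], [0.1, 3.0], [1.0, -1.0]]

results = retrieve(source, target)
accuracy, avg_sim = score(results)
# The third source sentence retrieves the wrong target, so 2 of 3 are correct.
print(f"accuracy: {accuracy:.2f}, average cosine similarity: {avg_sim:.3f}")
```

Under the same assumptions, a baseline that picks a target uniformly at random from N candidates would be expected to score about 100/N.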
- Run the plot_distributions.py script with the input folder data/ and output folder ./plots/length as arguments. This will plot sentence lengths per language pair.
- Run the create_tables.py script with the input folder experiments/ as an argument.

- analysis.py: calculates tokenizer fertility and token overlap between source and target sentences and plots them, with a default output in ./plots/analysis/
- ./data/ folder: contains test samples in TSV format
- ./experiments/ folder: contains experiment outputs per language model (and the random baseline)
- ./plots/length/ folder: contains bar plots of the length distribution of the test samples
- ./plots/analysis/ folder: contains plots of tokenizer fertility and token overlap between source and target sentences per language pair per language model
- ./tables/ folder: contains the aggregated results of the experiments with accuracy and average cosine similarity scores per language
- ./tatoeba/ folder: if the download_tatoeba.sh or run_experiments.sh script is run, it contains the data from the Tatoeba Challenge repository arranged in folders per language
- create_tables.py: aggregates experimental results, with a default output in ./tables/
- download_tatoeba.sh: downloads data from the Tatoeba Challenge repository and arranges it in the right format for further processing, with a default output in ./tatoeba/
- prepare_data.py: creates test samples conforming to the Tatoeba task, with a default output in ./data/
- run_experiments.sh: downloads and prepares the data from the Tatoeba Challenge repository and carries out the sentence pair retrieval task, by default for bert-base-multilingual-cased, xlm-roberta-base, google/mt5-base, and random (for the random baseline), with outputs in ./experiments/. It does this by calling the other scripts.
- run_tatoeba.py: carries out the sentence pair retrieval task, with a default output in ./experiments/
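
The two quantities that analysis.py reports can be illustrated with a small sketch. Tokenizer fertility is commonly taken to be the average number of subword tokens per word. For token overlap, the sketch assumes a Jaccard overlap between the token sets of the source and target sentence, and the repository may define it differently. The tokenizer here is a hypothetical stand-in that cuts words into fixed-size pieces, and the example strings are made up. In practice the tokenizer of each language model would be used instead.

```python
def toy_tokenize(sentence, piece_len=3):
    """Hypothetical stand-in for a subword tokenizer.

    Lowercases, splits on whitespace, then cuts each word into pieces of at
    most `piece_len` characters.
    """
    tokens = []
    for word in sentence.lower().split():
        tokens.extend(word[i:i + piece_len] for i in range(0, len(word), piece_len))
    return tokens


def fertility(sentence, tokenize=toy_tokenize):
    """Average number of subword tokens per whitespace-separated word."""
    words = sentence.split()
    if not words:
        return 0.0
    return len(tokenize(sentence)) / len(words)


def token_overlap(source, target, tokenize=toy_tokenize):
    """Jaccard overlap between the token sets of two sentences.

    This definition is an assumption made for illustration.
    """
    src, tgt = set(tokenize(source)), set(tokenize(target))
    if not src and not tgt:
        return 0.0
    return len(src & tgt) / len(src | tgt)


# Made-up example strings, chosen only so that some pieces are shared.
src_sentence = "papa mwen"
tgt_sentence = "papa mine"
print(toy_tokenize(src_sentence))   # ['pap', 'a', 'mwe', 'n']
print(fertility(src_sentence))      # 4 tokens / 2 words = 2.0
print(token_overlap(src_sentence, tgt_sentence))
```

Passing the tokenizer as a parameter keeps the two measures independent of any particular model, so the same functions could be reused per language model and per language pair.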