
CreoleM2M

Pre-requisite Bible data

We do not own this data. It is highly confidential due to copyright and must not be shared publicly.

If you are looking for Bible data in general, please reach out to the authors of "Creating a massively parallel Bible corpus". Again, because Bible data is copyrighted, we unfortunately cannot share it publicly.

If you have specific questions about the CreoleM2M data, please reach out to the authors.

Data preparation

The Creole language codes are present in creoles_list.txt, and the dataset sizes are in corpora_stats.txt.

Our data is in the following format:

(a) train.<creole code>-eng.<creole code> and train.<creole code>-eng.eng are the training files,

(b) train.<creole code>, train.eng for the n-way parallel training segments of the aforementioned data,

(c) dev.<creole code>, dev.eng for the n-way parallel development set segments of the aforementioned data,

(d) test.<creole code>, test.eng for the n-way parallel test set segments of the aforementioned data.
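As a quick illustration of the naming scheme in (a) to (d), the following sketch builds the expected file names for one Creole code. The function name `expected_files` and the example code `hat` are assumptions made for illustration only; the actual codes are those listed in creoles_list.txt.

```python
def expected_files(creole_code: str, english_code: str = "eng") -> dict:
    """Return the file names described in (a)-(d) for one Creole code.

    Hypothetical helper: it only mirrors the naming pattern from the README
    and does not check that the files exist on disk.
    """
    pair = f"{creole_code}-{english_code}"
    return {
        # (a) pairwise Creole-English training files
        "pairwise_train": (f"train.{pair}.{creole_code}", f"train.{pair}.{english_code}"),
        # (b) n-way parallel training segments
        "nway_train": (f"train.{creole_code}", f"train.{english_code}"),
        # (c) n-way parallel development segments
        "nway_dev": (f"dev.{creole_code}", f"dev.{english_code}"),
        # (d) n-way parallel test segments
        "nway_test": (f"test.{creole_code}", f"test.{english_code}"),
    }


if __name__ == "__main__":
    # "hat" is a placeholder code used only for this example.
    for kind, names in expected_files("hat").items():
        print(kind, names)
```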

All results in the paper are calculated on the test set mentioned above.

Experiments

Step 0: Acquire Bible data and create train-dev-test splits.
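The README does not describe how the splits were made, so the following is only a minimal sketch of one way to split line-aligned parallel segments into train, dev, and test sets. The function name `split_parallel`, the split sizes, and the fixed random seed are all assumptions and not the procedure used for the paper.

```python
import random


def split_parallel(segments: dict, dev_size: int, test_size: int, seed: int = 0) -> dict:
    """Split line-aligned segments into train/dev/test (hypothetical helper).

    `segments` maps a language code to a list of lines; line i is assumed to
    be the same segment in every language.
    """
    lengths = {len(lines) for lines in segments.values()}
    if len(lengths) != 1:
        raise ValueError("all languages must have the same number of segments")
    n = lengths.pop()
    if dev_size + test_size > n:
        raise ValueError("dev_size + test_size exceeds the number of segments")

    # Shuffle indices once so every language is split identically.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    parts = {
        "dev": order[:dev_size],
        "test": order[dev_size:dev_size + test_size],
        "train": order[dev_size + test_size:],
    }
    return {
        name: {lang: [lines[i] for i in idx] for lang, lines in segments.items()}
        for name, idx in parts.items()
    }
```

Because the same shuffled indices are applied to every language, segment i of a split in one language still corresponds to segment i in the others, which is what the n-way parallel dev and test files require.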

Step 1: Install YANMTT

You will need YANMTT to decode the models we have trained. If you don't use YANMTT, you can always use Hugging Face Transformers to fine-tune and decode models yourself.

Step 2: Train tokenizer

(see creole_mt_train_tokenizer.sh)

Step 3: Train model

(see creole_mt_train.sh)

Step 4: Decode and evaluate model

(see creole_mt_decode_eval.sh)
