Tested with Ubuntu 22, Python 3.10, and an Nvidia A10 GPU.
- Ensure you have the `machamp` submodule initialized in case you did not clone with the `--recursive` flag: `git submodule update --init --recursive`
- Set up a working Python environment and install the `requirements.txt` from the MaChAmp directory.
- Download the datasets with `./download_masakhaner.sh`.
All experiments here run with MaChAmp. Configuration files were updated to work with v0.4.2 of the software.
To get MaChAmp and install its requirements, you can run:

```bash
git submodule update --init --recursive   # only needed if you did not clone CreoleVal with the --recursive flag
pip install -r machamp/requirements.txt
```
It's recommended to do this in a virtual environment. (PS: if there are issues with the jsonnet installation, try installing it via conda-forge, i.e. `conda install -c conda-forge jsonnet`.)
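For example, a minimal environment setup could look like the following (the environment name `creoleval-ner` is arbitrary; any Python 3.10 environment works):

```bash
# Create and activate a fresh virtual environment (any name works),
# then install MaChAmp's dependencies into it.
python3 -m venv creoleval-ner
source creoleval-ner/bin/activate
pip install -r machamp/requirements.txt
```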
The `configs` folder contains all the MaChAmp configuration files for the model hyperparameters and datasets.

Filenames beginning with `params_` contain model hyperparameters. The pretrained transformer model can be swapped for any other HuggingFace transformer model by changing the following line:

```json
"transformer_model": "bert-base-multilingual-cased",
```
- For all experiments, there are config files for mBERT, mT5, and XLM-R.
- Filenames beginning with `ner` contain the data filepaths and task configurations. Ensure that the filepaths in these files match the locations of the data files on your system (this should be the case if you use the instructions & scripts in this repo); a sanity-check sketch is shown after this list.
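A quick way to sanity-check those paths is sketched below; it assumes the configs use MaChAmp-style `*_data_path` keys and that you run it from this directory:

```bash
# List every data path referenced in the NER configs and flag any missing file.
# Assumes MaChAmp-style "*_data_path" keys; adjust the grep pattern if the configs differ.
for f in configs/ner_*.json; do
  grep -o '"[^"]*_data_path": *"[^"]*"' "$f" | cut -d'"' -f4 | while read -r p; do
    [ -e "$p" ] || echo "Missing: $p (referenced in $f)"
  done
done
```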
Running a training script creates a `logs` directory where all your models, metrics, scores, and test files will be stored.
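MaChAmp usually nests each run under a run name and a timestamp inside `logs`, so the exact subdirectory names depend on how `train.sh` names the run; a quick way to browse the outputs:

```bash
# Inspect training outputs; each run typically ends up in logs/<run-name>/<timestamp>/
ls logs/
ls logs/*/*/
```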
- Data: MasakhaNER 2.0
- Download script: `download_masakhaner.sh`
- Config: `configs/ner_naija.json`
- Train: `./train.sh ner_naija {mbert,mt5,xlmr}` (example below)
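For example, to train all three encoders on the Naija data (the last argument presumably selects the matching `params_*` config):

```bash
# Train MasakhaNER (Naija) models with each supported encoder
./train.sh ner_naija mbert
./train.sh ner_naija mt5
./train.sh ner_naija xlmr
```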
- Data: WikiAnn. As we could not find the official splits, the data is provided in `data/WikiAnn_data/<lang>`, where `<lang>` can be one of: bi, cbk_zam, ht, pap, pih, sg, tpi
- Config: `configs/ner_wikiann_<lang>.json`
- Train: `./train.sh ner_wikiann_<lang> {mbert,mt5,xlmr}` (example below)
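For example, to train an XLM-R model on the Tok Pisin (tpi) WikiAnn data:

```bash
# Train a single WikiAnn language; swap tpi/xlmr for any listed language code and encoder
./train.sh ner_wikiann_tpi xlmr
```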