- Clone the XTREME Repo
git clone https://github.com/google-research/xtreme.git
- Install the XTREME tools
This step is really only needed if you are running experiments on XTREME, i.e. to evaluate your models. We will just be looking through the XTREME data to better understand its layout, so we do not need most of the tools installed here. But it's best to follow the steps as outlined in the XTREME repo, so it is worth running this script.
You can refer to the XTREME repo for more details on these steps.
cd xtreme
bash install_tools.sh
- Install dependencies
XTREME has a few dependencies that you will need in order to use its datasets.
Check out the repo for the full list.
I just needed to install the transformers library, but you may need some more.
pip install transformers
- Manually download Panx
There is one dataset, panx_dataset (for NER), that you need to download manually:
- Create a download folder in the root of this project with
mkdir -p download
- Manually download the dataset from here.
This will download as AmazonPhotos.zip. Make sure this zip file is in the download
directory within the XTREME repo.
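Before moving on, it can help to confirm the zip really landed in the right place. Here is a minimal sketch of such a check. The function name check_panx_zip is my own and not part of XTREME, and it only assumes the file name AmazonPhotos.zip mentioned above.

```shell
# Hypothetical helper (not part of the XTREME repo): succeeds only if a
# non-empty AmazonPhotos.zip is present in the given directory.
check_panx_zip() {
  dir="${1:-download}"
  # -s is true when the file exists and has a size greater than zero
  [ -s "$dir/AmazonPhotos.zip" ]
}

# Run this from the root of the XTREME repo
if check_panx_zip download; then
  echo "Found download/AmazonPhotos.zip"
else
  echo "download/AmazonPhotos.zip is missing or empty" >&2
fi
```

This only checks that the file exists and is not empty. It does not verify that the archive is complete or uncorrupted.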
- Download the remaining datasets
And finally, run this in the root of the project to download the remaining datasets.
bash scripts/download_data.sh
If you have any issues, try downloading the individual datasets to see which one is causing the problem.
In the download script you can see the tasks called at the end:
download_xnli
download_pawsx
download_tatoeba
download_bucc18
download_squad
download_xquad
download_mlqa
download_tydiqa
download_udpos
download_panx
So try working through these one by one to see where the issue is.
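One way to work through the tasks one by one is a small wrapper that runs each step and reports which ones fail, instead of stopping at the first error. This is only a sketch: run_steps is a helper I made up, and it assumes the download_* names are shell functions available in the same script (for example, if you paste the helper into your own copy of the download script and use it in place of the calls at the end).

```shell
# Hypothetical helper (not part of the XTREME repo): run each named
# command or function in turn and remember the ones that fail.
run_steps() {
  failed=""
  for step in "$@"; do
    echo "==> $step"
    if "$step"; then
      echo "ok: $step"
    else
      echo "FAILED: $step" >&2
      failed="$failed $step"
    fi
  done
  # Succeed only if no step failed
  [ -z "$failed" ]
}

# Example call, assuming the functions are defined earlier in the same script:
# run_steps download_xnli download_pawsx download_tatoeba download_bucc18 \
#           download_squad download_xquad download_mlqa download_tydiqa \
#           download_udpos download_panx
```

After it finishes, the failed variable holds the names of the steps that did not succeed, so you know which dataset to look at first.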