Eval

You will find here all the evaluation scripts used on the Libri-light dataset.

ABX

ABX is an evaluation metric for unsupervised representation learning. It evaluates feature files based on their ability to distinguish sounds, like /i/ and /e/ as in "bit" versus "bet".
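To make this concrete, here is a toy sketch of the score (not the repository's implementation, which compares whole frame sequences): given A and X drawn from the same category and B from the other, the ABX accuracy is the fraction of triplets where X is closer to A than to B.

import numpy as np

def abx_accuracy(dist, triplets):
    # Toy ABX score: `triplets` holds (A, B, X) feature vectors where
    # A and X share a category (e.g. /i/) and B belongs to the other
    # (e.g. /e/); a triplet counts as correct when X is closer to A.
    correct = sum(dist(a, x) < dist(b, x) for a, b, x in triplets)
    return correct / len(triplets)

# Illustrative check with cosine distance on dummy vectors:
cos = lambda u, v: 1 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
a, x = np.array([1.0, 0.1]), np.array([0.9, 0.2])  # same category
b = np.array([0.1, 1.0])                           # other category
print(abx_accuracy(cos, [(a, b, x)]))              # -> 1.0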

Setup

To set up the ABX evaluation script you need to:

  1. Compile the cython code. Just do:
cd ABX_src
python setup.py build_ext --inplace
  2. Check that everything works properly with:
cd ABX_src
nosetests -d
  3. Download the Libri-light .item files here: ABX_data.tgz.

This archive contains four .item files constructed from the Librispeech dev and test sets: dev-clean.item, dev-other.item, test-clean.item, and test-other.item, which provide the labels for the ABX evaluation.

How to run the ABX evaluation?

Dump your features in .pt (torch), .npz or .npy (numpy) format somewhere. Your feature dataset should look like this:

\data_dir
  file_name_0.extension
  file_name_1.extension
  ...

Each file should contain a 2D array of shape Sequence_size x Feature_dimension.
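As an illustration, a minimal sketch of writing one such file per utterance (the random tensor and paths below are placeholders for your own model's output):

import os
import numpy as np
import torch

# Minimal sketch: one feature file per utterance, in either format.
# The random tensor stands in for your model's actual output.
os.makedirs("data_dir", exist_ok=True)
seq_len, feat_dim = 500, 256  # e.g. 5 s of audio at a 10 ms frame rate
features = torch.randn(seq_len, feat_dim)

torch.save(features, "data_dir/file_name_0.pt")        # .pt (torch)
np.save("data_dir/file_name_0.npy", features.numpy())  # .npy (numpy)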

Then run:

python eval_ABX.py $PATH_FEATURE_DIR  $PATH_TO_ABX_ITEMS/$DB_NAME.item --file_extension $EXTENSION --out $OUTPUT_DIR --feature_size $FEATURE_SIZE

Where $DB_NAME is one of the 4 evaluation datasets (dev-clean, dev-other, test-clean, test-other) and $FEATURE_SIZE is the duration (in seconds) of one feature frame of the model (for a 10 ms frame rate, this would be 0.01).

Pre-computed checkpoints

Some pre-computed models trained with CPC are available for use! To load a model, just use CPC_loader.py; for example, to retrieve the model trained on the 60k-hour dataset:

python CPC_loader.py 60k $PATH_OUTPUT_CHECKPOINT

You can directly evaluate the ABX score on this checkpoint by running:

python eval_ABX.py $PATH_AUDIO_DIR  ABX_data/$DB_NAME.item --file_extension $EXTENSION --out $OUTPUT_DIR --path_checkpoint $PATH_OUTPUT_CHECKPOINT

Where $EXTENSION corresponds to an audio format (.wav, .flac, ...).

Linear Classification PER

Representations can also be evaluated by how easy it is to train a linear phoneme classifier on top of them; the phone error rate (PER) of that classifier is the score.
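As a sketch of the idea (not the repository's training code, which works on full utterances), the probe is a single affine layer mapping frame features to phoneme logits, trained with cross-entropy; the dimensions below are illustrative:

import torch
import torch.nn as nn

# Sketch of a linear phoneme probe on top of frozen features.
feat_dim, n_phones = 256, 41  # illustrative dimensions
probe = nn.Linear(feat_dim, n_phones)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a dummy batch of frame features / phone labels.
features = torch.randn(32, feat_dim)        # frozen CPC features
labels = torch.randint(0, n_phones, (32,))  # frame-level phone labels
loss = loss_fn(probe(features), labels)
loss.backward()
optimizer.step()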

Setup

To set up the PER evaluation script you need to compile the cython code it relies on. Just do:

cd PER_src
python setup.py build_ext --inplace

You will also need to download the 10h labelled data.

How to run the PER evaluation?

First, you need to train a linear classifier on your features. For example, if you want to evaluate a model fine-tuned on the 10h dataset, just run:

python eval_PER.py train $PATH_TO_10h_AUDIO_DATA_DIR $PATH_TO_10h_PHONE_DATA $PATH_TO_THE_JSON_PHONE_CONVERTER $PATH_TO_THE_CPC_MODEL -o $PATH_OUT

Then you can run the PER computation, for example on librispeech100/test-clean:

python eval_PER.py per $PATH_OUT/checkpoint.pt $PATH_TO_TEST_CLEAN $PATH_TO_TEST_CLEAN_PHONES --file_extension .flac

WER

We provide here an evaluation of representations based on word error rate (WER).

Setup

  • wav2letter python bindings: (how-to).
  • a KenLM-based Librispeech language model, which can be found here or downloaded here; it should be placed into WER_data/.
  • the lexicon, download; it should be placed into WER_data/.
  • jiwer, installable via pip install jiwer (a brief usage sketch follows this list).
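jiwer is only used to compute the final word error rate between reference and decoded transcripts; a minimal usage sketch (the transcripts are made up):

import jiwer

# Toy example: WER between a reference transcript and a hypothesis.
reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
print(jiwer.wer(reference, hypothesis))  # one deletion / 6 words ~ 0.167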

How to run the WER evaluation?

Training a letter classifier on top of a pre-trained CPC model:

python eval_WER.py --path_train=$PATH_FINETUNING --path_val=$PATH_TO_DEV_CLEAN --path_checkpoint=$PATH_OUT/checkpoint.pt --lr=1e-3  --n_epochs=50 --p_dropout=0.1 --output=$OUTPUT_DIR

Evaluating it with the wav2letter decoder:

python eval_WER.py --path_checkpoint=$PATH_OUT/checkpoint.pt --lr=1e-3  --n_epochs=50 --p_dropout=0.1 --output=$OUTPUT_DIR --path_wer=$PATH_TO_TEST_CLEAN

You can also train and then evaluate in a single command:

python eval_WER.py --path_train=$PATH_FINETUNING --path_val=$PATH_TO_DEV_CLEAN --path_checkpoint=$PATH_OUT/checkpoint.pt --lr=1e-3  --n_epochs=50 --p_dropout=0.1 --output=$OUTPUT_DIR --path_wer=$PATH_TO_TEST_CLEAN