This repository contains tools for benchmarking ASR systems using BIGOS corpora.
The benchmark was designed with the following aspects in mind:

| Aspect | Considerations |
| --- | --- |
| Metrics | Support for well-established ASR evaluation metrics. |
| Extensibility | Straightforward integration of new datasets, normalization methods, metrics, and ASR systems. |
| Availability | Publicly accessible and intuitive presentation of results (see the Polish ASR leaderboard). |
| Comprehensiveness | Performance analysis across scenarios, system parameters, and user groups. |
The BIGOS (Benchmark Intended Grouping of Open Speech) corpora aim to simplify access to and use of publicly available ASR speech datasets.
Currently, the BIGOS corpora are available for the Polish language.
Two types of BIGOS corpora are available on the Hugging Face platform:
- BIGOS V2 - containing mostly read speech.
- PELCRA for BIGOS - containing mostly conversational speech.
The BIGOS V2 and PELCRA for BIGOS corpora are used to:
- Evaluate publicly available ASR systems - Polish ASR leaderboard.
- Evaluate community-provided ASR systems - 2024 PolEval challenge.
This section provides instructions on how to use the provided Makefile for various runtime configurations and evaluation tasks. Please ensure all necessary configurations and dependencies are set up before proceeding.
Ensure you have the following prerequisites:
- Python 3.x
- Required Python packages (install via requirements.txt)
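For example, a typical setup could look like the sketch below; the virtual environment is optional and its name is illustrative:

```bash
# Optional: create and activate an isolated environment (the name .venv is illustrative)
python3 -m venv .venv
source .venv/bin/activate

# Install the required Python packages
pip install -r requirements.txt
```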
You need to provide user-specific configuration, e.g. cloud API keys. To do so, edit "template.ini" and save it as "config.ini" (./config/user-specific/config.ini).
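For example, assuming "template.ini" sits next to the target file in ./config/user-specific, the setup could look like this:

```bash
# Copy the template and fill in user-specific values such as cloud API keys
cp config/user-specific/template.ini config/user-specific/config.ini
# Then open config/user-specific/config.ini in an editor and set your keys
```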
To validate the configuration, run:
make test-force-hyp
make test
To run all evaluation steps for a specific eval_config: make eval-e2e EVAL_CONFIG=<eval_config_name>
To run the evaluation for all eval_configs: make eval-e2e-all
To generate statistics of the ASR hypotheses for a specific eval_config: make hyps-stats EVAL_CONFIG=<eval_config_name>
By default, if specific intermediate results already exist, processing is skipped. To force regeneration of hypotheses, evaluation score calculation, etc., add the "force" suffix to the make target. For example, to force the evaluation for all eval_configs: make eval-e2e-all-force
To force the evaluation for a specific eval_config: make eval-e2e-force EVAL_CONFIG=<eval_config_name>
To run the evaluation for the BIGOS V2 dataset: make eval-e2e EVAL_CONFIG=bigos
To replicate the exact results, contact [email protected] to obtain a copy of the ASR hypotheses.
To run the evaluation for the PELCRA for BIGOS dataset: make eval-e2e EVAL_CONFIG=pelcra
To replicate the exact results, contact [email protected] to obtain a copy of the ASR hypotheses.
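Putting the above together, a possible replication session (assuming config.ini and, where needed, the provided ASR hypotheses are in place) might be:

```bash
# Sanity-check the configuration
make test

# Run the end-to-end evaluation for both corpora
make eval-e2e EVAL_CONFIG=bigos
make eval-e2e EVAL_CONFIG=pelcra

# Inspect statistics of the hypotheses for one of the runs
make hyps-stats EVAL_CONFIG=bigos
```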
You can run the evaluation for various datasets, systems, normalization methods, etc. To add a new runtime configuration or edit an existing one, go to the "config/eval-scores-gen-specific" folder and add or edit the relevant file.
See the example implementations of ASR system classes in scripts/asr_eval_lib/asr_systems. Create a new file with the implementation of the new ASR system, based on the "template_asr_system.py" script. Add a reference to the new ASR system in "scripts/asr_eval_lib/asr_systems/__init__.py". Update the dependencies in requirements.txt.
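A minimal sketch of this workflow is shown below; the file name my_asr_system.py is purely illustrative, and the actual class interface to implement is the one defined in template_asr_system.py:

```bash
# Start from the template (the target file name is illustrative)
cp scripts/asr_eval_lib/asr_systems/template_asr_system.py \
   scripts/asr_eval_lib/asr_systems/my_asr_system.py

# After implementing the new system class in my_asr_system.py:
#  - register it in scripts/asr_eval_lib/asr_systems/__init__.py
#  - add any packages it needs to requirements.txt
```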
Open an existing config for an already supported dataset, e.g. "config/eval-scores-gen-specific/bigos.json". Modify it and save it as a new configuration, "config/eval-scores-gen-specific/<dataset_name>.json". Make sure the new dataset follows the BIGOS format and is publicly available. To run the evaluation for the new dataset: make eval-e2e EVAL_CONFIG=<dataset_name>
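A minimal sketch of this workflow, using the hypothetical dataset name my_dataset:

```bash
# Start from a known-good configuration (my_dataset is an illustrative name)
cp config/eval-scores-gen-specific/bigos.json \
   config/eval-scores-gen-specific/my_dataset.json

# Edit my_dataset.json to point at the new, publicly available BIGOS-format dataset,
# then run the end-to-end evaluation for it
make eval-e2e EVAL_CONFIG=my_dataset
```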
To generate a synthetic test set: make tts-set-gen TTS_SET=<tts_set_name>
Replace <tts_set_name> with the appropriate value for your use case.
To display a manifest for a specific dataset and split: make sde-manifest DATASET=<dataset_name> SPLIT=<split_name>
Replace <dataset_name> and <split_name> with the appropriate values for your use case.