This program transforms .wav files into .json files containing the words that include the selected characters (the filter) in a specific language.
It uses the whisper-timestamped library to process the files and apply the filter (one or more characters). The program
detects the words that contain these characters and creates a .json file with the detected words, including timestamp attributes for each word.
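The filtering step described above can be sketched as follows. This is a minimal illustration, not the program's actual code: it assumes the transcription result is shaped like whisper-timestamped output (a `segments` list whose entries hold a `words` list with `text`, `start` and `end`), and `filter_words` and the sample data are hypothetical names written by hand for this example.

```python
import json


def filter_words(result, filters):
    """Return every word that contains at least one of the filter strings.

    `result` is assumed to look like whisper-timestamped output:
    {"segments": [{"words": [{"text": ..., "start": ..., "end": ...}]}]}
    """
    filters = [f.lower() for f in filters]
    matches = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            text = word["text"].lower()
            if any(f in text for f in filters):
                # Keep the word together with its timestamps (in seconds).
                matches.append(
                    {"text": word["text"], "start": word["start"], "end": word["end"]}
                )
    return matches


# Hand-written sample data, not real model output.
sample = {
    "segments": [
        {
            "words": [
                {"text": "Hola", "start": 0.0, "end": 0.4},
                {"text": "buenos", "start": 0.5, "end": 0.9},
                {"text": "días", "start": 1.0, "end": 1.4},
            ]
        }
    ]
}

selected = filter_words(sample, ["s", "d"])
print(json.dumps(selected, ensure_ascii=False, indent=2))
```

With the filters `s` and `d`, "Hola" is dropped and "buenos" and "días" are kept with their timestamps.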
- Python 3.9 or newer
- Clone this repo (Or download it as a zip):
- Install whisper-timestamped:
pip3 install git+
- Install ffmpeg:
- On Ubuntu or Debian:
sudo apt update && sudo apt install ffmpeg
- On Arch Linux:
sudo pacman -S ffmpeg
- On macOS using Homebrew:
brew install ffmpeg
- On Windows using Chocolatey:
choco install ffmpeg
- On Windows using Scoop:
scoop install ffmpeg
- Install ONNX Runtime:
pip3 install onnxruntime torchaudio
- Audio backend for torchaudio:
- SoundFile for Windows
pip install soundfile
- Sox for Linux/macOS
pip install sox
- moviepy
pip install moviepy
- pympi-ling
pip install pympi-ling
Move all the files you want to process into the input folder.
The .mp4 files are automatically converted to .wav files. To skip the conversion, use the flag --use_wav True
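As an illustration of the .mp4 to .wav step, here is a minimal sketch that calls ffmpeg directly. This is an assumption made for the example: the program may perform the conversion differently (moviepy is listed among its requirements), and `build_ffmpeg_command`, `convert_folder` and the folder names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_dir):
    """Build the ffmpeg call that extracts the audio track of one video."""
    video_path = Path(video_path)
    wav_path = Path(wav_dir) / (video_path.stem + ".wav")
    # -y overwrites an existing output file, -vn drops the video stream.
    return ["ffmpeg", "-y", "-i", str(video_path), "-vn", str(wav_path)]


def convert_folder(input_dir, wav_dir, use_wav=False):
    """Convert every .mp4 in input_dir, unless the caller already has .wav files."""
    if use_wav:
        return []  # nothing to convert
    Path(wav_dir).mkdir(parents=True, exist_ok=True)
    commands = [
        build_ffmpeg_command(path, wav_dir)
        for path in sorted(Path(input_dir).glob("*.mp4"))
    ]
    for command in commands:
        # check=True raises CalledProcessError if ffmpeg fails.
        subprocess.run(command, check=True)
    return commands


print(build_ffmpeg_command("input/clip.mp4", "temp"))
```

Calling `convert_folder("input", "temp")` would run one ffmpeg process per video; passing `use_wav=True` skips the step entirely.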
Open a terminal in the cloned repository. You can use the cd command:
cd ./{path}/Marcador-Elan
Then run the following command to execute the program and mark on the timeline the words that contain the letters 's' and 'd':
python ./ --filters s d
--filters: List of strings to filter (use lowercase)
python ./ --filters s d asa
--input_folder: Folder with the input files
python ./ --input_folder mp4_folder
--output_folder: Folder for the output files
python ./ --output_folder elan_folder
--save_temp: Save temporary files
python ./ --save_temp
--use_wav: Skip the .mp4 to .wav conversion
python ./ --use_wav
--name_model: Select the whisper model
python ./ --name_model medium
--language: Select the language of the audio (use --help to see the list; default: Spanish)
python ./ --language en
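For reference, the options above could be declared with argparse roughly like this. It is only a sketch: the flag names come from the examples above, but the default values (other than Spanish as the default language), the `es` language code, and the choice of boolean switches for --save_temp and --use_wav are assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the command-line options listed above (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        description="Mark words that contain the given characters."
    )
    # One or more lowercase strings, e.g. --filters s d asa
    parser.add_argument("--filters", nargs="+", default=[])
    parser.add_argument("--input_folder", default="input")    # assumed default
    parser.add_argument("--output_folder", default="output")  # assumed default
    # Treated here as boolean switches, as in the usage examples.
    parser.add_argument("--save_temp", action="store_true")
    parser.add_argument("--use_wav", action="store_true")
    parser.add_argument("--name_model", default="medium")     # assumed default
    # Spanish is the documented default; the "es" code is an assumption.
    parser.add_argument("--language", default="es")
    return parser


args = build_parser().parse_args(["--filters", "s", "d", "--language", "en"])
print(args.filters, args.language, args.use_wav)
```

Options that are not given on the command line fall back to their defaults, so the flags can be combined freely in a single call.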
The generated files will be in the output folder.
- whisper-timestamped: Multilingual Automatic Speech Recognition with word-level timestamps and confidence (License AGPL-3.0).
- whisper: Whisper speech recognition (License MIT).
- dtw-python: Dynamic Time Warping (License GPL v3).
- json-to-elan: Tools and scripts for working with ELAN (License Apache-2.0).
Lucas Mesías | Joaquín Saldivia | Nicolás Aguilera
If you incorporate this in your research, reference the repository as the source.
author = {Mesías, Lucas and Saldivia, Joaquín and Aguilera, Nicolás},
month = {6},
title = {Marcador-elan},
url = {},
year = {2023}
author={Louradour, J{\'e}r{\^o}me},
journal={GitHub repository},
howpublished = {\url{}}
OpenAI Whisper paper:
title={Robust speech recognition via large-scale weak supervision},
author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
journal={arXiv preprint arXiv:2212.04356},
title={Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package},
author={Giorgino, Toni},
journal={Journal of Statistical Software},