Streamlined annotation pipeline VERSION 1 by Riya Anand
To Note:
- This code builds on Whisper (for transcription) and Charsiu (for forced alignment).
- The Whisper model currently used in the code is 'base'. For more accurate transcription, change it to 'large' in transcribe.py.
- The code works best on audio files under roughly 20 minutes.
The Whisper portion is a streamlined version of OpenAI's free Whisper speech recognition model and is based on an original notebook by @amrrs, with added documentation and test files by Pete Warden.
The Charsiu portion is based on Charsiu's alignment models and the Colab notebooks provided with them. The forced-alignment functions are used because they take the transcript generated by Whisper as their input.
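For orientation, the two stages fit together roughly as follows. This is a minimal sketch, not a copy of transcribe.py: it assumes the openai-whisper package and the charsiu_forced_aligner class from the cloned charsiu repo; the class name, the "charsiu/en_w2v2_fc_10ms" aligner id, and the serve() call follow Charsiu's Colab notebooks and may need adjusting.

import whisper
from Charsiu import charsiu_forced_aligner  # needs the cloned charsiu repo (its src/ folder) on the Python path

# Stage 1: transcribe the audio with Whisper ("base" here; "large" is more accurate)
model = whisper.load_model("base")
transcript = model.transcribe("input/a.wav")["text"]

# Stage 2: force-align the Whisper transcript against the audio and save a Praat TextGrid
aligner = charsiu_forced_aligner(aligner="charsiu/en_w2v2_fc_10ms")
aligner.serve(audio="input/a.wav", text=transcript, save_to="output/a.TextGrid")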
PULLING REPO INSTRUCTIONS
- Pull from the main branch as well as from the "development" branch in the branch dropdown to ensure the Charsiu code is cloned properly. Alternatively, run the following commands in a terminal:
git clone https://github.com/lingjzhu/charsiu
cd charsiu
USER INSTRUCTIONS
- Upload .wav files into an "input" folder of your choice.
- Create an output directory for your files. For an input "a.wav", the code produces several outputs. Running transcribe.py outputs "a.txt" (the raw Whisper transcript of the audio file) plus "a.csv" and "a.TextGrid", which are the basic annotations. Optionally running features.py also outputs "a_with_features.csv", which contains the phonetic feature annotation. Note that features.py takes as input the CSV file that transcribe.py generates.
- To run transcribe.py on all .wav files in a directory, use the following command (a sketch of the batch loop behind these flags appears after this list):
python transcribe.py --input-dir ./input --output-dir ./output
- Run features.py to get more extensive phonetic features if needed (note: its input is the CSV file produced by transcribe.py):
python features.py --input-dir ./input --output-dir ./output
Example: python features.py output/migrationaudiop3.csv output/migrationaudiop3_with_features.csv
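For reference, the --input-dir/--output-dir invocation above corresponds to a loop of roughly the shape below. This is a hypothetical sketch rather than the actual transcribe.py: only the flag names come from the commands above, and it writes only the raw .txt transcript (the real script also produces the .csv and .TextGrid annotations).

import argparse
from pathlib import Path
import whisper

parser = argparse.ArgumentParser()
parser.add_argument("--input-dir", default="./input")
parser.add_argument("--output-dir", default="./output")
args = parser.parse_args()

out_dir = Path(args.output_dir)
out_dir.mkdir(parents=True, exist_ok=True)

model = whisper.load_model("base")  # swap in "large" for more accurate transcripts
for wav in sorted(Path(args.input_dir).glob("*.wav")):
    result = model.transcribe(str(wav))
    (out_dir / (wav.stem + ".txt")).write_text(result["text"])  # e.g. input/a.wav -> output/a.txt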
To recreate the virtual environment:
python -m venv .venv
source .venv/bin/activate # (or .venv\Scripts\activate on Windows)
pip install -r requirements.txt
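After installing the requirements, the snippet below is a quick way to check that both dependencies import cleanly. The charsiu/src path is an assumption about where the cloned repo sits relative to this project; adjust it to your layout.

import sys
sys.path.append("charsiu/src")  # hypothetical location of the Charsiu module from the cloned repo

import whisper                              # provided by the openai-whisper package
from Charsiu import charsiu_forced_aligner  # provided by the cloned charsiu repo
print("whisper and Charsiu imported successfully")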
Includes code from Charsiu (MIT License). Copyright (c) 2021 Jian Zhu.
Modifications (c) 2025 Riya Anand.
Original Project Link: https://github.com/lingjzhu/charsiu
Portions adapted from openai-whisper-webapp (MIT License). Copyright (c) 2022 amrrs.
Modifications (c) 2025 Riya Anand.
Original Project Link: https://github.com/amrrs/openai-whisper-webapp/tree/main