author: steeve LAQUITAINE
date: 28/08/2021
Short description: Voice activity detection is critical to reduce the computational cost of continuously monitoring the large volumes of speech data needed to swiftly detect command utterances such as wakewords. My objective was to code a Voice Activity Detector (VAD) based on a neural network, with reasonable performance (low false rejection rate), within a week and with low computing resources. I trained and tested the model on labelled audio data containing speech from diverse speakers: male, female and synthetic voices, high and low volume, slow and fast pace. The dataset came from LibriSpeech and was prepared and provided by SONOS. I used a variety of tools to extract and preprocess the data and to develop and test the model, but I mostly relied on the Tensorflow advanced Subclassing API, Tensorflow ops and Keras, together with Tensorboard, Seaborn and the more classical matplotlib visualization tools to make sense of the data, clean the data and inspect the inner workings of the model.
notebooks/report.pdf
notebooks/report.ipynb
Prerequisites installations:
- You have conda 4.8.3 (check by running "which conda" in a terminal).
- You have Git 2.32.0.
You can get and run the codebase in 3 steps:
- Setup:
git clone https://github.com/slq0/vad_deepnet.git
cd vad_deepnet
conda create -n vad python==3.6.13
conda activate vad
pip install kedro==0.17.4
bash setup.sh
- Move the dataset to vad_deepnet/data/01_raw/
- Run basic model training (takes 30 min) and predict-eval (20 secs):
kedro run --pipeline train --env train
kedro run --pipeline predict_and_eval --env predict_and_eval
- Development:
  - VSCODE: coding in an integrated development environment
  - Conda: isolate environment and manage dependencies
  - Git: code versioning
  - Github: centralize repo for collaboration
  - Kedro: standardize codebase
- Experiment tracking & reproducibility:
  - mlflow: pipeline parameters & model experiment tracking
  - Tensorboard: model experiment inspection & optimization
  - git-graph: keep track of flow of commits and branches
- Readability:
  - black: codebase formatting
  - pylint: codebase linting
- Test coverage:
  - pytest: minimal unit-tests
Create a conda environment and install Python and Kedro, which is used to standardize the codebase.
conda create -n vad python==3.6.13 kedro==0.17.4
- I used a light version of the Gitflow Workflow methodology for code versioning and collaboration.
  - A Master branch will be our production branch (final deployment).
  - I created and moved to a Develop branch and branched out a feature branch to start developing.
  - The Develop branch would hypothetically be an integration branch (for continuous integration and testing).
- I kept track of my commits and the workflow of branches with git-graph.
git clone https://github.com/slq0/vad_deepnet.git
Run this bash script to build and install the project's dependencies:
bash setup.sh
Train the basic model:
Run the training pipeline:
kedro run --pipeline train --env train
Run inference with the model:
kedro run --pipeline predict --env predict
Evaluate its performance metrics:
kedro run --pipeline predict_and_eval --env predict_and_eval
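The short description names a low false rejection rate as the target for this VAD. As a rough illustration of what that metric measures, here is a minimal sketch in plain Python. It is not the code run by the predict_and_eval pipeline: the function names and the 0/1 per-frame label encoding (1 = speech, 0 = no speech) are assumptions made for this example.

```python
def false_rejection_rate(labels, predictions):
    """Fraction of true speech frames (label 1) that were predicted as non-speech (0).

    Assumed encoding (hypothetical, for illustration): 1 = speech, 0 = no speech.
    """
    speech_total = 0
    missed = 0
    for label, pred in zip(labels, predictions):
        if label == 1:
            speech_total += 1
            if pred == 0:
                missed += 1
    if speech_total == 0:
        raise ValueError("labels contain no speech frames")
    return missed / speech_total


def false_alarm_rate(labels, predictions):
    """Fraction of true non-speech frames (label 0) that were predicted as speech (1)."""
    silence_total = 0
    false_alarms = 0
    for label, pred in zip(labels, predictions):
        if label == 0:
            silence_total += 1
            if pred == 1:
                false_alarms += 1
    if silence_total == 0:
        raise ValueError("labels contain no non-speech frames")
    return false_alarms / silence_total


if __name__ == "__main__":
    # Toy frame labels: 4 speech frames followed by 4 non-speech frames.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    predictions = [1, 1, 1, 0, 0, 1, 0, 0]
    print(false_rejection_rate(labels, predictions))  # 1 missed of 4 speech frames -> 0.25
    print(false_alarm_rate(labels, predictions))      # 1 false alarm of 4 non-speech frames -> 0.25
```

For a wakeword front end, a missed speech frame is usually costlier than a false alarm, which is why the false rejection rate is the metric to keep low.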
Use Tensorboard to visualize the layers' weights and biases across epochs, the training and validation losses, the performance metrics on the validation set, and the model's conceptual and structural graphs, which let you inspect the model at decreasing levels of abstraction.
The model runs are logged in tbruns/.
tensorboard --logdir tbruns
# http://localhost:6006/
I used mlflow to track experiments and tested hyperparameter runs (e.g., run duration).
The logs are stored in mlruns/.
kedro mlflow ui --env train --host 127.0.0.1 --port 6007
# http://localhost:6007/
To keep track of the pipelines and optimize them, I used Kedro-viz, which displays the pipelines as directed acyclic graphs (DAGs):
kedro viz
# http://127.0.0.1:4141
Run unit tests on the codebase. I set up the unit-test scaffolding but did not have to implement more than one test. You can run the unit tests with:
pytest src/tests/test_run.py
You can open the package's Sphinx documentation by opening docs/build/html/index.html in your web browser (double-click the file), or build and open it with:
kedro build-docs --open
We can use the pure speech and noise corpora below for the speech vs. silence classes. We can also augment the pure speech dataset with noisy speech data created by summing speech and noise data.
- TIMIT: corpus for clean speech (1)
  - license: ([TODO]: check)
- QUT-NOISE: corpus of noise (1)
  - license: CC-BY-SA ([TODO]: check)
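The augmentation idea above (summing speech and noise) can be sketched as follows. This is only an illustration under stated assumptions, not code from the repository: the helper names rms and mix_at_snr are hypothetical, signals are plain lists of float samples at the same sampling rate, and scaling the noise to a target signal-to-noise ratio (snr_db) is an added assumption, since the text only says the two signals are summed.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration. Assumes both signals share the same
    sampling rate and that noise is at least as long as speech.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # crop the noise to the speech length
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent, cannot scale it")
    # SNR_dB = 20 * log10(rms(speech) / rms(scaled_noise)), solved for the gain.
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    # At 0 dB the scaled noise has the same RMS as the speech (gain = 2.0 here).
    print(mix_at_snr(speech, noise, snr_db=0))  # [2.0, 0.0, 0.0, -2.0]
```

Generating the same utterance at several SNR values would give the model noisy speech examples of graded difficulty while the frame labels stay those of the clean speech.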
The final report is notebooks/report.pdf, with a collapsible table of contents (visible in Preview on Mac and in Adobe Reader on Windows).
To convert the .ipynb report into a .pdf, run in the terminal:
jupyter nbconvert notebooks/report.ipynb --to pdf --no-input
Note: references are formatted according to the American Psychological Association (APA) style.
(1) Dean, D., Sridharan, S., Vogt, R., & Mason, M. (2010). The QUT-NOISE-TIMIT corpus for evaluation of voice activity detection algorithms. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (pp. 3110-3113). International Speech Communication Association.