Feedforward Neural Network for speech recognition of vowels using Linear Predictive Coding (LPC) Coefficients.
For a walkthrough of this repository, check out this video.
Shoutout to my Linguistics professor who wrote this in 1992 and is still teaching this to us at the ripe age of 70+.
| Task | Components | Scripts Used | Full Writeup |
|---|---|---|---|
| Vowel NN Classifier | 1. Create LPC coefficients for training and ground truth data<br>2. Train feedforward models on the vowel classification task | Create_LPC_Data_Sets<br>Vowel_Classification_NN | Link |
| Vowel Backness | Identify vowel backness using LPC data | Generate_LPC_Data | Link |
- As an aside, Linear Predictive Coding (LPC) is a technique for quantizing and compressing speech signals. It was developed in the late 1960s and early 1970s, with foundational work done at Bell Labs.
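- LPC uses an autoregressive model: it predicts the $i$-th wave sample from the past $N$ samples, weighted by constants $c_1, \dots, c_N$ estimated from the signal:

  $$\widehat{\mathrm{wave}}(i) = c_1 \cdot \mathrm{wave}(i-1) + c_2 \cdot \mathrm{wave}(i-2) + c_3 \cdot \mathrm{wave}(i-3) + \dots + c_N \cdot \mathrm{wave}(i-N)$$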
- We use `LPC N` to refer to the representation of a wave produced by the LPC process with `N` coefficients
- In this project, we use `LPC 14` because our sample rate is `14kHz`, or `14,000` samples per second
  - In the original Bell Labs research, the researchers used the heuristic of 1 LPC coefficient per `1kHz` of sampling rate
- Since LPC coefficients are linguistically grounded in vocal tract constrictions (they are related to formants), each vowel has a distinctive LPC/formant "fingerprint"
- Thus, our goal is to use a Feedforward Neural Network to classify which vowel is being produced in a speech signal
(TODO: More citations needed for this)
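The prediction formula above can be made concrete with a short NumPy sketch. It assumes the autocorrelation method solved by the Levinson-Durbin recursion, which is the method Matlab's `lpc` documents; `lpc_coefficients` and `predict` are hypothetical helper names, not functions from this repository. Note the sign convention: Matlab's `lpc` and `librosa.lpc` return a polynomial `[1, a(2), ..., a(N+1)]` whose prediction is the negated sum, so $c_k = -a(k+1)$. (`librosa.lpc` uses Burg's method, so its values may differ slightly from this sketch.)

```python
import numpy as np


def lpc_coefficients(wave: np.ndarray, order: int) -> np.ndarray:
    """Return c_1..c_N so that wave(i) ~ c_1*wave(i-1) + ... + c_N*wave(i-N).

    Autocorrelation method, solved with the Levinson-Durbin recursion.
    Assumes the frame is not all zeros (r[0] > 0).
    """
    wave = np.asarray(wave, dtype=float)
    n = len(wave)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(wave[: n - lag], wave[lag:]) for lag in range(order + 1)])
    c = np.zeros(order + 1)  # c[k] is the k-th predictor coefficient; c[0] is unused
    err = r[0]               # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        # Reflection coefficient for extending the model from order i-1 to order i
        k = (r[i] - np.dot(c[1:i], r[i - 1:0:-1])) / err
        prev = c.copy()
        c[i] = k
        # Update lower-order coefficients: c_j <- c_j - k * c_(i-j)
        c[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return c[1:]


def predict(wave: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Predict each sample from the N previous ones (the first N outputs stay 0)."""
    wave = np.asarray(wave, dtype=float)
    order = len(coeffs)
    pred = np.zeros_like(wave)
    for i in range(order, len(wave)):
        # wave[i-order:i][::-1] is wave(i-1), wave(i-2), ..., wave(i-N)
        pred[i] = np.dot(coeffs, wave[i - order:i][::-1])
    return pred
```

For this project's `LPC 14` setting, the call would be `lpc_coefficients(frame, 14)` on a frame sampled at 14 kHz. A quick sanity check is to fit order 2 to a synthetic second-order autoregressive signal and confirm the generating coefficients come back.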
Matlab functions:
- `lpc` function for generating LPC coefficients from waveforms
- `resample` for changing the original `44,100` (`44.1kHz`) sampling rate to `14,000` (`14kHz`)

Libraries:
- Matlab Signal Processing Toolbox
- Librosa
  - Provides Python equivalents of the Matlab Signal Processing Toolbox's `lpc` and `resample` functions
- SciPy
  - Deserializes Matlab `.mat` files into Python objects (`numpy` arrays)
- PyTorch for neural network architecture and training
  - NN classifier taking LPC coefficients as inputs and producing a one-hot encoding of the 10 vowels as its output
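The `resample` step listed above (44,100 Hz to 14,000 Hz) can also be done in Python. The sketch below is an assumption, not this repository's code: it uses SciPy's polyphase resampler `scipy.signal.resample_poly` with the rate ratio reduced to lowest terms (14,000 / 44,100 = 20 / 63), and `resample_to_target` is a hypothetical helper name. `librosa.resample(y, orig_sr=44100, target_sr=14000)` is an alternative. None of these is guaranteed to match Matlab's `resample` sample for sample, since the anti-aliasing filters differ.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

ORIG_SR = 44_100    # original recording rate (Hz)
TARGET_SR = 14_000  # rate used for LPC 14 (Hz)


def resample_to_target(wave: np.ndarray,
                       orig_sr: int = ORIG_SR,
                       target_sr: int = TARGET_SR) -> np.ndarray:
    """Polyphase resampling by the rational factor target_sr / orig_sr."""
    g = gcd(orig_sr, target_sr)
    up, down = target_sr // g, orig_sr // g  # 20 and 63 for 44,100 -> 14,000
    return resample_poly(wave, up, down)
```

One second of 44.1 kHz audio becomes exactly 14,000 samples, since 44,100 × 20 / 63 = 14,000.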
- A Matlab script resamples, truncates, and preprocesses the utterances to ensure the LPC coefficients reflect the target vowel rather than random noise
  - For more details on the resampling and preprocessing, see the "Preprocessing Methods" section of the writeup
- A Python script takes in the `.mat` files produced by the previous script and trains a simple feedforward network on the vowel classification task
  - The networks were trained with varied hidden layer sizes; increasing the number of hidden neurons appeared to improve learning
  - For more details on the hidden neurons, see the separate writeup "1 vs. 5 Hidden Neurons"
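The PyTorch classifier described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's `Vowel_Classification_NN` code: each sample is assumed to be one 14-value `LPC 14` vector, the single hidden layer is assumed to use a sigmoid activation, and training is assumed to use Adam with `nn.CrossEntropyLoss`. `VowelClassifier` and `train` are hypothetical names; `n_hidden` is the setting that the "1 vs. 5 Hidden Neurons" writeup compares.

```python
import torch
from torch import nn

N_LPC = 14     # features per sample: one LPC 14 coefficient vector (assumption)
N_VOWELS = 10  # output classes, one per vowel


class VowelClassifier(nn.Module):
    """Feedforward net: 14 LPC inputs -> n_hidden sigmoid units -> 10 vowel logits."""

    def __init__(self, n_hidden: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_LPC, n_hidden),
            nn.Sigmoid(),                   # assumed activation
            nn.Linear(n_hidden, N_VOWELS),  # one logit per vowel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
          epochs: int = 500, lr: float = 0.05) -> float:
    """Full-batch training; y holds vowel indices 0..9. Returns the last epoch's loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # Cross-entropy against a class index equals cross-entropy against its one-hot vector
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()
```

To try it without the project's `.mat` files, one can build placeholder inputs, for example 218 random vectors clustered around 10 random centers. Such data only exercises the code and says nothing about accuracy on real vowels.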
- Rather than taking a neural network approach to identifying vowel backness, I computed the effect size (Cohen's d) between front and back vowels for the LPC coefficients, to identify which coefficients could distinguish `frontness` from `backness`
- The script for the effect size calculations can be found here: Generate_LPC_Data.ipynb
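The per-coefficient effect size can be sketched with NumPy. This assumes the standard pooled-standard-deviation form of Cohen's d, $d = (\bar{x}_{\text{front}} - \bar{x}_{\text{back}}) / s_p$; `Generate_LPC_Data.ipynb` may use a different variant. The helper name `cohens_d` and the array layout (rows are utterances, columns are LPC coefficients) are assumptions for illustration.

```python
import numpy as np


def cohens_d(front: np.ndarray, back: np.ndarray) -> np.ndarray:
    """Cohen's d for each column (LPC coefficient) between two groups of rows."""
    n1, n2 = front.shape[0], back.shape[0]
    v1 = front.var(axis=0, ddof=1)  # sample variance of each coefficient, front vowels
    v2 = back.var(axis=0, ddof=1)   # sample variance of each coefficient, back vowels
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (front.mean(axis=0) - back.mean(axis=0)) / pooled_sd


def rank_coefficients(d: np.ndarray) -> np.ndarray:
    """Coefficient indices ordered from largest to smallest |d|."""
    return np.argsort(-np.abs(d))
```

Coefficients at the top of `rank_coefficients(cohens_d(front, back))` separate the two groups most strongly; the sign of `d` tells which group has the larger mean.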
The model architecture used for this project is quite simple and more of a proof of concept for more sophisticated speech detection tasks.
Furthermore, only 218 data samples were used. Since there are 10 output vowels, it would also seem that models with fewer than 4 hidden neurons would not perform well: if each hidden neuron is treated as a binary (on/off) unit, 3 neurons can encode only $2^3 = 8$ distinct patterns, while there are 10 vowels.