Training a NN using preprocessed Linear Predictive Coding (LPC) values

Ky-Ng/Vowel-Detection-NN

Vowel Classification Neural Network


Table of Contents

  1. Motivation
  2. High Level Overview
  3. Tangent: Formants and LPCs
    1. Linear Predictive Coding
    2. Sampling Rates
    3. Neural Network Motivation
  4. Technologies Used
    1. Signal Processing
    2. Neural Network Training
  5. Vowel NN Classifier
    1. Dataset Building
    2. Vowel Classification Training
    3. Side Tangent: Vowel Backness
    4. Limitations

Motivation

Feedforward Neural Network for speech recognition of vowels using Linear Predictive Coding (LPC) Coefficients.

For a walkthrough of this repository, check out this video.

Shoutout to my Linguistics professor who wrote this in 1992 and is still teaching this to us at the ripe age of 70+.


High Level Overview

| Task | Components | Scripts Used | Full Writeup |
| --- | --- | --- | --- |
| Vowel NN Classifier | 1. Create LPC coefficients for training and ground truth data<br>2. Train feedforward models on the vowel classification task | Create_LPC_Data_Sets, Vowel_Classification_NN | Link |
| Vowel Backness | Identify vowel backness using LPC data | Generate_LPC_Data | Link |

Tangent: Formants and LPCs

Linear Predictive Coding

  • Linear Predictive Coding, or LPC, is a technique developed at Bell Labs in the late 1960s to quantize and compress speech signals.
  • LPC uses an auto-regressive model: it predicts the ith wave sample from the past N samples and learned constants
wave(i) ≈ c1 * wave(i-1) + c2 * wave(i-2) + c3 * wave(i-3) + ... + cN * wave(i-N)
  • We refer to LPC N as the quantization of the wave using N LPC coefficients
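The recurrence above can be sketched in a few lines of numpy. This is a minimal illustration, not the repository's pipeline: `fit_lpc` is a made-up name that fits the N constants by plain least squares, whereas Matlab's and librosa's `lpc` use the more efficient Levinson-Durbin recursion.

```python
import numpy as np

def fit_lpc(wave, N):
    """Fit N autoregressive constants c1..cN by least squares so that
    wave[i] ≈ c1*wave[i-1] + c2*wave[i-2] + ... + cN*wave[i-N]."""
    # Each row of X holds the N past samples for one predicted sample
    X = np.column_stack([wave[N - k : len(wave) - k] for k in range(1, N + 1)])
    y = wave[N:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# A decaying sinusoid stands in for one frame of voiced speech
t = np.arange(400)
wave = np.exp(-t / 200.0) * np.sin(2 * np.pi * 0.05 * t)

c = fit_lpc(wave, N=14)  # "LPC 14": 14 coefficients
```

A damped sinusoid exactly satisfies a second-order linear recurrence, so the 14-coefficient fit predicts it almost perfectly; real speech frames leave a residual that the coefficients compress away.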

Sampling Rates

  • In this project, we use LPC 14 since our sample rate is 14 kHz, or 14,000 samples per second
  • In the original Bell Labs research, the researchers used the heuristic of 1 LPC coefficient per 1 kHz of sampling rate
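Under that heuristic, the LPC order is just the sampling rate expressed in kHz; a hypothetical one-line helper makes the arithmetic concrete:

```python
def lpc_order(sample_rate_hz):
    """Heuristic from the Bell Labs work: 1 LPC coefficient per 1 kHz of sampling rate."""
    return sample_rate_hz // 1000

# 14,000 samples/second → LPC 14, matching this project
order = lpc_order(14_000)
```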

Neural Network Motivation

  • Since LPC coefficients are linguistically grounded in vocal tract constrictions (they are related to formants), each vowel exhibits a unique LPC/formant "fingerprint"
  • Thus, our goal is to use a Feedforward Neural Network to classify which vowel is being produced in a speech signal

(TODO: More citations needed for this)


Technologies Used

Signal Processing

  • lpc function for generating LPC Coefficients from waveforms
  • resample for changing the original 44,100 Hz (44.1 kHz) sampling rate to 14,000 Hz (14 kHz)

Libraries:

  • Matlab Signal Processing Toolbox
  • Librosa
    • Equivalent to Matlab's Signal Processing Toolbox; provides the lpc and resample functions
  • SciPy
    • Deserializes Matlab .mat files into Python objects (numpy arrays)
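As a rough sketch of how these pieces fit together on the Python side (the file name, array key, and stand-in audio below are made up for illustration), SciPy can both resample the raw audio and round-trip the .mat files:

```python
import numpy as np
from scipy import signal
from scipy.io import savemat, loadmat

sr_in, sr_out = 44_100, 14_000

# One second of stand-in audio at the original 44.1 kHz rate
rng = np.random.default_rng(0)
wave = rng.standard_normal(sr_in)

# Downsample to 14 kHz so LPC 14 is appropriate
wave_14k = signal.resample(wave, int(len(wave) * sr_out / sr_in))

# Round-trip through a .mat file, as the Python training script would read it
savemat("utterance_demo.mat", {"wave": wave_14k})
loaded = loadmat("utterance_demo.mat")["wave"].ravel()
```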

Neural Network Training

  • PyTorch for Neural Network Architecture and Training

Vowel NN Classifier

  • NN Classifier taking LPC coefficients as inputs and producing a one-hot encoding over 10 vowels as its output
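A minimal PyTorch sketch of such a classifier follows; the layer sizes and class name are illustrative, and the repository's exact architecture lives in Vowel_Classification_NN.

```python
import torch
import torch.nn as nn

class VowelClassifier(nn.Module):
    """14 LPC coefficients in, scores over 10 vowel classes out."""
    def __init__(self, n_lpc=14, n_hidden=5, n_vowels=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_lpc, n_hidden),
            nn.Sigmoid(),
            nn.Linear(n_hidden, n_vowels),
        )

    def forward(self, x):
        # Raw logits; pair with nn.CrossEntropyLoss, which takes class
        # indices rather than explicit one-hot targets
        return self.net(x)

model = VowelClassifier()
logits = model(torch.randn(1, 14))  # one utterance's LPC vector
```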

1) Dataset Building

  1. A Matlab script resamples, truncates, and preprocesses the utterances in order to ensure the LPC coefficients are reflective of the target vowel and not random noise
  2. For more details on the sampling process, read the "Preprocessing Methods" section of the writeup

2) Vowel Classification Training

  1. A Python script takes in the .mat files from the previous step and trains a simple feedforward network on the vowel classification task
  2. The neural networks also have varied hidden layer sizes, where increasing the number of hidden neurons appears to improve learning
    1. For more details on the hidden neurons, read another writeup, "1 vs. 5 Hidden Neurons"
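The training step above can be sketched as a standard PyTorch loop. Everything here is stand-in material: random data in place of the real 218 LPC samples, and guessed hyperparameters rather than the repository's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(218, 14)           # stand-in LPC-14 vectors
y = torch.randint(0, 10, (218,))   # stand-in vowel labels

# Tiny feedforward net: 14 inputs → 5 hidden neurons → 10 vowel classes
model = nn.Sequential(nn.Linear(14, 5), nn.Sigmoid(), nn.Linear(5, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

initial_loss = loss_fn(model(X), y).item()
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
final_loss = loss_fn(model(X), y).item()
```

Even on random labels the full-batch loop drives the loss down by memorizing the 218 samples, which is why the writeup's comparison of hidden layer sizes matters more than raw training loss.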

Side Tangent: Vowel Backness

  • Rather than taking a Neural Network approach to identifying vowel backness, I computed the effect size (Cohen's d) between the front and back vowels to identify which LPC coefficients could distinguish frontness from backness
  • The script for the effect size calculations can be found here: Generate_LPC_Data.ipynb
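For illustration, here is a Cohen's d helper using the pooled standard deviation; the helper name and the random stand-in data are hypothetical, not taken from Generate_LPC_Data.ipynb.

```python
import numpy as np

def cohens_d(a, b):
    """Effect size between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                     / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

# Stand-in data: 100 front and 100 back vowel tokens, 14 LPC coefficients each
rng = np.random.default_rng(0)
front = rng.normal(0.5, 1.0, size=(100, 14))
back = rng.normal(0.0, 1.0, size=(100, 14))

# One effect size per coefficient: a large |d| means that coefficient
# separates front from back vowels well
d_per_coeff = np.array([cohens_d(front[:, k], back[:, k]) for k in range(14)])
```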

Limitations

The model architecture used for this project is quite simple and more of a proof of concept for more sophisticated speech detection tasks.

Furthermore, only 218 data samples were used. Since there are 10 output vowels, hidden layers with fewer than 4 neurons would also be expected to perform poorly: 3 roughly binary neurons can encode only 2^3 = 8 distinct patterns, fewer than the 10 vowel classes.

