Dynamic Time Warping (DTW) Speech Recognition

A speech recognition system that uses feature extraction and dynamic time warping (DTW) to identify spoken words and find the speaker most similar to a given target speaker.

Features

  • Word Recognition: Identifies words spoken by different speakers using a weighted voting mechanism across multiple feature types.
  • Speaker Similarity Detection: Determines the most similar speaker to a target speaker based on feature comparisons.
  • Dynamic Time Warping (DTW): Measures similarity between time-series features by finding the minimum-cost alignment path between them.
  • Multiple Feature Types: Supports MFCC, LPC, and Mel Spectrogram features for diverse and accurate audio analysis.
  • Confidence and Threshold Handling: Ensures reliable predictions by applying confidence thresholds and cost checks.

Setup

  1. Clone the repository:
git clone https://github.com/mradovic38/dtw-speech-recognition
cd dtw-speech-recognition
  2. Install dependencies:
pip install -r requirements.txt

Usage

  1. Create a directory containing audio files of different speakers saying the same set of words.
  2. Name your audio files word-speaker.wav (e.g. down-mark.wav); see the example layout after this list.
  3. For details on how to run the project on your data, refer to the run.py file. It contains examples and explanations for word recognition and speaker similarity detection.
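
For example, a data directory might look like this (the directory name and the extra words/speakers are only illustrations):

```
recordings/
├── down-mark.wav
├── down-anna.wav
├── up-mark.wav
└── up-anna.wav
```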

Customization

  • Adjust the feature weights in word_recognition.py to prioritize specific feature types.
  • Modify the confidence and cost thresholds to suit different datasets (see the sketch below for the kind of values involved).
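
The actual variable names live in word_recognition.py; the snippet below is only a hypothetical illustration of the kind of values you would tune, not the repository's real constants.

```python
# Hypothetical names -- check word_recognition.py for the real ones.
FEATURE_WEIGHTS = {"mfcc": 0.5, "lpc": 0.2, "mel": 0.3}  # per-feature vote weights
CONFIDENCE_THRESHOLD = 0.6  # minimum winning vote share to accept a word prediction
COST_THRESHOLD = 250.0      # reject matches whose DTW cost exceeds this value
```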

How it works

Feature Extraction:

Extracts audio features such as MFCC, LPC, and Mel spectrograms from input audio files.
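
A minimal sketch of this step using librosa (the function name, frame sizes, and coefficient counts below are assumptions for illustration, not the project's actual API):

```python
import librosa
import numpy as np

def extract_features(path, n_mfcc=13, lpc_order=12, n_mels=40,
                     frame_length=2048, hop_length=512):
    """Load one recording and return MFCC, frame-wise LPC, and log-Mel features."""
    y, sr = librosa.load(path, sr=None)

    # MFCC matrix, shape (n_mfcc, n_frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_length, hop_length=hop_length)

    # LPC coefficients computed per frame, shape (lpc_order + 1, n_frames)
    frames = librosa.util.frame(y, frame_length=frame_length, hop_length=hop_length)
    lpc = np.apply_along_axis(librosa.lpc, 0, frames, order=lpc_order)

    # Log-scaled Mel spectrogram, shape (n_mels, n_frames)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                       n_fft=frame_length, hop_length=hop_length))

    return {"mfcc": mfcc, "lpc": lpc, "mel": mel}
```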

Dynamic Time Warping (DTW):

Computes the similarity between two feature sequences (input audio vs. the reference database) by finding the minimum-cost alignment between their frames.
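
A minimal DTW sketch, assuming each feature is a 2D array shaped (n_coefficients, n_frames) as in the extraction sketch above; this is an illustration, not the repository's implementation:

```python
import numpy as np

def dtw_cost(a, b):
    """Minimum accumulated frame-to-frame distance between two feature sequences."""
    a, b = a.T, b.T                      # put frames along the first axis
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])          # local distance
            D[i, j] = d + min(D[i - 1, j],                   # insertion
                              D[i, j - 1],                   # deletion
                              D[i - 1, j - 1])               # match
    return D[n, m]
```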

Word Recognition:

Aggregates feature-based predictions using weighted voting across all three feature types for reliable word identification.
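
Roughly, the voting works like the sketch below, which reuses dtw_cost from the DTW sketch above (weights, names, and data layout are assumptions; see word_recognition.py for the actual logic):

```python
from collections import defaultdict

def recognize_word(query, templates, weights=None):
    """query: {"mfcc": ..., "lpc": ..., "mel": ...} for one recording.
    templates: {word: feature dict} built from the reference recordings."""
    weights = weights or {"mfcc": 0.5, "lpc": 0.2, "mel": 0.3}  # illustrative weights
    votes = defaultdict(float)
    for feat, weight in weights.items():
        # Each feature type votes for the word whose template it matches best
        best = min(templates,
                   key=lambda word: dtw_cost(query[feat], templates[word][feat]))
        votes[best] += weight
    return max(votes, key=votes.get)  # word with the highest weighted vote
```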

Speaker Similarity:

Compares speakers by averaging DTW costs across their shared vocabulary, using the feature extraction algorithm passed in.
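
Conceptually, and again reusing the sketches above (names and data layout are assumptions):

```python
def speaker_distance(speaker_a, speaker_b, feat="mfcc"):
    """speaker_x: {word: feature dict} for every word that speaker recorded."""
    shared = set(speaker_a) & set(speaker_b)          # words both speakers said
    costs = [dtw_cost(speaker_a[w][feat], speaker_b[w][feat]) for w in shared]
    return sum(costs) / len(costs)                    # lower = more similar

# The most similar speaker to a target is the one with the lowest average cost.
```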

License

This project is licensed under the MIT License. See the LICENSE file for details.