Introduction

Speech recognition is a fascinating field that allows humans to interact with machines in a natural, hands-free way. Here we try to understand, experiment with, and build user-friendly apps on those fronts with TensorFlow models.

Related Work

Android App

Problem Statement

To come up with a simple, easy-to-use software environment for training on audio data, with plug-and-play modules for data pre-processing, training different models, and serving the pre-trained models on web and mobile devices.

Proposed Solution

Build the following modular components, which can then be used as plug-and-play pieces:

  • Dataset Modules with preprocessing Modules
  • Data Iterator Modules
  • Tensorflow Models
  • Tensorflow Model serving
    • Web app
    • Mobile
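The modular design above can be sketched as a set of small interfaces, so that any dataset, iterator, and model can be combined freely. This is a minimal illustration with hypothetical names, not the repo's actual API:

```python
# Plug-and-play sketch (hypothetical names): each stage exposes a small
# interface so datasets, iterators, and models can be swapped independently.
from abc import ABC, abstractmethod


class Dataset(ABC):
    """Loads raw audio and applies pre-processing."""

    @abstractmethod
    def examples(self):
        """Yield (audio, label) pairs."""


class DataIterator(ABC):
    """Turns a Dataset into batched model inputs (e.g. MFCC features)."""

    @abstractmethod
    def batches(self, dataset, batch_size):
        """Yield batches of (features, label) pairs."""


class Model(ABC):
    """A trainable model (a TensorFlow graph in the actual project)."""

    @abstractmethod
    def train_step(self, batch):
        """Run one optimization step on a batch."""


def run_experiment(dataset, iterator, model, batch_size=32):
    # Any combination of concrete components can be plugged in here.
    for batch in iterator.batches(dataset, batch_size):
        model.train_step(batch)
```

With interfaces like these, swapping `cnn_trad_fpool3` for another model, or one feature pipeline for another, only requires passing a different concrete object to `run_experiment`.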

Validation


Learning

Check out here for a basic understanding of audio datasets for deep learning!

Dataset

If you want to train on your own data, you'll need to create .wavs with your recordings, all at a consistent length, and then arrange them into subfolders organized by label. For example, here's a possible file structure:

my_wavs >
  up >
    audio_0.wav
    audio_1.wav
  down >
    audio_2.wav
    audio_3.wav
  other>
    audio_4.wav
    audio_5.wav

The sample dataset used here is from http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
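A label-per-subfolder layout like the one above can be indexed into (path, label) pairs before feature extraction. This is a small sketch of that step; the function name is hypothetical:

```python
# Index a directory of .wav files arranged as root/<label>/<clip>.wav
# into (wav_path, label) pairs, where the label is the subfolder name.
from pathlib import Path


def index_wavs(root):
    """Return sorted (wav_path, label) pairs for root/<label>/*.wav."""
    pairs = []
    for wav in sorted(Path(root).glob("*/*.wav")):
        pairs.append((str(wav), wav.parent.name))
    return pairs
```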

Libraries


# Sample of how to run; for more, check the models!
cd /path/to/sarvam/src/

python speech_recognition/commands/run_experiments.py \
--mode=train \
--dataset-name=speech_commands_v0 \
--data-iterator-name=audio_mfcc_google \
--model-name=cnn_trad_fpool3 \
--batch-size=32 \
--num-epochs=5

tensorboard --logdir=experiments/CNNTradFPool/